[R] Separator with " | " for read.table

jim holtman jholtman at gmail.com
Mon Jun 16 03:39:25 CEST 2008


I am not exactly sure what you are after, but if you are just printing
out a single column, then unless you use "drop=FALSE" when subsetting
it, you get a vector (here a factor) rather than a data frame:

> x <- read.table(textConnection("#GDS_ID GENE_NAME GENE_DESCRIPTION GENE_FUNCTION
+ 1007_s_at | DDR1 | discoidin domain receptor tyrosine kinase 1 | protein-coding
+ 1053_at | RFC2 | replication factor C (activator 1) 2, 40kDa | protein-coding
+ 117_at | HSPA6 | heat shock 70kDa protein 6 (HSP70B') | protein-coding"), sep="|", quote='')
> closeAllConnections()
> str(x)
'data.frame':   3 obs. of  4 variables:
 $ V1: Factor w/ 3 levels "1007_s_at ","1053_at ",..: 1 2 3
 $ V2: Factor w/ 3 levels " DDR1 "," HSPA6 ",..: 1 3 2
 $ V3: Factor w/ 3 levels " discoidin domain receptor tyrosine kinase 1 ",..: 1 3 2
 $ V4: Factor w/ 1 level " protein-coding": 1 1 1
> print(x$V3)
[1]  discoidin domain receptor tyrosine kinase 1   replication factor C (activator 1) 2, 40kDa
[3]  heat shock 70kDa protein 6 (HSP70B')
3 Levels:  discoidin domain receptor tyrosine kinase 1  ... replication factor C (activator 1) 2, 40kDa
> x$V3
[1]  discoidin domain receptor tyrosine kinase 1   replication factor C (activator 1) 2, 40kDa
[3]  heat shock 70kDa protein 6 (HSP70B')
3 Levels:  discoidin domain receptor tyrosine kinase 1  ... replication factor C (activator 1) 2, 40kDa
> x[, "V3", drop=FALSE]  # is this what you were expecting?
                                             V3
1  discoidin domain receptor tyrosine kinase 1
2  replication factor C (activator 1) 2, 40kDa
3         heat shock 70kDa protein 6 (HSP70B')
>
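
If the padding around each "|" is also unwanted, here is a minimal
sketch of one way to handle it. It assumes a version of R where
read.table accepts text=, and the column names are my own guess taken
from the commented header line, not something read.table infers. Since
sep has to be a single character, it splits on "|" and lets
strip.white trim the surrounding spaces:

```r
# Sample data copied from the thread. The "#..." header line is skipped
# because read.table's default comment.char is "#".
txt <- "#GDS_ID GENE_NAME GENE_DESCRIPTION GENE_FUNCTION
1007_s_at | DDR1 | discoidin domain receptor tyrosine kinase 1 | protein-coding
1053_at | RFC2 | replication factor C (activator 1) 2, 40kDa | protein-coding
117_at | HSPA6 | heat shock 70kDa protein 6 (HSP70B') | protein-coding"

# Split on "|" and strip the spaces on either side of each field.
# quote = "" keeps the apostrophe in HSP70B' from opening a quoted string.
# The col.names below are an assumption based on the header comment.
x <- read.table(text = txt, sep = "|", quote = "", strip.white = TRUE,
                stringsAsFactors = FALSE,
                col.names = c("GDS_ID", "GENE_NAME",
                              "GENE_DESCRIPTION", "GENE_FUNCTION"))

# `$` returns a plain character vector ...
v <- x$GENE_DESCRIPTION
# ... while `[` with drop = FALSE keeps a one-column data frame.
d <- x[, "GENE_DESCRIPTION", drop = FALSE]

# Print one entry per line, without the [1] index prefix.
writeLines(v)
```

With stringsAsFactors = FALSE the columns stay character rather than
factor, so printing them does not add a "Levels:" line.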




On Sun, Jun 15, 2008 at 9:31 PM, Gundala Viswanath <gundalav at gmail.com> wrote:
> Thanks so much Jim,
>
> It works. However, why was the "\n" not removed?
> That is, when I do:
>
> print (x$V3)
>
> it gives something like this:
> __OUTPUT__
>    [1]  discoidin domain receptor tyrosine kinase 1
>
>    [2]  replication factor C (activator 1) 2, 40kDa
>
>    [3]  heat shock 70kDa protein 6 (HSP70B')
>
> __END__
>
> Note the spacing between the entries. I expect something like:
>
>    [1]  discoidin domain receptor tyrosine kinase 1
>    [2]  replication factor C (activator 1) 2, 40kDa
>    [3]  heat shock 70kDa protein 6 (HSP70B')
>    __END__
>
> Do you have any idea how to fix this?
>
>
>
> - Gundala Viswanath
> Jakarta - Indonesia
>
>
> On Mon, Jun 16, 2008 at 10:19 AM, jim holtman <jholtman at gmail.com> wrote:
>> Does this give you what you want:
>>
>>> x <- read.table(textConnection("#GDS_ID GENE_NAME GENE_DESCRIPTION GENE_FUNCTION
>> + 1007_s_at | DDR1 | discoidin domain receptor tyrosine kinase 1 | protein-coding
>> + 1053_at | RFC2 | replication factor C (activator 1) 2, 40kDa | protein-coding
>> + 117_at | HSPA6 | heat shock 70kDa protein 6 (HSP70B') | protein-coding"), sep="|", quote='')
>>> closeAllConnections()
>>>
>>> x
>>          V1      V2                                            V3              V4
>> 1 1007_s_at    DDR1   discoidin domain receptor tyrosine kinase 1  protein-coding
>> 2   1053_at    RFC2   replication factor C (activator 1) 2, 40kDa  protein-coding
>> 3    117_at   HSPA6          heat shock 70kDa protein 6 (HSP70B')  protein-coding
>>>
>>
>>
>> Your data contains a single quote ('), so you need quote='' in the read.table call.
>>
>> On Sun, Jun 15, 2008 at 9:11 PM, Gundala Viswanath <gundalav at gmail.com> wrote:
>>> Hi,
>>>
>>> I have the following data file to be parsed and captured as a data frame:
>>>
>>> __DATA__
>>> #GDS_ID GENE_NAME GENE_DESCRIPTION GENE_FUNCTION
>>> 1007_s_at | DDR1 | discoidin domain receptor tyrosine kinase 1 | protein-coding
>>> 1053_at | RFC2 | replication factor C (activator 1) 2, 40kDa | protein-coding
>>> 117_at | HSPA6 | heat shock 70kDa protein 6 (HSP70B') | protein-coding
>>>
>>> __END__
>>>
>>> In particular, the fields are separated by " | ", namely space, bar, space.
>>> However, I tried this to no avail:
>>>
>>> geneinfo <- read.table("mydata.txt", sep=" | ", comment.char="\#")
>>> print(geneinfo)
>>>
>>> I also tried sep="|", but it parsed the data incorrectly. Please advise.
>>>
>>> - Gundala Viswanath
>>> Jakarta - Indonesia
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem you are trying to solve?
>>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


