[R] write merged data frame to a file
Andrea Franceschini
atariw at gmail.com
Mon Jul 18 16:00:29 CEST 2011
Dear all, Sarah,
OK, here are some more details.
I use version 13 of R on OS X (downloaded and installed less than a year ago).
The output of the str() command is pasted below.
The data frame looks perfect to me.
I generated the data frame using the following commands:
rog = read.table("/Users/andreaf/biodata/roger/data/roger_uuk_offt_seqs.tsv",
                 header=TRUE, sep="\t")
rog2 = add_median_column(rog, "Gene_ID", "Score")
rog3 = add_count_column(rog2, "Gene_ID", "siRNAcount")
rog4 = add_mean_column(rog3, "Gene_ID", "CellNumber", "MeanCellNumber")
gn = read.table("/Users/andreaf/biodata/data/gene_names_desc.tsv",
                header=TRUE, sep="\t")
rog5 = merge(rog4, gn, by.x="Gene_ID", by.y="Gene_ID", all.x=TRUE, all.y=FALSE)
write.table(rog5,
            file="/Users/andreaf/biodata/roger/data/roger_uuk_july2011.tsv",
            quote=FALSE, sep="\t", row.names=FALSE)
str(rog5)
'data.frame': 61869 obs. of 14 variables:
$ Gene_ID : int 1 2 2 2 2 9 9 9 9 10 ...
$ Score : num 1.63 0.892 0.473 1.274 1.585 ...
$ CellNumber : int 1085 4031 882 1705 3876 3932 3309 4461 2906 2705 ...
$ siRNAid : Factor w/ 61869 levels "SI00000007","SI00000035",..: 57157 52189 52188 1 58783 36041 36040 36038 36039 36044 ...
$ offt : int 553 107 712 516 93 1245 1240 1080 673 711 ...
$ RSArank : int 18334 14751 14751 14751 14751 7209 7209 7209 7209 3723 ...
$ sequence : Factor w/ 61869 levels "AAACAACACAACCAUAUCGAG",..: 20497 30102 9841 11752 20305 5537 58794 16241 19223 44336 ...
$ seed6 : Factor w/ 3129 levels "AAACAA","AAACAC",..: 976 2267 241 446 947 37 3032 702 878 2711 ...
$ seed7 : Factor w/ 8947 levels "AAACAAA","AAACAAC",..: 3071 5933 819 1369 2998 134 8641 2196 2768 7486 ...
$ Median : num 1.63 1.08 1.08 1.08 1.08 ...
$ siRNAcount : num 1 4 4 4 4 4 4 4 4 4 ...
$ MeanCellNumber: num 1085 2624 2624 2624 2624 ...
$ GeneName : Factor w/ 25068 levels "1/2-SBSRNA4",..: 3 5 5 5 5 17324 17324 17324 17324 17326 ...
$ GeneDesc : Factor w/ 23388 levels "1-acylglycerol-3-phosphate O-acyltransferase 1 (lysophosphatidic acid acyltransferase, alpha)",..: 622 627 627 627 627 14313 14313 14313 14313 14315 ...
The problem arises when I try to write that data frame to a file
(i.e. the write.table command above).
When I open the text file that it generates, I find the lines pasted
below.
The first lines are OK (i.e. 14 columns, like the data frame), but at
a certain point I get lines with only 3 columns!
The bad lines with only 3 columns contain the name and the
description of the gene (i.e. the content of the file that I merged
with).
Besides, these strange lines are also repeated (see the bottom).
Why do I get those strange lines in the file, and apparently
NOT in the data frame, which looks perfect?
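
One thing str() does not reveal is whether a single value contains a line break, because it only prints the first few factor levels and codes. The sketch below is one possible check, assuming (this is not confirmed anywhere in this thread) that the extra lines come from newline characters stored inside a field. The data frame df and the helper has_newline are hypothetical stand-ins with made-up values, not the real rog5.

```r
# Toy stand-in for the merged data frame; all values are made up.
df <- data.frame(
  Gene_ID  = c(3187L, 3188L),
  GeneName = c("HNRNPH1", "HNRNPH2"),
  GeneDesc = c("desc one", "desc two\n3189\tHNRNPH3\tdesc three"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: TRUE for values that contain a CR or LF character.
has_newline <- function(x) grepl("[\r\n]", as.character(x))

# Count such values in every factor or character column.
text_cols <- vapply(df, function(col) is.factor(col) || is.character(col),
                    logical(1))
newline_counts <- vapply(df[text_cols], function(col) sum(has_newline(col)),
                         integer(1))
print(newline_counts)

# Gene_IDs of the rows that would be split over several lines
# if written with write.table(..., quote = FALSE).
bad_rows <- which(has_newline(df$GeneDesc))
print(df$Gene_ID[bad_rows])
```

Run against the real data frame, a non-zero count in GeneName or GeneDesc would point at the values to inspect more closely.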
990.33333333333 HNRNPF heterogeneous nuclear ribonucleoprotein F
3185 0.844752 601 SI02651824 816 12598 UUGAACACCUCAAUGUACCGG UGAACA UGAACAC 0.844752 1990.33333333333 HNRNPF heterogeneous
nuclear ribonucleoprotein F
3185 1.3576397 1508 SI02651838 586 12598 UUUACUCAUUAUCACAUGCUA UUACUC UUACUCA 0.844752 1990.33333333333 HNRNPF heterogeneous
nuclear ribonucleoprotein F
3187 0.7738942 2001 SI00439831 1368 12620 UUUAACAUAAUUCAACUGCUU UUAACA UUAACAU 0.7738942 1522.33333333333 HNRNPH1 heterogeneous
nuclear ribonucleoprotein H1 (H)
3187 1.179532 790 SI00439845 913 12620 UUAAGUUUAACAGUUAUAGUU UAAGUU UAAGUUU 0.7738942 1522.33333333333 HNRNPH1 heterogeneous
nuclear ribonucleoprotein H1 (H)
3187 0.6908423 1776 SI02654799 1244 12620 UUAAAGAUUUCAAUAUACCUG UAAAGA UAAAGAU 0.7738942 1522.33333333333 HNRNPH1 heterogeneous
nuclear ribonucleoprotein H1 (H)
3188 0.6514164 915 SI00439866 1444 3319 UUAACAAACAUGCCAAAUGUU UAACAA UAACAAA 0.69809605 1541 HNRNPH2 heterogeneous
nuclear ribonucleoprotein H2 (H)
3189 HNRNPH3 heterogeneous nuclear ribonucleoprotein H3 (2H9)
3190 HNRNPK heterogeneous nuclear ribonucleoprotein K
3191 HNRNPL heterogeneous nuclear ribonucleoprotein L
3192 HNRNPU heterogeneous nuclear ribonucleoprotein U (scaffold
attachment factor A)
3193 HOAC hypoacusis 2 (autosomal recessive)
3195 TLX1 T-cell leukemia homeobox 1
3196 TLX2 T-cell leukemia homeobox 2
3197 HOXA@ homeobox A cluster
3198 HOXA1 homeobox A1
3199 HOXA2 homeobox A2
3200 HOXA3 homeobox A3
3201 HOXA4 homeobox A4
3202 HOXA5 homeobox A5
3203 HOXA6 homeobox A6
3204 HOXA7 homeobox A7
3205 HOXA9 homeobox A9
3206 HOXA10 homeobox A10
3207 HOXA11 homeobox A11
3208 HPCA hippocalcin
3209 HOXA13 homeobox A13
3210 HOXB@ homeobox B cluster
3211 HOXB1 homeobox B1
3212 HOXB2 homeobox B2
3213 HOXB3 homeobox B3
3214 HOXB4 homeobox B4
3215 HOXB5 homeobox B5
3216 HOXB6 homeobox B6
3217 HOXB7 homeobox B7
3218 HOXB8 homeobox B8
3219 HOXB9 homeobox B9
3220 HOXC@ homeobox C cluster
3221 HOXC4 homeobox C4
3222 HOXC5 homeobox C5
3223 HOXC6 homeobox C6
3224 HOXC8 homeobox C8
3225 HOXC9 homeobox C9
3226 HOXC10 homeobox C10
3227 HOXC11 homeobox C11
3228 HOXC12 homeobox C12
3229 HOXC13 homeobox C13
3230 HOXD@ homeobox D cluster
3231 HOXD1 homeobox D1
3232 HOXD3 homeobox D3
3233 HOXD4 homeobox D4
3234 HOXD8 homeobox D8
3235 HOXD9 homeobox D9
3236 HOXD10 homeobox D10
3237 HOXD11 homeobox D11
3238 HOXD12 homeobox D12
3239 HOXD13 homeobox D13
3240 HP haptoglobin
3241 HPCAL1 hippocalcin-like 1
3242 HPD 4-hydroxyphenylpyruvate dioxygenase
3244 HPE1 holoprosencephaly 1, alobar
3247 HPFH2 hereditary persistence of fetal hemoglobin, heterocellular,
Indian type
3248 HPGD hydroxyprostaglandin dehydrogenase 15-(NAD)
3249 HPN hepsin
3250 HPR haptoglobin-related protein
3251 HPRT1 hypoxanthine phosphoribosyltransferase 1
3254 HPRTP2 hypoxanthine phosphoribosyltransferase pseudogene 2
3255 HPRTP3 hypoxanthine phosphoribosyltransferase pseudogene 3
3257 HPS1 Hermansky-Pudlak syndrome 1
3258 HPT hypoparathyroidism
3259 HPV6AI1 human papillomavirus (type 6a) integration site 1
3260 HPV18I1 human papilloma virus (type 18) integration site 1
3261 HPV18I2 human papillomavirus (type 18) integration site 2
3262 HPVC1 human papillomavirus (type 18) E5 central sequence-like 1
3263 HPX hemopexin
3265 HRAS v-Ha-ras Harvey rat sarcoma viral oncogene homolog
3266 ERAS ES cell expressed Ras
3267 AGFG1 ArfGAP with FG repeats 1
3268 AGFG2 ArfGAP with FG repeats 2
3269 HRH1 histamine receptor H1
3270 HRC histidine rich calcium binding protein
3272 HRES1 HTLV-1 related endogenous sequence
3273 HRG histidine-rich glycoprotein
3274 HRH2 histamine receptor H2
3275 PRMT2 protein arginine methyltransferase 2
3276 PRMT1 protein arginine methyltransferase 1
3278 HRPT1 hyperparathyroidism 1
3280 HES1 hairy and enhancer of split 1, (Drosophila)
3281 HSBP1 heat shock factor binding protein 1
3283 HSD3B1 hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid
delta-isomerase 1
3284 HSD3B2 hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid
delta-isomerase 2
3290 HSD11B1 hydroxysteroid (11-beta) dehydrogenase 1
3291 HSD11B2 hydroxysteroid (11-beta) dehydrogenase 2
3292 HSD17B1 hydroxysteroid (17-beta) dehydrogenase 1
3293 HSD17B3 hydroxysteroid (17-beta) dehydrogenase 3
3294 HSD17B2 hydroxysteroid (17-beta) dehydrogenase 2
3295 HSD17B4 hydroxysteroid (17-beta) dehydrogenase 4
3297 HSF1 heat shock transcription factor 1
3298 HSF2 heat shock transcription factor 2
3299 HSF4 heat shock transcription factor 4
3300 DNAJB2 DnaJ (Hsp40) homolog, subfamily B, member 2
3301 DNAJA1 DnaJ (Hsp40) homolog, subfamily A, member 1
3303 HSPA1A heat shock 70kDa protein 1A
3304 HSPA1B heat shock 70kDa protein 1B
3305 HSPA1L heat shock 70kDa protein 1-like
3306 HSPA2 heat shock 70kDa protein 2
3308 HSPA4 heat shock 70kDa protein 4
3309 HSPA5 heat shock 70kDa protein 5 (glucose-regulated protein, 78kDa)
3310 HSPA6 heat shock 70kDa protein 6 (HSP70B)
3188 0.7339296 1242 SI00439852 425 3319 UGUAGCUCUAACGAUACCGGG GUAGCU GUAGCUC 0.69809605 1541 HNRNPH2 heterogeneous
nuclear ribonucleoprotein H2 (H)
3189 HNRNPH3 heterogeneous nuclear ribonucleoprotein H3 (2H9)
3190 HNRNPK heterogeneous nuclear ribonucleoprotein K
3191 HNRNPL heterogeneous nuclear ribonucleoprotein L
3192 HNRNPU heterogeneous nuclear ribonucleoprotein U (scaffold
attachment factor A)
3193 HOAC hypoacusis 2 (autosomal recessive)
3195 TLX1 T-cell leukemia homeobox 1
3196 TLX2 T-cell leukemia homeobox 2
3197 HOXA@ homeobox A cluster
3198 HOXA1 homeobox A1
3199 HOXA2 homeobox A2
3200 HOXA3 homeobox A3
3201 HOXA4 homeobox A4
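
In the output above, the same block of 3-column lines follows each row for Gene_ID 3188. That pattern would be consistent with one description value holding many lines of the gene file, which can happen when read.table() treats an apostrophe in a tab-separated field as an opening quote. This is only a possible explanation under that assumption, not a confirmed diagnosis. The sketch below reproduces the effect on a made-up file; tmp, gn_default and gn_plain are hypothetical names.

```r
# Made-up gene table; two of the descriptions contain an apostrophe.
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  "Gene_ID\tGeneName\tGeneDesc",
  "1\tG1\tprotein one (H')",
  "2\tG2\tprotein two",
  "3\tG3\t3' repair enzyme",
  "4\tG4\tprotein four"
), tmp)

# Default quoting (both " and '): an apostrophe can open a quoted string
# that continues across line breaks until the next apostrophe.
gn_default <- read.table(tmp, header = TRUE, sep = "\t")

# Quoting and comment characters switched off: one row per line of the file.
gn_plain <- read.table(tmp, header = TRUE, sep = "\t",
                       quote = "", comment.char = "")

# Compare the number of data lines in the file with the rows read each way.
cat("data lines in file:   ", length(readLines(tmp)) - 1, "\n")
cat("rows, default quoting:", nrow(gn_default), "\n")
cat("rows, quoting off:    ", nrow(gn_plain), "\n")
```

If the row count of the real gn is lower than the number of data lines in gene_names_desc.tsv, re-reading it with quote="" and comment.char="" before the merge would be worth trying.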
Thank you very much,
Best Regards,
Andrea
On Mon, Jul 18, 2011 at 2:45 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> Hi Andrea,
>
> On Mon, Jul 18, 2011 at 6:07 AM, Andrea Franceschini <atariw at gmail.com> wrote:
>> Dear all,
>>
>> I merged 2 data frames using the merge command and the resulting data frame
>> looks perfect in R.
>>
>> However, I have serious problems when I try to write this new data frame
>> to a file using the write.table command.
>>
>> Basically, parts of the second file that I merged end up in the output file.
>>
>> What could the problem be?
>
> I have no idea. What did you do? What does your new data frame look like?
> What commands did you issue? What result did you get? What result did
> you expect? Can you provide a small reproducible example, or at least
> a great deal more information? str() is a good start.
>
> Sarah
>>
>> Thank you very much,
>> Best Regards,
>> Andrea
>>
>> --
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>