[R] write merged data frame to a file

Andrea Franceschini atariw at gmail.com
Mon Jul 18 16:00:29 CEST 2011


Dear all, Sarah,

OK, here are some more details.
I am using version 13 of R on OS X (downloaded and installed less than a year ago).
I have pasted the output of the str() command below.
The data frame looks perfect to me.
I generated the data frame using the following commands:

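# Read the screening data and add per-gene summary columns
# (the add_*_column functions are my own helpers, not shown here).
rog=read.table("/Users/andreaf/biodata/roger/data/roger_uuk_offt_seqs.tsv",
               header=TRUE, sep="\t")
rog2=add_median_column(rog, "Gene_ID", "Score")
rog3=add_count_column(rog2, "Gene_ID", "siRNAcount")
rog4=add_mean_column(rog3, "Gene_ID", "CellNumber", "MeanCellNumber")

# Read the gene names and descriptions, then merge them in by Gene_ID,
# keeping every row of the screening data.
gn=read.table("/Users/andreaf/biodata/data/gene_names_desc.tsv",
              header=TRUE, sep="\t")
rog5=merge(rog4, gn, by.x="Gene_ID", by.y="Gene_ID", all.x=TRUE, all.y=FALSE)

# Write the merged data frame to a tab-separated file.
write.table(rog5, file="/Users/andreaf/biodata/roger/data/roger_uuk_july2011.tsv",
            quote=FALSE, sep="\t", row.names=FALSE)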
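
One way to locate where a written file deviates from the expected number of columns might be to count the fields on every line. The sketch below is only an illustration: it uses a small made-up data frame (not the real data), and the names df, out, n_fields and n_lines are hypothetical. It relies on base R's count.fields() with quoting and comment handling switched off.

```r
# Made-up stand-in for the merged data frame (not the real data).
# The last description deliberately contains a line break and tabs.
df <- data.frame(
  Gene_ID  = c(1L, 2L, 3L),
  Score    = c(1.63, 0.89, 0.47),
  GeneName = c("G1", "G2", "G3"),
  GeneDesc = c("description one",
               "description two",
               "description three\n4\tG4\tdescription four"),
  stringsAsFactors = FALSE
)

out <- tempfile(fileext = ".tsv")
write.table(df, file = out, quote = FALSE, sep = "\t", row.names = FALSE)

# Count tab-separated fields on each physical line of the file, with
# quote and comment handling switched off so every character is literal.
n_fields <- count.fields(out, sep = "\t", quote = "", comment.char = "")
table(n_fields)

# Physical lines in the file versus rows in the data frame plus the header.
n_lines <- length(readLines(out))
c(lines_in_file = n_lines, expected = nrow(df) + 1L)
```

In this toy case one line has 3 fields instead of 4 and the file has one more line than the data frame has rows plus header, which is the kind of mismatch worth looking for in the real output file.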


str(rog5)
'data.frame':	61869 obs. of  14 variables:
 $ Gene_ID       : int  1 2 2 2 2 9 9 9 9 10 ...
 $ Score         : num  1.63 0.892 0.473 1.274 1.585 ...
 $ CellNumber    : int  1085 4031 882 1705 3876 3932 3309 4461 2906 2705 ...
 $ siRNAid       : Factor w/ 61869 levels "SI00000007","SI00000035",..: 57157 52189 52188 1 58783 36041 36040 36038 36039 36044 ...
 $ offt          : int  553 107 712 516 93 1245 1240 1080 673 711 ...
 $ RSArank       : int  18334 14751 14751 14751 14751 7209 7209 7209 7209 3723 ...
 $ sequence      : Factor w/ 61869 levels "AAACAACACAACCAUAUCGAG",..: 20497 30102 9841 11752 20305 5537 58794 16241 19223 44336 ...
 $ seed6         : Factor w/ 3129 levels "AAACAA","AAACAC",..: 976 2267 241 446 947 37 3032 702 878 2711 ...
 $ seed7         : Factor w/ 8947 levels "AAACAAA","AAACAAC",..: 3071 5933 819 1369 2998 134 8641 2196 2768 7486 ...
 $ Median        : num  1.63 1.08 1.08 1.08 1.08 ...
 $ siRNAcount    : num  1 4 4 4 4 4 4 4 4 4 ...
 $ MeanCellNumber: num  1085 2624 2624 2624 2624 ...
 $ GeneName      : Factor w/ 25068 levels "1/2-SBSRNA4",..: 3 5 5 5 5 17324 17324 17324 17324 17326 ...
 $ GeneDesc      : Factor w/ 23388 levels "1-acylglycerol-3-phosphate O-acyltransferase 1 (lysophosphatidic acid acyltransferase, alpha)",..: 622 627 627 627 627 14313 14313 14313 14313 14315 ...
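
Because str() shows only the first few values of each column, a data frame can look fine there while individual values still contain line breaks or tabs. A minimal sketch of how one might check for that, using a made-up data frame (df and has_break are hypothetical names, not objects from this thread):

```r
# Made-up stand-in for the merged data frame; the third description
# deliberately contains an embedded newline and tabs.
df <- data.frame(
  Gene_ID  = 1:3,
  GeneDesc = c("description one",
               "description two",
               "description three\n4\tG4\tdescription four")
)

# For each column, count the values that contain a newline or a tab.
# as.character() makes this work for factor and character columns alike.
has_break <- sapply(df, function(col) sum(grepl("[\n\t]", as.character(col))))
has_break

# Show the offending rows of one column.
df[grepl("[\n\t]", as.character(df$GeneDesc)), ]
```

A non-zero count for a column would mean that writing it with quote=FALSE spreads some rows over several lines of the output file.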






The problem arises when I try to write that data frame to the file
(i.e. the last command above, write.table).
When I open the text file that it generates, I find the following lines
(pasted below).
The first lines are OK (i.e. 14 columns, like the data frame), but at
a certain point I get lines with only 3 columns!
The bad lines that contain only 3 columns have the name and the
description of the gene (i.e. the content of the file that I merged
with).
Besides, these strange lines are also repeated (see the bottom).
Why do I get those strange lines in the file, but apparently
NOT in the data frame, which looks perfect?
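
One possible explanation, offered here as an assumption rather than something established in this thread, is that some gene descriptions contain an apostrophe. By default read.table() treats both " and ' as quote characters, so everything between two apostrophes, including line breaks and tabs, can end up inside a single field. The sketch below reproduces that effect on a made-up miniature file; the file contents and the names f, gn_default and gn_literal are hypothetical.

```r
# Made-up miniature of a gene description file: the descriptions of
# records 6 and 8 each contain an apostrophe.
desc <- paste("description", 1:8)
desc[6] <- "protein six (H')"
desc[8] <- "protein eight (B')"
lines <- c("Gene_ID\tGeneName\tGeneDesc",
           paste(1:8, paste0("G", 1:8), desc, sep = "\t"))
f <- tempfile(fileext = ".tsv")
writeLines(lines, f)

# Default quoting: the text between the two apostrophes, including the
# line breaks, is read as part of one GeneDesc value.
gn_default <- read.table(f, header = TRUE, sep = "\t",
                         stringsAsFactors = FALSE)

# Quoting disabled: every line is read as one record.
gn_literal <- read.table(f, header = TRUE, sep = "\t", quote = "",
                         stringsAsFactors = FALSE)

nrow(gn_default)
nrow(gn_literal)

# Values that swallowed a line break.
gn_default$GeneDesc[grepl("\n", gn_default$GeneDesc, fixed = TRUE)]
```

If the real description file behaves like this, a merge would copy the multi-line value to every matching row, and write.table with quote=FALSE would then print it as several short lines each time, which would look like repeated 3-column lines in the output. Reading the description file with quote="" is one way to test the idea.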




990.33333333333	HNRNPF	heterogeneous nuclear ribonucleoprotein F
3185	0.844752	601	SI02651824	816	12598	UUGAACACCUCAAUGUACCGG	UGAACA	UGAACAC	0.844752	1990.33333333333	HNRNPF	heterogeneous
nuclear ribonucleoprotein F
3185	1.3576397	1508	SI02651838	586	12598	UUUACUCAUUAUCACAUGCUA	UUACUC	UUACUCA	0.844752	1990.33333333333	HNRNPF	heterogeneous
nuclear ribonucleoprotein F
3187	0.7738942	2001	SI00439831	1368	12620	UUUAACAUAAUUCAACUGCUU	UUAACA	UUAACAU	0.7738942	1522.33333333333	HNRNPH1	heterogeneous
nuclear ribonucleoprotein H1 (H)
3187	1.179532	790	SI00439845	913	12620	UUAAGUUUAACAGUUAUAGUU	UAAGUU	UAAGUUU	0.7738942	1522.33333333333	HNRNPH1	heterogeneous
nuclear ribonucleoprotein H1 (H)
3187	0.6908423	1776	SI02654799	1244	12620	UUAAAGAUUUCAAUAUACCUG	UAAAGA	UAAAGAU	0.7738942	1522.33333333333	HNRNPH1	heterogeneous
nuclear ribonucleoprotein H1 (H)
3188	0.6514164	915	SI00439866	1444	3319	UUAACAAACAUGCCAAAUGUU	UAACAA	UAACAAA	0.69809605	1541	HNRNPH2	heterogeneous
nuclear ribonucleoprotein H2 (H)
3189	HNRNPH3	heterogeneous nuclear ribonucleoprotein H3 (2H9)
3190	HNRNPK	heterogeneous nuclear ribonucleoprotein K
3191	HNRNPL	heterogeneous nuclear ribonucleoprotein L
3192	HNRNPU	heterogeneous nuclear ribonucleoprotein U (scaffold
attachment factor A)
3193	HOAC	hypoacusis 2 (autosomal recessive)
3195	TLX1	T-cell leukemia homeobox 1
3196	TLX2	T-cell leukemia homeobox 2
3197	HOXA@	homeobox A cluster
3198	HOXA1	homeobox A1
3199	HOXA2	homeobox A2
3200	HOXA3	homeobox A3
3201	HOXA4	homeobox A4
3202	HOXA5	homeobox A5
3203	HOXA6	homeobox A6
3204	HOXA7	homeobox A7
3205	HOXA9	homeobox A9
3206	HOXA10	homeobox A10
3207	HOXA11	homeobox A11
3208	HPCA	hippocalcin
3209	HOXA13	homeobox A13
3210	HOXB@	homeobox B cluster
3211	HOXB1	homeobox B1
3212	HOXB2	homeobox B2
3213	HOXB3	homeobox B3
3214	HOXB4	homeobox B4
3215	HOXB5	homeobox B5
3216	HOXB6	homeobox B6
3217	HOXB7	homeobox B7
3218	HOXB8	homeobox B8
3219	HOXB9	homeobox B9
3220	HOXC@	homeobox C cluster
3221	HOXC4	homeobox C4
3222	HOXC5	homeobox C5
3223	HOXC6	homeobox C6
3224	HOXC8	homeobox C8
3225	HOXC9	homeobox C9
3226	HOXC10	homeobox C10
3227	HOXC11	homeobox C11
3228	HOXC12	homeobox C12
3229	HOXC13	homeobox C13
3230	HOXD@	homeobox D cluster
3231	HOXD1	homeobox D1
3232	HOXD3	homeobox D3
3233	HOXD4	homeobox D4
3234	HOXD8	homeobox D8
3235	HOXD9	homeobox D9
3236	HOXD10	homeobox D10
3237	HOXD11	homeobox D11
3238	HOXD12	homeobox D12
3239	HOXD13	homeobox D13
3240	HP	haptoglobin
3241	HPCAL1	hippocalcin-like 1
3242	HPD	4-hydroxyphenylpyruvate dioxygenase
3244	HPE1	holoprosencephaly 1, alobar
3247	HPFH2	hereditary persistence of fetal hemoglobin, heterocellular,
Indian type
3248	HPGD	hydroxyprostaglandin dehydrogenase 15-(NAD)
3249	HPN	hepsin
3250	HPR	haptoglobin-related protein
3251	HPRT1	hypoxanthine phosphoribosyltransferase 1
3254	HPRTP2	hypoxanthine phosphoribosyltransferase pseudogene 2
3255	HPRTP3	hypoxanthine phosphoribosyltransferase pseudogene 3
3257	HPS1	Hermansky-Pudlak syndrome 1
3258	HPT	hypoparathyroidism
3259	HPV6AI1	human papillomavirus (type 6a) integration site 1
3260	HPV18I1	human papilloma virus (type 18) integration site 1
3261	HPV18I2	human papillomavirus (type 18) integration site 2
3262	HPVC1	human papillomavirus (type 18) E5 central sequence-like 1
3263	HPX	hemopexin
3265	HRAS	v-Ha-ras Harvey rat sarcoma viral oncogene homolog
3266	ERAS	ES cell expressed Ras
3267	AGFG1	ArfGAP with FG repeats 1
3268	AGFG2	ArfGAP with FG repeats 2
3269	HRH1	histamine receptor H1
3270	HRC	histidine rich calcium binding protein
3272	HRES1	HTLV-1 related endogenous sequence
3273	HRG	histidine-rich glycoprotein
3274	HRH2	histamine receptor H2
3275	PRMT2	protein arginine methyltransferase 2
3276	PRMT1	protein arginine methyltransferase 1
3278	HRPT1	hyperparathyroidism 1
3280	HES1	hairy and enhancer of split 1, (Drosophila)
3281	HSBP1	heat shock factor binding protein 1
3283	HSD3B1	hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid
delta-isomerase 1
3284	HSD3B2	hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid
delta-isomerase 2
3290	HSD11B1	hydroxysteroid (11-beta) dehydrogenase 1
3291	HSD11B2	hydroxysteroid (11-beta) dehydrogenase 2
3292	HSD17B1	hydroxysteroid (17-beta) dehydrogenase 1
3293	HSD17B3	hydroxysteroid (17-beta) dehydrogenase 3
3294	HSD17B2	hydroxysteroid (17-beta) dehydrogenase 2
3295	HSD17B4	hydroxysteroid (17-beta) dehydrogenase 4
3297	HSF1	heat shock transcription factor 1
3298	HSF2	heat shock transcription factor 2
3299	HSF4	heat shock transcription factor 4
3300	DNAJB2	DnaJ (Hsp40) homolog, subfamily B, member 2
3301	DNAJA1	DnaJ (Hsp40) homolog, subfamily A, member 1
3303	HSPA1A	heat shock 70kDa protein 1A
3304	HSPA1B	heat shock 70kDa protein 1B
3305	HSPA1L	heat shock 70kDa protein 1-like
3306	HSPA2	heat shock 70kDa protein 2
3308	HSPA4	heat shock 70kDa protein 4
3309	HSPA5	heat shock 70kDa protein 5 (glucose-regulated protein, 78kDa)
3310	HSPA6	heat shock 70kDa protein 6 (HSP70B)
3188	0.7339296	1242	SI00439852	425	3319	UGUAGCUCUAACGAUACCGGG	GUAGCU	GUAGCUC	0.69809605	1541	HNRNPH2	heterogeneous
nuclear ribonucleoprotein H2 (H)
3189	HNRNPH3	heterogeneous nuclear ribonucleoprotein H3 (2H9)
3190	HNRNPK	heterogeneous nuclear ribonucleoprotein K
3191	HNRNPL	heterogeneous nuclear ribonucleoprotein L
3192	HNRNPU	heterogeneous nuclear ribonucleoprotein U (scaffold
attachment factor A)
3193	HOAC	hypoacusis 2 (autosomal recessive)
3195	TLX1	T-cell leukemia homeobox 1
3196	TLX2	T-cell leukemia homeobox 2
3197	HOXA@	homeobox A cluster
3198	HOXA1	homeobox A1
3199	HOXA2	homeobox A2
3200	HOXA3	homeobox A3
3201	HOXA4	homeobox A4


Thank you very much,
Best Regards,
Andrea






On Mon, Jul 18, 2011 at 2:45 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> Hi Andrea,
>
> On Mon, Jul 18, 2011 at 6:07 AM, Andrea Franceschini <atariw at gmail.com> wrote:
>> Dear all,
>>
>> I merged 2 data frames using the merge command and the resulting data frame
>> looks perfect in R.
>>
>> However, I have serious problems when I try to write this new data frame
>> to a file using the write.table command.
>>
>> Basically, parts of the second file that I merged end up in the output file.
>>
>> What could the problem be?
>
> I have no idea. What did you do? What does your new data frame look like?
> What commands did you issue? What result did you get? What result did
> you expect? Can you provide a small reproducible example, or at least
> a great deal more information? str() is a good start.
>
> Sarah
>>
>> Thank you very much,
>> Best Regards,
>> Andrea
>>
>> --
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>


