[R] read.table truncated data?
Petr PIKAL
petr.pikal at precheza.cz
Fri Aug 26 10:22:13 CEST 2011
Hi
>
> Thanks, Jim. quote='' works. And then I found a single quote in each of
> these lines:
> 3262
> 10403
> 17544
> 24685
> 31826
> 38967
>
> None of them is near the position where the table got truncated. Why is that?
>
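A possible explanation, sketched here and not checked against the actual file: with the default quote = "\"'", a single quote opens a quoted string that runs until the next single quote, and read.table treats every line in between as part of one field. The rows therefore vanish in the middle of the file, between the quotes, and not at the end of the imported table. The numbers in this thread fit that reading: the six quotes are 7141 lines apart, so they form three pairs, each pair swallows 7141 rows, and 3 * 7141 = 21423 is exactly the number of rows missing (42847 lines, minus the header, minus the 21423 rows read). No warning appears because every quote finds a partner. The toy file below is made up for illustration.

```r
# Hypothetical tab-separated file: a header plus 10 data rows.
# Data rows 6 and 9 each contain one apostrophe in the same column.
rows <- paste0("r", 1:10, "\tg", 1:10, "\t", 1:10)
rows[6] <- "r6\tab'c\t6"
rows[9] <- "r9\tde'f\t9"
f <- tempfile(fileext = ".txt")
writeLines(c("id\tgene\tvalue", rows), f)

# Default quoting: the apostrophe in row 6 opens a string that only closes
# at the apostrophe in row 9, so rows 6 to 9 collapse into a single record.
with_quotes <- read.table(f, header = TRUE, sep = "\t")

# quote = "": apostrophes are ordinary characters and every row survives.
no_quotes <- read.table(f, header = TRUE, sep = "\t",
                        quote = "", comment.char = "")

nrow(with_quotes)
nrow(no_quotes)
```

With default quoting the toy file comes back with 7 rows instead of 10, silently; with quote = "" all 10 rows are read.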
> And read.table is a great function. Is it possible for it to give a
> warning message when the data gets truncated? In my case I almost
> overlooked the truncation...
When I read in a large data set I usually run
str(data)
which shows whether there is a problem with the data types (for example,
a numeric column converted to a factor because of a problematic item)
and/or
dim(data)
to check that the size is as expected.
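As far as I know read.table has no option that warns about this kind of silent truncation, but the size check can be automated. The following is only a sketch: read_table_checked is a made-up helper, not part of base R, and it assumes the file has no blank lines, no comment lines, and no fields that are meant to span several lines. The sample file is invented for illustration.

```r
# Hypothetical helper: read a file with read.table and warn when the number
# of rows read differs from the number of data lines on disk.
read_table_checked <- function(file, header = TRUE, ...) {
  dat <- read.table(file, header = header, ...)
  # Physical lines in the file, minus the header line if there is one.
  expected <- length(readLines(file)) - as.integer(header)
  if (nrow(dat) != expected) {
    warning(sprintf("read %d rows but the file has %d data lines",
                    nrow(dat), expected), call. = FALSE)
  }
  dat
}

# Made-up sample: header plus 9 data rows, with apostrophes in rows 6 and 8.
f <- tempfile(fileext = ".txt")
writeLines(c("id\tgene\tvalue",
             paste0("r", 1:5, "\tg", 1:5, "\t", 1:5),
             "r6\tab'c\t6", "r7\tg7\t7", "r8\tde'f\t8", "r9\tg9\t9"), f)

a <- read_table_checked(f, sep = "\t")  # warns: rows were swallowed
b <- read_table_checked(f, sep = "\t", quote = "", comment.char = "")
dim(b)
str(b)
```

Reading the whole file a second time with readLines is wasteful for very large files; the point is only that the row count is compared against something independent of read.table.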
Regards
Petr
>
> On Thu, Aug 25, 2011 at 11:57 AM, jim holtman <jholtman at gmail.com> wrote:
>
> > But did you try the following:
> >
> > x <- read.table(...., comment.char = '', quote = '')
> >
> > In most cases there is an unmatched quote somewhere in your data.
> > Use a text editor to search for single and double quotes.
> >
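The search can also be done from within R instead of a text editor. This is a minimal sketch on an invented sample file; for real data, f would point at the file being read. It also looks for #, the default comment.char.

```r
# Made-up sample standing in for the real file.
f <- tempfile(fileext = ".txt")
writeLines(c("id\tgene\tvalue",
             "r1\tg1\t1",
             "r2\tab'c\t2",
             "r3\tg3\t3",
             "r4\tx#y\t4",
             "r5\tsay \"hi\t5"), f)

txt <- readLines(f)
# Line numbers (the header is line 1) containing ', " or #.
suspect <- grep("['\"#]", txt)
suspect
txt[suspect]
```

The line numbers returned count the header as line 1, so they are one higher than the corresponding data-frame row.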
> > On Thu, Aug 25, 2011 at 11:49 AM, zhenjiang xu <zhenjiang.xu at gmail.com> wrote:
> > > Thanks for your replies. I looked at those lines and didn't spot
> > > anything unusual.
> > >
> > >> tail(a)
> > >         test_id gene_id gene               locus sample_1 sample_2 status
> > > 21418 tY(GUA)J1       - SUP7 chr10:354243-354332 air1rrp6 air2rrp6     OK
> > > 21419 tY(GUA)J2       - SUP4 chr10:542955-543044 air1rrp6 air2rrp6     OK
> > > 21420 tY(GUA)M1       - SUP5 chr13:168794-168883 air1rrp6 air2rrp6     OK
> > > 21421 tY(GUA)M2       - SUP8 chr13:837927-838016 air1rrp6 air2rrp6     OK
> > > 21422  tY(GUA)O       - SUP3 chr15:288191-288280 air1rrp6 air2rrp6     OK
> > > 21423  tY(GUA)Q       -    -   chrmt:70823-70907 air1rrp6 air2rrp6     OK
> > >       value_1 value_2 ln.fold_change. test_stat  p_value  q_value significant
> > > 21418 0.00000  0.0000        0.000000   0.00000 1.000000 1.011650          no
> > > 21419 0.00000  0.0000        0.000000   0.00000 1.000000 1.011480          no
> > > 21420 0.00000  0.0000        0.000000   0.00000 1.000000 1.011500          no
> > > 21421 0.00000  0.0000        0.000000   0.00000 1.000000 1.011520          no
> > > 21422 0.00000  0.0000        0.000000   0.00000 1.000000 1.011550          no
> > > 21423 6.68356 10.7397        0.474301  -1.08614 0.277417 0.455917          no
> > >
> > >
> > > tY(GUA)J1 - SUP7 chr10:354243-354332 rrp6 air1rrp6 OK 0 0 0 0 1 1.00404 no
> > > tY(GUA)J2 - SUP4 chr10:542955-543044 rrp6 air1rrp6 OK 0 0 0 0 1 1.00497 no
> > > tY(GUA)M1 - SUP5 chr13:168794-168883 rrp6 air1rrp6 OK 0 0 0 0 1 1.00492 no
> > > tY(GUA)M2 - SUP8 chr13:837927-838016 rrp6 air1rrp6 OK 0 0 0 0 1 1.00488 no
> > > tY(GUA)O - SUP3 chr15:288191-288280 rrp6 air1rrp6 OK 0 0 0 0 1 1.00485 no
> > > tY(GUA)Q - - chrmt:70823-70907 rrp6 air1rrp6 OK 4.49644 6.68356 0.396365 -0.766052 0.443645 0.634724 no
> > > 15S_rRNA - 15S_RRNA chrmt:6545-8194 WT air2rrp6 OK 2288.88 711.697 -1.16817 2.78772 0.00530801 0.0167772 yes
> > > 21S_rRNA - 21S_RRNA chrmt:58008-62447 WT air2rrp6 OK 4134.59 1927.04 -0.7634 1.58991 0.111855 0.22339 no
> > > ETS1-1 - ETS1-1 chr12:457732-458432 WT air2rrp6 OK 3258.97 1114.76 -1.07277 2.91211 0.00359 0.0121587 yes
> > > ETS1-2 - ETS1-2 chr12:466869-467569 WT air2rrp6 OK 3258.97 1114.76 -1.07277 2.91211 0.00359 0.0121597 yes
> > >
> > >
> > > On Wed, Aug 24, 2011 at 2:34 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> On Wed, Aug 24, 2011 at 2:18 PM, zhenjiang xu <zhenjiang.xu at gmail.com> wrote:
> > >> > Hi R users,
> > >> >
> > >> > I was using read.table to read a file. The data.frame looked all right,
> > >> > but I found that not all rows were read by read.table. What's wrong?
> > >> > It didn't give me any warning or error messages. Why are the data
> > >> > truncated?
> > >> > Thanks.
> > >> >
> > >> > $ wc -l all/isoform_exp.diff
> > >> > 42847 all/isoform_exp.diff
> > >> >
> > >> >> a=read.table('all/isoform_exp.diff', header=T, sep='\t')
> > >> >> nrow(a)
> > >> > [1] 21423
> > >>
> > >> This is a common problem. You need to take a look at the last row that
> > >> was imported, and the rows around 21423 in the original file.
> > >>
> > >> Common causes include stray single or double quotation marks, and
> > >> other special characters in your file, such as the default comment.char, #.
> > >>
> > >> Sarah
> > >> --
> > >> Sarah Goslee
> > >> http://www.functionaldiversity.org
> > >>
> > >
> > >
> > >
> > > --
> > > Best,
> > > Zhenjiang
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >
> >
> > --
> > Jim Holtman
> > Data Munger Guru
> >
> > What is the problem that you are trying to solve?
> >
>
>
>
> --
> Best,
> Zhenjiang
>