[R] trouble with read.table and colClasses='raw'
Greg Snow
Greg.Snow at imail.org
Thu Feb 11 19:32:19 CET 2010
Another possibility is to define the coercion from character to raw yourself (for example with setAs(), possibly wrapping as.raw around as.integer) so that read.table knows what to do.
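
A minimal sketch of that idea, under a few assumptions: the class name "hexRaw" is made up here and serves only as a colClasses label, and the two-character fields are taken to be hexadecimal byte values (R formats raw values that way, e.g. as.raw(10) prints as 0a), so the converter uses strtoi() with base 16 rather than as.integer(). The temporary file just recreates the sample rows quoted further down in the thread.

```r
library(methods)  # setClass() and setAs(); usually attached already

# "hexRaw" is a hypothetical class name, used only so that read.table()
# looks up our coercion method for these columns.
setClass("hexRaw")
setAs("character", "hexRaw",
      function(from) as.raw(strtoi(from, base = 16L)))

# Recreate the sample from the thread in a temporary file.
path <- tempfile()
writeLines(c("X10 X20 X30",
             "00 00 01",
             "00 02 02",
             "00 00 00",
             "00 01 01",
             "00 00 00"), path)

# Each column is read as character and then passed through the method above.
x <- read.table(path, header = TRUE, colClasses = rep("hexRaw", 3))
str(x)
```

For the real file the colClasses vector would be rep("hexRaw", 600000); whether that is fast enough for 600K columns is not something this sketch tries to show.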
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Greg Snow
> Sent: Thursday, February 11, 2010 11:06 AM
> To: Johan Jackson; Don MacQueen
> Cc: r-help at r-project.org
> Subject: Re: [R] trouble with read.table and colClasses='raw'
>
> The read.table function does not know how to convert the character
> representation that it reads into raw variables. Try using 'integer'
> for the colClasses to read the data in as integers, then convert those
> back to raw (if that is really what you need).
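
A small sketch of where the integer route needs care, assuming the fields are two-digit hexadecimal (which is how R formats raw values). The values "0a" and "10" are added here for illustration and do not appear in the sample from the thread.

```r
# Field values as they might appear in the file; "0a" and "10" are
# illustrative additions, not taken from the thread.
fields <- c("00", "01", "02", "0a", "10")

# Route 1: integer first, then raw. This agrees with the hex reading
# only while every field consists of the digits 00 to 09.
via_int <- suppressWarnings(as.integer(fields))

# Route 2: parse each field as base 16, then convert to raw.
via_hex <- as.raw(strtoi(fields, base = 16L))

print(via_int)
print(via_hex)
```

If the data only ever hold small values such as 00, 01 and 02, both routes give the same bytes; otherwise the base-16 parse is the safer one.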
>
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> > project.org] On Behalf Of Johan Jackson
> > Sent: Thursday, February 11, 2010 10:29 AM
> > To: Don MacQueen
> > Cc: r-help at r-project.org
> > Subject: Re: [R] trouble with read.table and colClasses='raw'
> >
> > Hi Don and all,
> >
> > I guess we're getting somewhere. Thanks. The file (first three columns,
> > first five rows) looks like this:
> >
> > X10 X20 X30
> > 00 00 01
> > 00 02 02
> > 00 00 00
> > 00 01 01
> > 00 00 00
> >
> >
> > I guess R is reading 00 as a character? But here's the weird thing: this
> > data (a raw matrix in R) was written out by R itself:
> >
> > write.table(dat, "data", col.names = TRUE, row.names = FALSE, quote = FALSE)
> >
> > *If* I understand correctly, then this seems like very *bad behavior* on
> > R's part: you should be able to write out a matrix and read it right back
> > into R without hassles like this (but every time I blame R, it turns out
> > to be user error, so...),
> >
> > JJ
> >
> >
> >
> > On Thu, Feb 11, 2010 at 9:59 AM, Don MacQueen <macq at llnl.gov> wrote:
> >
> > > The error message says there is no method for converting from
> > > 'character' to 'raw'.
> > > Apparently, R is seeing character data in the file, and is trying to
> > > convert it to raw, since you specified raw, and it can't.
> > >
> > > See, for example,
> > >
> > >   > as('aa','raw')
> > >   Error in as("aa", "raw") :
> > >     no method or default for coercing "character" to "raw"
> > >
> > > (same error message)
> > >
> > > So I would ask, what are your data, really? Why are you asking for raw?
> > > Have you checked the help page for raw to make sure it's what you want?
> > >
> > > -Don
> > >
> > > At 5:23 PM +0100 2/11/10, Ivan Calandra wrote:
> > >
> > >>
> > >> Well, it's too complicated for me! Here is what I would do (limited,
> > >> since I'm still a newbie):
> > >>
> > >> 1) The syntax seems correct; it should work. The problem is somewhere
> > >> else, coming from your own file. Did you try skipping the colClasses
> > >> argument, to see what it looks like? If you can import it that way, try
> > >> str(x) to see what you have. It might help you.
> > >> 2) I've never had that much data to import, and for me read.table works
> > >> well.
> > >>
> > >> You might want to wait for the experts!
> > >>
> > >> Ivan
> > >>
> > >> On 2/11/2010 17:14, Johan Jackson wrote:
> > >>
> > >>> Hi Ivan,
> > >>>
> > >>> Thanks for the reply. Damn IT! My original post was screwed up. HERE
> > >>> is what I did:
> > >>>
> > >>> x <- read.table("data",header=TRUE,colClasses=rep('raw',600000))
> > >>> #returns error: no method or default for coercing "character" to "raw"
> > >>>
> > >>> I've read ?read.table and the colClasses argument. I'm still unclear:
> > >>>
> > >>> 1) colClasses is a character vector, is that right? That seems to be
> > >>> what the help says, but I get an error when I do the above.
> > >>>
> > >>> 2) What is the most efficient way to read in huge amounts of data? In
> > >>> the past I found that scan() and readLines() were slower than
> > >>> read.table.
> > >>>
> > >>> Thanks,
> > >>>
> > >>> JJ
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Thu, Feb 11, 2010 at 8:53 AM, Ivan Calandra
> > >>> <ivan.calandra at uni-hamburg.de> wrote:
> > >>>
> > >>> Hi!
> > >>>
> > >>> "colClasses: character. A vector of classes to be assumed for the
> > >>> columns."
> > >>> I'm not an R expert and I don't know what your "flat file raw" is, but
> > >>> the colClasses argument defines whether each column will be treated
> > >>> as containing "factor", "logical", "integer", etc.
> > >>> For more on read.table, read the manual "R Data Import/Export",
> > >>> available on the R-project website.
> > >>>
> > >>> I don't know if it helps, but I hope it does!
> > >>>
> > >>>
> > >>> Ivan
> > >>>
> > >>> On 2/11/2010 16:36, Johan Jackson wrote:
> > >>> > Hi all,
> > >>> >
> > >>> > First off, it is surprising that there are no examples of how to use
> > >>> > read.table() under ?read.table !
> > >>> >
> > >>> > I am trying to read in a flat file of type 'raw'. It has 1000 rows
> > >>> > and 600K columns. I have the RAM to accomplish this, but can't get
> > >>> > the data into R using read.table:
> > >>> >
> > >>> > x<- read.table("data",header=TRUE,colClasses=rep(,600000))
> > >>> > #returns error: no method or default for coercing "character" to "raw"
> > >>> >
> > >>> > Then I thought that maybe the colClasses vector needed to actually
> > >>> > *be* the mode needed (here's where an example under ?read.table would
> > >>> > help):
> > >>> >
> > >>> > x<- read.table("data",header=TRUE,colClasses=rep(as.raw(1),600000))
> > >>> >
> > >>> > I waited on the latter command for a couple of hours before killing
> > >>> > the process. What should the colClasses argument be?
> > >>> >
> > >>> > Should I be using another method to read the data into R? Previous
> > >>> > experience using scan() and readLines() showed that read.table() was
> > >>> > faster, at least for those examples, so I've stopped trying to use
> > >>> > those other functions.
> > >>> >
> > >>> > Thank you,
> > >>> >
> > >>> > JJ
> > >>> >
> > >>> > [[alternative HTML version deleted]]
> > >>> >
> > >>> > ______________________________________________
> > >>> > R-help at r-project.org mailing list
> > >>> > https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> > PLEASE do read the posting guide
> > >>> > http://www.R-project.org/posting-guide.html
> > >>> > and provide commented, minimal, self-contained, reproducible code.
> > >>> >
> > >>>
> > >>
> > >
> > >
> > > --
> > > --------------------------------------
> > > Don MacQueen
> > > Environmental Protection Department
> > > Lawrence Livermore National Laboratory
> > > Livermore, CA, USA
> > > 925-423-1062
> > >