[R] readBin fails to read large files
Matt Shotwell
matt at biostatmatt.com
Thu Sep 1 19:36:25 CEST 2011
On Thu, 2011-09-01 at 17:36 +0100, Prof Brian Ripley wrote:
> readBin is intended to read a few items at a time, not 10^9. You are
> probably getting 32-bit integer overflow inside your OS, since the
> number of bytes you are trying to read in one go exceeds 2GB.
>
> Don't do that: read say a million at a time.
>
> And BTW, if these really are unsigned ints you will get wraparound.
To elaborate, ?readBin states that the 'signed' argument is only used for
integers of size 1 and 2 bytes. These are ultimately converted to signed
4 byte integers, because that's how R stores integers. To be exact, if
your file contains integers larger than 2^31-1 = 2147483647, overflow
would occur when they are stored as signed 4 byte integers. In actuality,
R returns NA for the value 2^31, whose bit pattern is R's NA_integer_;
larger values wrap around to negative numbers.
I'm bringing this up because R normally issues a warning:
R> 2147483647L + 1L
[1] NA
Warning message:
In 2147483647L + 1L : NAs produced by integer overflow
But, a similar warning isn't issued by readBin when NA results from
signed integer overflow:
#The raw vector below represents 2147483647L and 2147483647L + 1L
#in little endian, unsigned, 4 byte integers
R> dat <- as.raw(c(0xff,0xff,0xff,0x7f,0x00,0x00,0x00,0x80))
R> writeBin(dat, 'test.bin')
R> readBin('test.bin', n=2, integer(), signed=FALSE)
[1] 2147483647 NA
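
A minimal sketch of one possible workaround, assuming the file really does
hold unsigned 4 byte integers: read each value as two unsigned 2 byte
halves (which readBin does support) and combine them as doubles, which
represent every value up to 2^32-1 exactly. The helper name read_uint32 is
hypothetical and not part of R.

```r
# Hypothetical helper: read n unsigned 4-byte integers and return doubles.
# 'con' may be a connection, a file name, or a raw vector, as for readBin.
# Only endian = "little" or "big" is handled in this sketch.
read_uint32 <- function(con, n, endian = "little") {
  # Each 4-byte value is read as two unsigned 2-byte halves.
  halves <- readBin(con, what = integer(), n = 2 * n, size = 2L,
                    signed = FALSE, endian = endian)
  k <- length(halves) %/% 2L          # complete 4-byte values actually read
  first  <- halves[seq(1L, by = 2L, length.out = k)]
  second <- halves[seq(2L, by = 2L, length.out = k)]
  # Little endian stores the low half first; big endian the high half first.
  if (endian == "little") first + 65536 * second else second + 65536 * first
}

# 2^31 - 1, 2^31 and 2^32 - 1 as little-endian unsigned 4-byte integers
demo <- as.raw(c(0xff, 0xff, 0xff, 0x7f,
                 0x00, 0x00, 0x00, 0x80,
                 0xff, 0xff, 0xff, 0xff))
vals <- read_uint32(demo, n = 3)
```

The result is a double vector, so it takes 8 bytes per value rather than 4.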
> On Thu, 1 Sep 2011, Benton, Paul wrote:
>
> > Posting for a friend
> >
> > Begin forwarded message:
> >
> > From: "Geier, Florian" <florian.geier08 at imperial.ac.uk>
> > Subject: Fwd: readBin fails to read large files
> > Date: September 1, 2011 4:10:53 PM GMT+01:00
> > To:
> >
> >
> >
> > Begin forwarded message:
> >
> > Date: 1 September 2011 16:01:45 GMT+01:00
> > Subject: readBin fails to read large files
> >
> > Dear all,
> >
> > I am trying to read a large file (~2GB) of unsigned ints into R. Using the command:
> >
> > raw<-readBin("file",n=10^8, integer(),endian="little",signed=FALSE)
> >
> > It works fine for n=10^8, but fails for n=10^9 (or even at n=6*10^8). My .Machine$sizeof.long is 8 (bytes).
> > I am running R 2.13.1 on a x86_64-apple-darwin9.8.0/x86_64 (64-bit) architecture.
> >
> > Thanks for your help
> >
> > Florian
> >
> > --
> > AXA doctoral fellow
> > Bundy lab - Biomolecular Medicine
> > Imperial College London
>
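
For the original question, the advice above to read about a million
values at a time could be sketched as follows. This is only an
illustration: the helper name read_in_chunks and its chunk_size argument
are hypothetical, and it assumes a file of 4 byte integers. Note that the
combined result still has to fit in memory (about 4 bytes per value).

```r
# Hypothetical helper: read a binary file of 4-byte integers in chunks,
# so that no single readBin call requests more than chunk_size values.
read_in_chunks <- function(path, chunk_size = 1e6, endian = "little") {
  con <- file(path, open = "rb")
  on.exit(close(con))
  chunks <- list()
  repeat {
    x <- readBin(con, what = integer(), n = chunk_size, size = 4L,
                 endian = endian)
    if (length(x) == 0L) break        # end of file reached
    chunks[[length(chunks) + 1L]] <- x
  }
  if (length(chunks)) unlist(chunks, use.names = FALSE) else integer(0)
}

# Small self-contained check using a temporary file
tmp <- tempfile(fileext = ".bin")
writeBin(1:10, tmp, size = 4L, endian = "little")
res <- read_in_chunks(tmp, chunk_size = 3)
unlink(tmp)
```

If the values are to be processed rather than kept, each chunk can be
handled inside the loop instead of being collected.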