William Dunlap
wdunlap at tibco.com
Tue Apr 22 03:20:00 CEST 2014
> For me that other software would probably be Octave. I'm interested if
> anyone here has read in these files using Octave, or a C program or
> anything else.
I typed 'octave read binary file' into google.com and the first hit was
the Octave help file for its fread function. In C, fread is also a good way
to go (note that C and Octave have different argument lists for their fread
functions). In the Linux shell you can use the od command.
% R --quiet
> con <- gzcon(file("/tmp/file.gz", "wb")) # your gzcon("/tmp/file.gz", "wb") resulted in an error message
> writeBin(c(121:130,129:121), con, size=2)
> close(con)
> q("no")
% zcat /tmp/file.gz | od --format d2
0000000 121 122 123 124 125 126 127 128
0000020 129 130 129 128 127 126 125 124
0000040 123 122 121
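A possible variation on the od call above, offered as a sketch rather than a tested recipe for the real data: the d2 format prints each byte pair as a signed value, whereas u2 prints it unsigned, which matches the signed=FALSE used in the readBin call quoted below. The -v option stops od from collapsing repeated output lines into a single "*". od reads in the byte order of the machine it runs on, and writeBin defaults to the byte order of the machine that wrote the file, so this assumes both are the same (little-endian here). The helper name gz_u16 and the demo path are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a gzipped stream of 16-bit values as unsigned
# decimals, one per line, in the byte order of the machine running od.
gz_u16() {
    gzip -dc "$1" | od -An -v -t u2 | tr -s ' ' '\n' | sed '/^$/d'
}

# Demo file (hypothetical path): the bytes for 121, 122 and 40000 in
# little-endian order, written by hand with octal escapes.
demo=${TMPDIR:-/tmp}/u16_demo.gz
printf '\171\000\172\000\100\234' | gzip -c > "$demo"

gz_u16 "$demo"
```

On a little-endian machine this should list 121, 122 and 40000; with d2 in place of u2 the last value would come out negative.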
Bill Dunlap
TIBCO Software
wdunlap tibco.com
> After saving a file like so...
> con <- gzcon("file.gz", "wb"))
> writeBin(vector, con, size=2)
> close(con)
> I can read it back into R like so...
> con <- gzcon("file.gz", "rb"))
> vector <- readBin(con, integer(), 48000000, size=2, signed=FALSE)
> close(con)
> ...and I'm wondering what other programs might be able to read in these
> data. It seems to be very straightforward: When I store 5436 integers
> for each of 7696 subjects, at two bytes per integer that ought to be
> 5436*7696*2 = 83670912 bytes, and it is exactly that:
> $ zcat file.gz | wc -c
> 83670912
> So if I just convert every pair of bytes to an integer, I guess that will
> do it. I stored them this way because it was compact, but I guess this
> system also can work well when other software needs to read the data.
> For me that other software would probably be Octave. I'm interested if
> anyone here has read in these files using Octave, or a C program or
> anything else. If I don't get a good answer here, I'll try the Octave
> list, and I'll send my best answers here.
> The rest of this is some related info for readers of this list. You don't
> need to read below to answer my question above. Thanks.
> In case anyone is interested, I did some comparisons of loading speed and
> file size for a number of ways of storing my data. These data all consist
> of positive numbers between 0 and 2, with three digits to the right of the
> decimal, so I can save them as floating point double-precision, or
> multiply by 1000 and store them as integers. The test here was for a
> matrix of 5000 x 7845 = 39,225,000 values. These are the file sizes:
>   202.1 MB  tab-delimited text file, original, uncompressed
>    29.9 MB  tab-delimited text file, original, gzip compressed
>   187.7 MB  tab-delimited text file, integers, uncompressed
>    24.6 MB  tab-delimited text file, integers, gzip compressed
>    38.9 MB  R save() original numeric values (doubles)
>    27.0 MB  R save() integers
>    19.7 MB  R writeBin() 16-bit integer gzipped
> So, for file size (important in my case), the gzipped writeBin() method
> storing 16-bit integers was the winner. Impressively, storing the data
> that way and dividing by 1000 on the fly to return the original numbers
> was faster than reading an Rdata file of the matrix:
> The integer text file:
> > system.time( D <- matrix( scan( file = "D/D000", what=integer(0) ), ncol=7845,
> byrow=TRUE ) )
> Read 39225000 items
> user system elapsed
> 10.626 0.344 10.971
> The R save() original numeric values (doubles):
> > system.time( load("D000_test.Rdata") )
> user system elapsed
> 5.579 0.119 5.698
> The R save() integers:
> > system.time( load("D000_test.Rdata") )
> user system elapsed
> 4.863 0.050 4.913
> The writeBin() 16-bit integer gzipped file:
> > con <- gzcon(file("D000_test.gz", "rb"))
> > system.time( D <- matrix( readBin( con, integer(), 7845*5000, size=2, signed=FALSE ),
> ncol=7845, byrow=TRUE ) )
> user system elapsed
> 3.769 0.138 3.906
> > close(con)
> The writeBin() 16-bit integer gzipped file, converted to numeric by
> dividing by 1000 on the fly:
> > con <- gzcon(file("D000_test.gz", "rb"))
> > system.time( D <- matrix( readBin( con, integer(), 7845*5000, size=2, signed=FALSE ),
> ncol=7845, byrow=TRUE )/1000 )
> user system elapsed
> 4.159 0.237 4.397
> > close(con)
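The divide-by-1000 step in the last timing can also be done outside R. The following is a rough sketch under the same assumptions as the od call earlier in this message (unsigned 16-bit values, same byte order on the writing and reading machines); the helper name u16_to_decimal and the demo path are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: decode a gzipped stream of unsigned 16-bit values
# and print each one divided by 1000, with three decimal places.
u16_to_decimal() {
    gzip -dc "$1" | od -An -v -t u2 | tr -s ' ' '\n' | sed '/^$/d' |
        LC_ALL=C awk '{ printf "%.3f\n", $1 / 1000 }'
}

# Demo file (hypothetical path): 0, 5, 1234 and 2000 as little-endian
# 16-bit integers, written by hand with octal escapes.
demo=${TMPDIR:-/tmp}/u16_decimal_demo.gz
printf '\000\000\005\000\322\004\320\007' | gzip -c > "$demo"

u16_to_decimal "$demo"
```

On a little-endian machine the demo should print 0.000, 0.005, 1.234 and 2.000, one per line.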
