[R] Huge data frames?

ripley at stats.ox.ac.uk
Wed Aug 28 08:50:20 CEST 2002

On Wed, 28 Aug 2002, Magnus Lie Hetland wrote:

> A friend of mine recently mentioned that he had painlessly imported a
> data file with 8 columns and 500,000 rows into matlab. When I tried
> the same thing in R (both Unix and Windows variants) I had little
> success. The Windows version hung for a very long time, until I
> eventually more or less ran out of virtual memory; I tried to set the
> proper memory allocations for the Unix version, but it never seemed
> satisfied :]

That's not big: if numeric, it is a 32 MB object (8 columns x 500,000
rows x 8 bytes per double).  People do that quite often (on machines
with 512 MB or more, but memory is cheap).  So it is hard to know what
the problem is, but ?read.table gives some hints (including using scan()).
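
The 32 MB figure can be checked with a little arithmetic. This is only a sketch: the dimensions come from the question above, while the variable names and the smaller cross-check matrix are assumptions made for illustration.

```r
# Expected size of an 8-column, 500,000-row numeric object:
# each double takes 8 bytes.
rows <- 5e5
cols <- 8
bytes <- rows * cols * 8
bytes / 1e6                      # size in MB (decimal)

# Cross-check on a smaller matrix (1/100 of the rows) with object.size(),
# which also counts a small fixed header on top of the data.
m <- matrix(0, nrow = rows / 100, ncol = cols)
as.numeric(object.size(m))
```

The text file on disk is larger than this, since each number is written out as a string of digits.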

I've just done an experiment.  I generated 4 million rnorm() values, made
an 8-column matrix, and wrote them out.  Then

AA <- read.table("foo.dat", nrows=5e5, comment.char="",
                 colClasses=rep("numeric", 8), header=TRUE)

worked for me in about 20 seconds, using less than 150 MB.

That was painless, and all the speed-ups are documented in ?read.table.
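
For anyone wanting to repeat this, here is a rough sketch of the whole experiment. It is an assumption-laden reconstruction rather than the exact commands used: the temporary file stands in for "foo.dat", the row count is scaled down to keep it quick (set n to 5e5 for the size discussed), and the scan() variant is one possible reading of the hint in ?read.table.

```r
# Scaled-down sketch of the experiment; set n to 5e5 for the size discussed.
n <- 5e4
p <- 8
set.seed(1)
x <- matrix(rnorm(n * p), nrow = n, ncol = p)

f <- tempfile(fileext = ".dat")   # stands in for "foo.dat"
write.table(x, file = f, row.names = FALSE, quote = FALSE)

# read.table with the documented speed-ups: known row count, no comment
# scanning, and column classes declared up front.
t1 <- system.time(
  AA <- read.table(f, header = TRUE, nrows = n, comment.char = "",
                   colClasses = rep("numeric", p))
)

# scan() alternative: read the numbers as one vector, skipping the header
# line, then reshape row by row.
t2 <- system.time(
  BB <- matrix(scan(f, what = numeric(), skip = 1, quiet = TRUE),
               ncol = p, byrow = TRUE)
)

dim(AA)
dim(BB)
c(read.table = t1[["elapsed"]], scan = t2[["elapsed"]])   # seconds
unlink(f)
```

Timings and memory use will depend on the machine and the R version, so the numbers printed here need not match the ones quoted above.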

> I used read.table -- should I have used something else? Is it even
> possible to work with files this large? I assume a memory-mapped
> binary file would have been quite efficient (as opposed to an
> in-memory parsed text file) -- is something like that even possible in
> R?

Certainly it is possible to read binary files.  That's what load/save
do, and see ?readBin to read binary files written by other programs.
Having a file that size in memory is not a problem.  Doing useful
analyses may be (especially in Matlab).
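
A minimal sketch of both binary routes, assuming a small stand-in matrix and temporary file names chosen for illustration. The raw layout in the second part (8-byte doubles in column-major order, native byte order) is an assumption about what another program might write; a real file would need its actual layout.

```r
x <- matrix(rnorm(8 * 1000), ncol = 8)   # small stand-in data
x_orig <- x                              # copy kept for comparison

# 1. R's own binary format via save()/load().
f_rda <- tempfile(fileext = ".rda")
save(x, file = f_rda)
rm(x)
load(f_rda)                      # restores an object named 'x'

# 2. Raw doubles via writeBin()/readBin(), as another program might
#    have written them.
f_bin <- tempfile(fileext = ".bin")
con <- file(f_bin, "wb")
writeBin(as.vector(x), con)      # column-major order, 8-byte doubles
close(con)

con <- file(f_bin, "rb")
y <- matrix(readBin(con, what = "double", n = length(x), size = 8),
            ncol = 8)
close(con)

identical(x, x_orig)             # save/load round trip
identical(x, y)                  # writeBin/readBin round trip
unlink(c(f_rda, f_bin))
```

Note that readBin() needs to be told how many values to expect (n) and how they are laid out, since a raw binary file carries no description of its contents.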

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

