[R] R Memory Usage Concerns

Carlos J. Gil Bellosta cgb at datanalytics.com
Tue Sep 15 21:48:27 CEST 2009


I am not sure whether my package "colbycol" can help you. It lets you
read files that would otherwise not fit into memory. Internally, as the
name indicates, it reads the data into R column by column.

I/O times increase, but reading the files needs only a fraction of the
"intermediate memory".
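For readers without colbycol, the column-by-column idea can be approximated in base R. The sketch below is not colbycol's code or API. The helper name read_col_by_col is hypothetical, and the approach assumes a delimited file whose column classes are known in advance. It relies on the fact that a "NULL" entry in read.table()'s colClasses skips that column, so each pass over the file stores only one column.

```r
# Hypothetical helper (not part of colbycol): build a data frame by
# reading one column per pass over the file.
read_col_by_col <- function(file, col_classes, sep = "\t", header = FALSE) {
  n <- length(col_classes)
  cols <- vector("list", n)
  col_names <- character(n)
  for (j in seq_len(n)) {
    # "NULL" in colClasses makes read.table() skip that column,
    # so only column j is kept during this pass.
    cc <- rep("NULL", n)
    cc[j] <- col_classes[j]
    one <- read.table(file, sep = sep, header = header, colClasses = cc)
    cols[[j]] <- one[[1]]
    col_names[j] <- names(one)[1]
  }
  names(cols) <- col_names
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Usage on a small example file with a factor/double/double layout
tmp <- tempfile(fileext = ".tab")
writeLines(c("a\t1.5\t2", "b\t3.25\t4", "a\t0\t-1"), tmp)
df <- read_col_by_col(tmp, c("factor", "double", "double"))
str(df)
```

The trade-off matches the one described above: the file is scanned once per column, so I/O grows with the number of columns, while each pass holds only one column's worth of parsed data.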

Best regards,

Carlos J. Gil Bellosta

On Tue, 2009-09-15 at 00:10 -0700, Evan Klitzke wrote:
> On Mon, Sep 14, 2009 at 10:01 PM, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
> > As already suggested, you're (much) better off if you specify colClasses, e.g.
> >
> > tab <- read.table("~/20090708.tab", colClasses=c("factor", "double", "double"));
> >
> > Otherwise, R has to load all the data, make a best guess of the column
> > classes, and then coerce (which requires a copy).
> Thanks Henrik, I tried this as well as a variant that another user
> sent me privately. When I tell R the colClasses, it does a much better
> job of allocating memory (ending up with 96M of RSS memory, which
> isn't great but is definitely acceptable).
> A couple of notes I made from testing some variants, if anyone else is
> interested:
>  * giving it an nrows argument doesn't help it allocate less memory
> (just a guess, but maybe because it's trying the powers-of-two
> allocation strategy in both cases)
>  * there's no difference in memory usage between telling it a column
> is "numeric" vs "double"
>  * when telling it the types in advance, loading the table is much, much faster
> Maybe if I gather some more fortitude in the future, I'll poke around
> at the internals, since I'm still curious where the extra memory is
> going. Is that just the overhead of allocating a full object for each
> value (i.e. rather than just a double[] or whatever)?
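Two of the points above can be checked directly. The sketch below uses a tiny made-up file in the same factor/double/double layout (the file contents are illustrative, not the original 20090708.tab). It shows that colClasses "numeric" and "double" produce identical data frames. It also shows that a numeric column is stored as one contiguous vector of doubles, costing roughly 8 bytes per element plus a small fixed header rather than a separate object per value. This does not measure the intermediate copies made while parsing, which are a more likely source of the extra RSS.

```r
# Small example file with the same column layout as in this thread
tmp <- tempfile(fileext = ".tab")
writeLines(c("x\t1.5\t2", "y\t3.25\t4", "x\t0\t-1"), tmp)

a <- read.table(tmp, sep = "\t", colClasses = c("factor", "numeric", "numeric"))
b <- read.table(tmp, sep = "\t", colClasses = c("factor", "double", "double"))

identical(a, b)   # TRUE: both spellings give the same data frame
typeof(a$V2)      # "double": a plain vector of doubles

# Roughly 8 bytes per element plus a small header; no per-value objects
object.size(numeric(1e6))
```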
