[R] R Memory Usage Concerns

Henrik Bengtsson hb at stat.berkeley.edu
Tue Sep 15 07:01:30 CEST 2009


As already suggested, you're (much) better off if you specify colClasses, e.g.

tab <- read.table("~/20090708.tab", colClasses=c("factor", "double", "double"));

Otherwise, R has to read in all the data first, make a best guess at
each column's class, and then coerce (which requires an extra copy).
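
If you also know the number of rows (even a mild overestimate helps),
passing nrows lets read.table() preallocate the columns up front. A
sketch reusing the row count from your str() output below:

tab <- read.table("~/20090708.tab",
                  colClasses=c("factor", "double", "double"),
                  nrows=1797601);
sapply(tab, class);   # verify that the intended classes were assigned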

/Henrik

On Mon, Sep 14, 2009 at 9:26 PM, Evan Klitzke <evan at eklitzke.org> wrote:
> On Mon, Sep 14, 2009 at 8:35 PM, jim holtman <jholtman at gmail.com> wrote:
>> When you read your file into R, show the structure of the object:
> ...
>
> Here's the data I get:
>
>> tab <- read.table("~/20090708.tab")
>> str(tab)
> 'data.frame':   1797601 obs. of  3 variables:
>  $ V1: Factor w/ 6 levels "biz_details",..: 4 4 4 4 4 5 6 4 1 4 ...
>  $ V2: num  1.25e+09 1.25e+09 1.25e+09 1.25e+09 1.25e+09 ...
>  $ V3: num  0.0141 0.0468 0.0137 0.0594 0.0171 ...
>> object.size(tab)
> 35953640 bytes
>> gc()
>          used (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells  119580  6.4    1489330  79.6  2380869 127.2
> Vcells 6647905 50.8   17367032 132.5 16871956 128.8
>
> Forcing a GC doesn't seem to free up an appreciable amount of memory
> (memory usage reported by ps is about the same), but it's encouraging
> that the output from object.size shows that the object is small. I am,
> however, a little bit skeptical of this:
>
> 1797601 * (4 + 8 + 8) = 35952020, which is awfully close to 35953640.
> My assumption is that the first column is stored as a 32-bit integer
> (the factor codes), the other two columns as 8-byte doubles, plus a
> little overhead for the objects' headers and for the factor's
> string -> int level mapping (i.e. the table from servlet name to its
> 32-bit code). This seems almost too exact to me, since it implies
> that R allocated exactly as much memory for each column as there are
> values in it; typically an interpreter would round allocations up to
> power-of-two boundaries, i.e. something like sizeof(obj) << 21 here,
> which is how Python lists work internally.
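>
> A rough way to sanity-check that accounting (a sketch, assuming the
> factor codes are stored as a plain integer vector and each numeric
> column as a plain double vector):
>
> n <- 1797601
> object.size(integer(n))   # roughly n * 4 bytes plus a small header
> object.size(double(n))    # roughly n * 8 bytes plus a small header
> n * (4 + 8 + 8)           # 35952020, close to the 35953640 reported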
>
> Is it possible that R is counting its memory usage naively, e.g. just
> adding up the sizes of all of the constituent objects, rather than
> the amount of space it actually allocated for those objects?
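>
> One way to probe that (a sketch; gc()'s Vcells are 8-byte units, so
> the delta in "used" times 8 approximates what R actually allocated):
>
> before <- gc()["Vcells", "used"]
> x <- double(1797601)
> after <- gc()["Vcells", "used"]
> (after - before) * 8   # bytes the allocator actually handed out
> object.size(x)         # what object.size() adds up for the object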
>
> --
> Evan Klitzke <evan at eklitzke.org> :wq
>



