[R] large data set, error: cannot allocate vector
Robert Citek
rwcitek at alum.calberkeley.org
Tue May 9 20:22:23 CEST 2006
On May 8, 2006, at 9:47 AM, Thomas Lumley wrote:
> On Fri, 5 May 2006, Robert Citek wrote:
>> Reloading the 10 MM dataset:
>>
>> R > foo <- read.delim("dataset.010MM.txt")
>>
>> R > object.size(foo)
>> [1] 440000376
>>
>> R > gc()
>>            used  (Mb) gc trigger  (Mb) max used  (Mb)
>> Ncells 10183941 272.0   15023450 401.2 10194267 272.3
>> Vcells 20073146 153.2   53554505 408.6 50086180 382.2
>>
>> Combined, Ncells and Vcells appear to take up about 700 MB of RAM,
>> which is about 25% of the 3 GB available under Linux on 32-bit
>> architecture. Also, removing foo seemed to free up "used" memory,
>> but didn't change the "max used":
>
> No, that's what "max" means. You need gc(reset=TRUE) to reset the
> max.
Yup, that worked (see below). The example from ?gc wasn't that clear
to me. Thanks for clarifying. I also found it informative to
compare loading data into a data.frame vs a vector.
$ cat <<eof | R -q --no-save
gc()
foo <- read.delim("dataset.010MM.txt")
gc()
rm(foo)
gc()
gc(reset=TRUE)
eof
R > gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   177865   4.8     407500  10.9   350000   9.4
Vcells    72114   0.6     786432   6.0   333941   2.6
R > foo <- read.delim("dataset.010MM.txt")
R > gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 10179849 271.9   15023450 401.2 10180159 271.9
Vcells 20072448 153.2   47764583 364.5 46849682 357.5
R > rm(foo)
R > gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   179910   4.9   12018759 321.0 10181187 271.9
Vcells    72458   0.6   38211666 291.6 46849682 357.5
R > gc(reset=TRUE)
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   179920   4.9    9615007 256.8   179920   4.9
Vcells    72482   0.6   30569332 233.3    72482   0.6
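
The same behaviour can be reproduced without the data file. What follows is only a minimal sketch, not part of the original session: it assumes a 1e6-element numeric vector is an adequate stand-in for the 10 MM dataset, and it looks columns up by name because the layout of the gc() matrix can differ between R builds.

```r
# Sketch: "max used" is a high-water mark, so rm() followed by gc()
# lowers "used" but leaves "max used" alone until gc(reset = TRUE).
# Assumption: a 1e6-element numeric vector stands in for the real dataset.
invisible(gc(reset = TRUE))      # start from a fresh high-water mark

x <- numeric(1e6)                # roughly 8 MB of vector cells
invisible(gc())                  # a collection while x is alive records the peak
rm(x)

after_rm    <- gc()              # x is collected; "max used" still remembers it
after_reset <- gc(reset = TRUE)  # "max used" falls back to the current level

# Compare the Vcells row before and after the reset.
print(after_rm["Vcells", c("used", "max used")])
print(after_reset["Vcells", c("used", "max used")])
```

In the first printed row "max used" should sit well above "used"; in the second the two should be close to each other, as in the transcript above.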
$ cat <<eof | R -q --no-save
gc()
foo <- scan("dataset.010MM.txt")
gc()
rm(foo)
gc()
gc(reset=TRUE)
eof
R > gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   177865   4.8     407500  10.9   350000   9.4
Vcells    72114   0.6     786432   6.0   333941   2.6
R > foo <- scan("dataset.010MM.txt")
Read 10000000 items
R > gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   178230   4.8     407500  10.9   350000   9.4
Vcells 10072185  76.9   26713872 203.9 26456224 201.9
R > rm(foo)
R > gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   178286   4.8     407500  10.9   350000   9.4
Vcells    72190   0.6   21371097 163.1 26456224 201.9
R > gc(reset=TRUE)
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   178296   4.8     407500  10.9   178296   4.8
Vcells    72214   0.6   17096877 130.5    72214   0.6
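
For anyone who wants to repeat the data.frame vs vector comparison on a smaller scale, here is a rough sketch. The temporary file, its single column named x, and the row count n are all assumptions made for illustration; they are not the original dataset.010MM.txt. The last few lines cross-check the "(Mb)" columns in the transcripts above against the cell sizes documented in ?gc (8 bytes per Vcell, typically 28 bytes per Ncell on a 32-bit build).

```r
# Sketch: read the same numeric column as a data.frame and as a vector.
# Assumptions: a temporary one-column file with a header line stands in for
# dataset.010MM.txt, and n is kept small so the example runs quickly.
n   <- 1e5
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(x = runif(n)), tmp, sep = "\t",
            row.names = FALSE, quote = FALSE)

df  <- read.delim(tmp)                     # data.frame with one numeric column
vec <- scan(tmp, skip = 1, quiet = TRUE)   # plain numeric vector, header skipped
unlink(tmp)

print(object.size(df),  units = "Mb")
print(object.size(vec), units = "Mb")

# Cross-check of the "(Mb)" columns from the cell counts in the transcripts.
vcells_mb <- 10072185 * 8  / 2^20   # Vcells used after scan()
ncells_mb <- 10179849 * 28 / 2^20   # Ncells used after read.delim(), 32-bit
print(round(c(vcells_mb, ncells_mb), 2))   # 76.84 271.83
```

The transcripts report 76.9 and 271.9 for these two counts, which is consistent with gc() rounding up to the next 0.1 Mb. How far apart the two object.size() figures land will depend on the R version and on the file, so treat the sketch as a way to measure rather than as a prediction.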
Regards,
- Robert