[R] Memory Problems with a Simple Bootstrap - Part II
jim holtman
jholtman at gmail.com
Sat Aug 2 14:56:17 CEST 2008
I was suggesting adding the gc() call to help provide some additional
information on the utilization of memory. As you indicated, it
probably does not help in reducing the fragmentation of memory, but it
was worth a try to see if there was any additional information that
might be gleaned from the execution of the code. A traceback() at the
point of the error does indicate that it was a problem with allocating
a matrix:
> per95 <- function( annual.data, b.index) {
+ sample.data <- annual.data[b.index]
+ return(quantile(sample.data,probs=c(0.95))) }
> m <- 10000
> x <- rnorm(7500,0,1)
> B <- boot(data=x,statistic=per95,R=m)
Error: cannot allocate vector of size 572.2 Mb
> traceback()
4: matrix(0, R, n)
3: ordinary.array(n, R, strata)
2: index.array(n, R, sim, strata, m, L, weights)
1: boot(data = x, statistic = per95, R = m)
You could then trace back through the 'boot' code (if you wanted) to
determine what 'n' was.
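
As a rough check on the sizes involved, the arithmetic can be done at the
prompt. This is only a sketch: it assumes 'n' is length(x) = 7500 and
R = 10000, and that matrix(0, R, n) is stored as 8-byte doubles. The 4-byte
figure is included because it happens to match the 286.1 Mb in the original
report; that the original failure was an integer allocation is my inference,
not something the traceback shows.

```r
R <- 10000   # bootstrap replicates (m in the session above)
n <- 7500    # assumed to be length(x)

bytes_double  <- R * n * 8   # matrix(0, R, n) holds doubles, 8 bytes each
bytes_integer <- R * n * 4   # the same shape stored as 4-byte integers

round(bytes_double  / 2^20, 1)   # 572.2, as in "size 572.2 Mb"
round(bytes_integer / 2^20, 1)   # 286.1, as in "size 286.1 Mb"
bytes_double / 1e6               # 600, in decimal megabytes
```

So a single R x n matrix of this shape is already a large fraction of a 2Gb
address space, before any copies are made.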
On Sat, Aug 2, 2008 at 8:04 AM, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> On Sat, 2 Aug 2008, Tom La Bone wrote:
>
>> I have distilled my bootstrap problem down to this bit of code, which
>> calculates an estimate of the 95th percentile of 7500 random numbers drawn
>> from a standard normal distribution:
>>
>> library(boot)
>> per95 <- function( annual.data, b.index) {
>> sample.data <- annual.data[b.index]
>> return(quantile(sample.data,probs=c(0.95))) }
>> m <- 10000
>> x <- rnorm(7500,0,1)
>> B <- boot(data=x,statistic=per95,R=m)
>>
>> Error: cannot allocate vector of size 286.1 Mb
>>
>> This result was observed with R 2.7.1 and 2.7.1patched when run on a
>> Windows XP computer with 4Gb of memory.
>>
>> This does not seem to be an excessively large and complicated calculation,
>> so is this an intentional limitation of the boot function, a result of bad
>> choices on my part, or a bug?
>
> Use of a 32-bit OS was a bad choice on your part. On 64-bit Linux it runs
> fine in
>
> > gc()
>           used (Mb) gc trigger   (Mb)  max used   (Mb)
> Ncells  146670  7.9     350000   18.7    350000   18.7
> Vcells 3189171 24.4  168442002 1285.2 193746905 1478.2
>
> That's too much usage for a 2GB address space.
>
> boot() sets up an index array, in your case of size 7500x10000 or 600Mb.
> That dominates a 2Gb address space.
>
> What you could do is
>
> B <- replicate(10, boot(data=x,statistic=per95,R=1000), FALSE)
> Ball <- B[[1]]
> Ball$t <- do.call("rbind", lapply(B, "[[", "t"))
>
> that is, combine 10 independent runs (and that runs in ca 200Mb).
>
> BTW to Jim Holtman: adding a gc() call is not very helpful. R will run gc
> to get memory if it is running out, and whereas the pattern of gc calls can
> affect the fragmentation, it is pretty much random whether adding gc calls
> helps or hinders.
>
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
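
The suggestion quoted above (combining 10 independent runs of R=1000) could
be wrapped up along these lines. This is only a sketch: boot_in_chunks is a
hypothetical helper, not part of the boot package, and resetting the R
component so that it matches the stacked t matrix is my assumption about
what downstream code expects. Other components (seed, call) still describe
only the first run.

```r
library(boot)  # recommended package, normally shipped with R

# Statistic from the thread: 95th percentile of the resampled data.
per95 <- function(annual.data, b.index) {
  quantile(annual.data[b.index], probs = 0.95)
}

# Hypothetical helper (not part of boot): run `chunks` independent
# bootstraps of `R.chunk` replicates each and stack their replicates,
# so that only an R.chunk x n index array exists at any one time.
boot_in_chunks <- function(data, statistic, R.chunk, chunks) {
  runs <- replicate(chunks,
                    boot(data = data, statistic = statistic, R = R.chunk),
                    simplify = FALSE)
  combined <- runs[[1]]
  combined$t <- do.call(rbind, lapply(runs, `[[`, "t"))
  combined$R <- nrow(combined$t)  # assumption: keep R in step with t
  combined
}

set.seed(1)
x <- rnorm(7500, 0, 1)
Ball <- boot_in_chunks(x, per95, R.chunk = 1000, chunks = 10)
dim(Ball$t)   # 10000 replicates of one statistic
```

Each call to boot() here only needs an index array one tenth the size of
the original, which is why the peak memory use drops so much.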
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?