[R] cor.test() running out of memory on 64-bit system
    Alex Reynolds 
    reynolda at uw.edu
       
    Fri Jun  3 13:37:45 CEST 2011
    
    
  
I am running into memory limits when calculating correlation scores with cor.test() on R 2.13.0:
  R version 2.13.0 (2011-04-13) ...
  Platform: x86_64-unknown-linux-gnu (64-bit)
In my test case, I read in a pair of ~150M vectors from text files using the pipe() and scan() functions, which pull in a specific column of numeric values from a text file. Once I have the two vectors, I run cor.test() on them.
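For reference, the read-then-test workflow described above can be sketched roughly as follows. This is a minimal illustration and not my actual script: the file layout (tab-separated, values in column 2), the `cut` command, and the helper names `make_demo_file` and `read_column` are all assumptions made for the example, and the inputs here are tiny temporary files rather than the real data.

```r
# Hypothetical helper: write a small tab-separated demo file with the
# numeric values of interest in column 2 (stand-in for the real inputs).
make_demo_file <- function(values) {
  path <- tempfile(fileext = ".txt")
  write.table(data.frame(id = seq_along(values), value = values),
              file = path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  path
}

# Hypothetical helper: pull one column out of a text file through a
# shell pipe and read it as a numeric vector with scan().
read_column <- function(path, column = 2) {
  con <- pipe(sprintf("cut -f%d %s", column, shQuote(path)))
  on.exit(close(con))
  scan(con, what = numeric(), quiet = TRUE)
}

set.seed(1)
x <- rnorm(1000)
y <- x + rnorm(1000)

a <- read_column(make_demo_file(x))
b <- read_column(make_demo_file(y))

# Pearson correlation test on the two vectors read back from disk.
res <- cor.test(a, b)
print(res$estimate)
```

With the real inputs, only the two file paths and the column number would change; the vectors themselves are read whole into memory before cor.test() is called.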
If I run this on our compute cluster (running SGE), I have the option of setting hard limits on the memory assigned to the compute slot or node that my R task is sent to (this is done to keep R from grabbing so much memory from the compute cluster that other non-R tasks stall and fail). 
If I set hard limits (h_data and h_vmem) under 8 GB, then the R task terminates early with the following R error:
  Error: cannot allocate vector of size 2.0 Gb
What confuses me is that I have a 64-bit version of R, so I expected hard limits of 4 GB to be enough for this input size (2 GB x 2 vectors), or 5 GB with a generous allowance of 1 GB for overhead.
Judging by the hard limits, the overhead itself appears to be closer to 4 GB, on top of the 4 GB for the two input vectors: if my hard limits are under 8 GB, the job fails.
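The size arithmetic behind these figures can be checked with a small sketch, assuming the vectors are stored as R doubles at 8 bytes per element. The helper name `est_gib` and the element counts below are illustrative only and are not taken from my actual inputs.

```r
# Assumption: numeric vectors are doubles, 8 bytes per element.
bytes_per_double <- 8

# Hypothetical helper: estimated size in GiB of a double vector.
est_gib <- function(n_elements) n_elements * bytes_per_double / 1024^3

# Sanity check on a small vector: object.size() should report roughly
# 8 bytes per element plus a small fixed header.
n_small <- 1e6
x <- numeric(n_small)
measured <- as.numeric(object.size(x))
print(measured / n_small)

# A double vector of 2^28 elements works out to exactly 2 GiB.
print(est_gib(2^28))
```

This only accounts for the storage of the vectors themselves; it says nothing about any temporary copies made while reading the data or inside cor.test().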
Does cor.test() really require this much extra space, or have I missed some compilation or other magic setting that addresses this aspect of running cor.test()?
Thanks for your advice.
Regards,
Alex
    
    