[BioC] Computing large correlations in R
    Paul [guest] 
    guest at bioconductor.org
       
    Mon Sep 30 08:57:07 CEST 2013
    
    
  
I have two lists of lists, A and B. Each contains 100 data frames, and each data frame has dimensions 15000 x 15000. I would like to compute a single correlation coefficient per pair of data frames: take the first data frame from each list and compute cor(A, B) over all of its values to get one number for the whole data frame, then do the same for the second data frame in each list, and so on for all 100 pairs.
I tried the following:
      A  # list of 100 data frames
      B  # list of 100 data frames
      C <- A[1]  # extract only the first element of A
      D <- B[1]  # extract only the first element of B
      C <- unlist(C)  # flatten C into a single vector
      D <- unlist(D)  # flatten D into a single vector
Then I computed:

      Correlation <- cor(C, D)  # a single correlation coefficient showing how the two vectors are correlated
But I end up with an error saying:
      R cannot allocate a vector of size 3.9 GB
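For scale, here is a rough back-of-the-envelope estimate of the memory involved. It assumes every cell is stored as an 8-byte double, which is only an assumption since the column types are not stated above:

```r
# Rough size of one 15000 x 15000 data frame flattened to a numeric vector.
# Assumption: every cell is an 8-byte double.
n_cells <- 15000 * 15000        # 225,000,000 values
bytes   <- n_cells * 8          # 8 bytes per double
gib     <- bytes / 2^30         # convert bytes to GiB
round(gib, 2)
```

Under that assumption each flattened data frame needs well over a gigabyte on its own, before any temporary copies made during unlisting or inside cor().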
Is there a better, faster way to do this that could be applied to the entire list? I work on a server that can handle large computations, but I still get this error, and the unlisting takes ages because of the size of the data frames.
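One possible direction, sketched here only under assumptions, is to avoid unlist() entirely and accumulate the sums needed for the Pearson correlation one column at a time. The helper name cor_all_cells is hypothetical, the toy data frames stand in for the real 15000 x 15000 ones, and the sketch assumes all columns are numeric with no missing values:

```r
# Pearson correlation over all cells of two equally sized data frames,
# accumulated column by column so that no full-length vector is built.
# Assumptions: numeric columns, no NA values. cor_all_cells is a
# hypothetical helper, not a function from base R or any package.
cor_all_cells <- function(x, y) {
  stopifnot(identical(dim(x), dim(y)))
  n <- 0; sx <- 0; sy <- 0; sxx <- 0; syy <- 0; sxy <- 0
  for (j in seq_len(ncol(x))) {
    a <- as.numeric(x[[j]])     # column j of x
    b <- as.numeric(y[[j]])     # column j of y
    n   <- n + length(a)
    sx  <- sx + sum(a)
    sy  <- sy + sum(b)
    sxx <- sxx + sum(a * a)
    syy <- syy + sum(b * b)
    sxy <- sxy + sum(a * b)
  }
  # Standard running-sums form of the Pearson coefficient
  (sxy - sx * sy / n) / sqrt((sxx - sx^2 / n) * (syy - sy^2 / n))
}

# Toy stand-ins for the two lists of data frames
set.seed(1)
make_df <- function() as.data.frame(matrix(rnorm(20), nrow = 4))
A <- replicate(3, make_df(), simplify = FALSE)
B <- replicate(3, make_df(), simplify = FALSE)

# One coefficient per pair (A[[i]], B[[i]])
res <- mapply(cor_all_cells, A, B)
```

Note that A[[i]] extracts the data frame itself, whereas A[i] returns a one-element list. The running-sums formula can lose precision when the values have a large mean relative to their spread, so it would be worth checking the result against cor() on a small subset first.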
   
 -- output of sessionInfo(): 
R version 3.0.1 (2013-05-16)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
loaded via a namespace (and not attached):
[1] tools_3.0.1
--
Sent via the guest posting facility at bioconductor.org.