Pearson Correlation Speed
Nathan S. Watson-Haigh
nathan.watson-haigh at csiro.au
Tue Dec 16 03:23:35 CET 2008
Charles C. Berry wrote:
> On Mon, 15 Dec 2008, Nathan S. Watson-Haigh wrote:
>> Nathan S. Watson-Haigh wrote:
>>> I'm trying to calculate Pearson correlation coefficients for a large
>>> matrix of size 18563 x 18563. The following function takes about XX
>>> minutes to complete, and I'd like to do this calculation about 15 times
>>> and so speed is somewhat of an issue.
> I think you are on the wrong track, Nathan.
> The matrix you are starting with is 18563 x 18563 and the result of
> finding the correlations amongst the columns of that matrix is also 18563
> x 18563. It will require more than 5 Gigabytes of memory to store the
> result and the original matrix.
Yes, the memory usage is somewhat large - luckily I have the use of a
cluster with lots of shared memory! However, I'm interested to learn how
you arrived at that figure for the memory requirements.
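
One plausible reading of the "more than 5 Gigabytes" figure, assuming
each entry is stored as an 8-byte double (R's default numeric type) and
that only the input matrix and the result are counted, is the arithmetic
below. The variable names are illustrative only, and intermediate copies
made during the calculation would add to the total.

```r
# Rough memory estimate for an n x n matrix of doubles (8 bytes per entry).
n <- 18563
bytes_per_double <- 8

one_matrix_bytes <- n * n * bytes_per_double   # input OR result
both_bytes <- 2 * one_matrix_bytes             # input AND result

cat("one matrix:", round(one_matrix_bytes / 2^30, 2), "GiB\n")
cat("both:      ", round(both_bytes / 2^30, 2), "GiB\n")
```

On these assumptions each matrix needs roughly 2.6 GiB, so the pair comes
to a little over 5 GiB.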
> Likely the time needed to do the calc is inflated by caching issues
> and, if your machine has too little memory to hold the result and all
> the intermediate pieces, by swapping as well.
>
> You can finesse these by breaking your problem into smaller pieces, say
> computing the correlations between each pair of 19 blocks of columns
> (columns 1:977, 977+1:977, ... 18*977+1:977 ), then assembling the
> results.
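
The blocking idea above can be sketched roughly as follows. This is a
minimal illustration, not code from the thread: block_cor is a
hypothetical helper name, the block size is a free parameter, and the
demonstration uses a small random matrix rather than the 18563-column
data. For simplicity it still assembles the full result in memory; for
the real problem each block could instead be written to disk as it is
computed.

```r
# Hypothetical helper: correlations among the columns of x, computed one
# pair of column blocks at a time and then assembled.
block_cor <- function(x, block_size) {
  p <- ncol(x)
  starts <- seq(1, p, by = block_size)
  out <- matrix(NA_real_, p, p)
  for (i in seq_along(starts)) {
    ci <- starts[i]:min(starts[i] + block_size - 1, p)
    for (j in i:length(starts)) {
      cj <- starts[j]:min(starts[j] + block_size - 1, p)
      # cor(a, b) returns the ncol(a) x ncol(b) cross-correlation block.
      r <- cor(x[, ci, drop = FALSE], x[, cj, drop = FALSE])
      out[ci, cj] <- r
      out[cj, ci] <- t(r)   # fill the mirror block by symmetry
    }
  }
  out
}

# Small check against the direct calculation.
set.seed(1)
x <- matrix(rnorm(200 * 23), nrow = 200)
blocked <- block_cor(x, block_size = 5)
direct <- cor(x)
print(all.equal(blocked, direct))
```

With 18563 columns, a block size of 977 gives exactly the 19 blocks
mentioned above, since 19 * 977 = 18563.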
This is possible; however, why is something like this not implemented
internally in the cor() function if it scales poorly due to the large
memory requirements?
> BTW, R already has the necessary machinery to calculate the crossproduct
> matrix (etc) needed to find the correlations. You can access the low level
> linear algebra that R uses. You can marry R to an optimized BLAS if you
> like.
> So pulling in some other code to do this will not save you anything. If
> you ever do decide to import C[++] code there is excellent documentation
> in the Writing R Extensions manual, which you should review before
> attempting to import C++ code into R.
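
As an illustration of the crossproduct route mentioned above, here is a
minimal sketch that uses only base R. It assumes a numeric matrix with
no missing values and no constant columns; cor_via_crossprod is a
hypothetical name, not an existing function. The heavy step is
crossprod(), which R hands to whatever BLAS it is linked against, so an
optimized BLAS would speed it up without any C or C++ code.

```r
# Hypothetical helper: Pearson correlations via a single crossproduct.
# Assumes x is a numeric matrix without NAs or constant columns.
cor_via_crossprod <- function(x) {
  xc <- sweep(x, 2, colMeans(x))                 # centre each column
  xs <- sweep(xc, 2, sqrt(colSums(xc^2)), "/")   # scale columns to unit length
  crossprod(xs)                                  # t(xs) %*% xs
}

# Small check against cor().
set.seed(2)
x <- matrix(rnorm(100 * 10), nrow = 100)
via_cp <- cor_via_crossprod(x)
print(all.equal(via_cp, cor(x)))
```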
Thanks, I have seen this and it seemed quite technical to use as a
starting point for someone unfamiliar with both C++ and incorporating
C++ code into R.
Nathan
