[R] memory issue trying to solve too large a problem using hclust

Wiener, Matthew matthew_wiener at merck.com
Thu Nov 29 18:13:30 CET 2001

Hi, all.

I'm trying to cluster 12,500 objects using hclust from package mva.  The
distance matrix takes up nearly 600 MB.  It also has to be copied when it is
passed to the Fortran routine that actually does the clustering (the routine
modifies it during the clustering), so that's 1200 MB.  I'm on a machine with
2.5 GB of memory (and nothing else running), so I thought I could pull this
off.  The routine quits with the error "cannot allocate a vector of size
609131 KB".  Judging by its size, that is another copy of the distance
matrix; I think it is the one needed by the Fortran routine.  As far as I can
tell from looking at the code, no additional objects of the size of the
distance matrix are used.
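
The sizes quoted above can be checked with a little arithmetic.  The sketch
below is only a calculator (written in Python rather than R) and assumes the
distance matrix is stored as its lower triangle, n(n-1)/2 values, in 8-byte
doubles.  The helper name dist_bytes is made up for illustration.

```python
def dist_bytes(n, bytes_per_value=8):
    """Bytes needed for the lower triangle of an n x n distance matrix.

    Assumes n*(n-1)/2 stored values of bytes_per_value bytes each.
    """
    return n * (n - 1) // 2 * bytes_per_value


n = 12500  # object count quoted in the post
one_copy = dist_bytes(n)

print(f"one copy:   {one_copy / 1024**2:.0f} MB")
print(f"two copies: {2 * one_copy / 1024**2:.0f} MB")
print(f"one copy:   {one_copy / 1024:.0f} KB")
```

Under these assumptions one copy comes to a bit under 600 MB and two copies
to roughly 1200 MB, which matches the figures above.  The size in the error
message is slightly smaller than the 12,500-object figure, which would fit an
actual object count a little below 12,500.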

After the error, gc() reports a garbage collection threshold of 1433 MB.

I'm wondering whether some additional copies of the distance matrix are
being made, and whether I could somehow stop them from being made.  Any
other suggestions for how I could get around the memory problem would also
be appreciated.  (I know of clara in the "cluster" package, but would like
to use hierarchical methods.)

The function hierclust in multiv seems to demand even more memory, even when
bign = T.

I am running R-1.3.1 on SunOS 5.6.

Thanks for any help.

Matthew Wiener
Applied Computer Science and Mathematics Department
Merck Research Labs
Rahway, NJ  07065-0900
