PCA with n << p (was [R] R-1.6.0 crashing on RedHat6.3)

Douglas Grove dgrove at fhcrc.org
Tue Oct 29 07:58:51 CET 2002


> princomp is the wrong tool here: prcomp is better (and a version using
> La.svd would be better still).

Okay. I used princomp as I saw something in your book regarding using
this and NOT prcomp in S-plus, I thought the same would hold in R.  

 

> What do you want to do with a PCA of such a matrix?  We can almost
> certainly give you a better way using La.svd directly.

Looking at microarray data with approx. 5300 'genes' and 144 samples
(72 control, 72 experimental).  Trying to see how well principal
components can separate the different tissue types (Liver, Kidney, Testis)
making up the samples.

In any case, this code is just an example of the problem we're having
with R (and S-plus) crashing with memory allocation related errors.
I used this code since no one could poke holes in it :-)

Thanks for the info on prcomp.  I'll try it out.

Doug



> BDR
> 
> On 28 Oct 2002, Peter Dalgaard BSA wrote:
> 
> > Douglas Grove <dgrove at fhcrc.org> writes:
> >
> > > > Are you sure that it is 6.3?? To my knowledge, there is nothing
> > > > between 6.2 and 7.0. What's in /etc/redhat-release ?
> > >
> > > Sorry, I was told it was running 6.3. I just checked and
> > > it's running 6.2.
> >
> > OK, so the Fortran problems are there, but the usual symptom of that
> > is crash-on-load.
> >
> >
> > > > The Fortran in RH6.x was rather badly broken for some packages, but
> > > > one would expect that you had run into that before. 1.6.0 has a memory
> > > > leak but it generally affects repeated applications of model fits,
> > > > rather than big matrices.
> > > >
> > > > Do you really mean 144x5300 ? (more columns than rows) That's big: The
> > > > covariance matrix at 5300x5300 will take more than 200 MB (OK, it
> > > > might only be storing upper or lower triangle.) I tried a matrix like
> > > > that on a 1.6.1beta system with about 0.75 GB and got an out of memory
> > > > error. A 144x2500 problem is currently running in
> > > >
> > > >   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
> > > > 16111 pd        17   0  207M 163M 18536 R    99.8 65.8   1:58 R.bin
> > > >
> > > > and seems to be staying there....
> > >
> > > Yep, it's 144x5300.  The machine has 2GB of RAM, and this uses about 1.5GB.
> >
> > Hmm. My half-size toy version conked out with
> >
> > > D <- matrix(rnorm(2500*144),ncol=2500)
> > > library(mva)
> > > pc.norm <- princomp(D,scores=FALSE)
> > Error in princomp.default(D, scores = FALSE) :
> >         covariance matrix is not non-negative definite
> >
> > which is a bit odd, but at least it didn't run out of memory. However,
> > the tolerances seem to require some tweaking!
> >
> > --
> >    O__  ---- Peter Dalgaard             Blegdamsvej 3
> >   c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
> >  (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
> > ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
> > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> > r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> > Send "info", "help", or "[un]subscribe"
> > (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
> >
> 
> 

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._



More information about the R-help mailing list