[R] SVD Memory Issue

Paul Hiemstra paul.hiemstra at knmi.nl
Wed Sep 14 09:31:38 CEST 2011


 Hi,

An SVD of a 771x5677 matrix should be fine: on my workstation it took about
30 seconds and hardly any memory. The problem most likely arises when you
convert tdm2 to a matrix. The object tdm2 appears to be much larger than
771x5677, and so is tdm_matrix. Without a reproducible example we cannot
help you very well. Furthermore, I have no clue as to what needs to be
extracted from tdm2 as input for svd() because I have no experience
with the tm package.
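
As a rough check of the sizes involved, here is a minimal sketch in base R.
It uses a random dense matrix as a stand-in for your data; the names m,
n_terms, n_docs, dense_mb, failed_doubles and s are made up for
illustration and do not come from your code. It only shows that a matrix
of the stated dimensions is small, and that the allocation that failed in
your session is far larger than such a matrix.

```r
# Hypothetical stand-in for the term-document matrix: a dense numeric
# matrix with the dimensions given in the post (771 terms x 5677 documents).
set.seed(1)
n_terms <- 771
n_docs  <- 5677
m <- matrix(rnorm(n_terms * n_docs), nrow = n_terms, ncol = n_docs)

# Dense footprint at 8 bytes per double: about 33 Mb.
dense_mb <- n_terms * n_docs * 8 / 1024^2
print(dense_mb)
print(object.size(m), units = "Mb")

# The single allocation that failed (767.7 Mb) corresponds to roughly
# 100 million doubles, versus about 4.4 million cells in a 771 x 5677 matrix.
failed_doubles <- 767.7 * 1024^2 / 8
print(failed_doubles / (n_terms * n_docs))

# svd() with default arguments returns the thin decomposition:
# u is 771 x 771, d has 771 values, v is 5677 x 771.
s <- svd(m)
print(dim(s$u))
print(length(s$d))
print(dim(s$v))
```

Running dim(tdm_matrix) and print(object.size(tdm_matrix), units = "Mb")
on your own object before calling svd() would show whether it really has
the size you expect.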

good luck,
Paul

On 09/13/2011 10:24 AM, vioravis wrote:
> I am trying to perform a Singular Value Decomposition (SVD) on a term-document
> matrix that I created using the 'tm' package. Eventually I want to do a Latent
> Semantic Analysis (LSA).
>
> There are 5677 documents with 771 terms (the TDM is 771 x 5677). When I try
> to do the SVD, R runs out of memory. I am using a 12 GB dual-core machine
> with Windows XP and don't think I can increase the memory any more. Are there
> any other memory-efficient methods to compute the SVD?
>
> The term-document matrix is obtained using:
>
> tdm2 <-
> TermDocumentMatrix(tr1,control=list(weighting=weightTf,minWordLength=3))
> str(tdm2)
>
> List of 6
>  $ i       : int [1:6438] 202 729 737 278 402 621 654 718 157 380 ...
>  $ j       : int [1:6438] 1 2 3 7 7 7 7 8 10 10 ...
>  $ v       : num [1:6438] 8 5 6 9 5 7 5 6 5 7 ...
>  $ nrow    : int 771
>  $ ncol    : int 5677
>  $ dimnames:List of 2
>   ..$ Terms: chr [1:771] "access" "accessori" "accumul" "acoust" ...
>   ..$ Docs : chr [1:5677] "1" "2" "3" "4" ...
>  - attr(*, "class")= chr [1:2] "TermDocumentMatrix" "simple_triplet_matrix"
>  - attr(*, "Weighting")= chr [1:2] "term frequency" "tf"
>
> The SVD is calculated using:
>
>> tdm_matrix <- as.matrix(tdm2)
>> svd_out<-svd(tdm_matrix)
> Error: cannot allocate vector of size 767.7 Mb
> In addition: Warning messages:
> 1: In matrix(0, n, np) :
>   Reached total allocation of 3583Mb: see help(memory.size)
> 2: In matrix(0, n, np) :
>   Reached total allocation of 3583Mb: see help(memory.size)
> 3: In matrix(0, n, np) :
>   Reached total allocation of 3583Mb: see help(memory.size)
> 4: In matrix(0, n, np) :
>   Reached total allocation of 3583Mb: see help(memory.size)
>
>
> Thank you.
>
> Ravi
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/SVD-Memory-Issue-tp3809667p3809667.html


-- 
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770


