[R] R died on large data set
Henrik Bengtsson
hb at stat.berkeley.edu
Sat Feb 20 14:04:20 CET 2010
Some suggestions:
The line:
pearson.dist <- as.dist(1-cor(t(todos.norm), method="pearson"))
includes several data manipulations in "one go". Each manipulation
creates at least one extra copy of your data in memory. When you do
it this way, you make it harder for the R garbage collector to clean
out such memory.
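To get a feel for the sizes involved, here is a rough back-of-the-envelope
sketch. It assumes the 9600 x 15 matrix from the quoted dim() output and
8 bytes per double; the variable names are only for illustration. The
correlation matrix is 9600 x 9600, so every full copy of it is large:

```r
# Rough size of the objects involved, assuming 8 bytes per double.
n <- 9600                           # rows of todos.norm, from the quoted dim() output

# One dense n x n matrix, e.g. the result of cor() or of 1 - rho.
bytes.per.copy <- n^2 * 8
mib.per.copy <- bytes.per.copy / 2^20
print(round(mib.per.copy))          # about 703 MiB per full copy

# The final 'dist' object stores only the lower triangle, n*(n-1)/2 values.
dist.mib <- n * (n - 1) / 2 * 8 / 2^20
print(round(dist.mib))              # about 352 MiB
```

With two or three such copies alive at the same time, you are already
close to the 1.9 GB where your process died, which is plausibly near the
per-process limit of a 32-bit build.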
The following should use less memory:
todos.norm <- t(todos.norm);
gc(); # Explicit garbage collect; cleans out the 1st 'todos.norm' object.
rho <- cor(todos.norm, method="pearson");
rm(todos.norm); # Not needed anymore
gc(); # Explicit garbage collect; cleans out the 2nd 'todos.norm' object.
rho <- 1-rho;
gc(); # Explicit garbage collect; cleans out the 1st 'rho' object.
pearson.dist <- as.dist(rho);
Not sure if it helps in your case/with your data, but this is how you
as a user can help R a bit along the way.
You should of course also clean out all other stray objects you don't
use anymore, before doing the above.
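If you want to convince yourself that the step-by-step version computes
the same distances as the one-liner, a small sketch like the following
should do. The matrix 'x' is made-up stand-in data (50 x 15 instead of
9600 x 15), and the names 'one.go' and 'stepwise' are only for
illustration:

```r
set.seed(1)
x <- matrix(rnorm(50 * 15), nrow = 50, ncol = 15)  # hypothetical stand-in for todos.norm

# Everything in "one go", as in the original line.
one.go <- as.dist(1 - cor(t(x), method = "pearson"))

# Step by step, dropping each intermediate as soon as it is not needed.
xt <- t(x)
rho <- cor(xt, method = "pearson")
rm(xt)
invisible(gc())
rho <- 1 - rho
invisible(gc())
stepwise <- as.dist(rho)
rm(rho)
invisible(gc())

# Compare the plain values; the 'call' attribute of the two objects differs.
same <- isTRUE(all.equal(as.vector(one.go), as.vector(stepwise)))
print(same)
```

This only checks that the results agree; it says nothing about whether
the stepwise version is enough to fit your full data set into memory.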
My $.02
/Henrik
On Sat, Feb 20, 2010 at 1:13 PM, Marcelo Laia <marcelolaia at gmail.com> wrote:
> Hi, I am trying to run a script in R and it died before finishing.
>
> I already read the list archives, and memory help pages
> (http://tinyurl.com/yaxco6w), but I am unable to solve the issue.
>
> My Debian shows:
>
> marcelo at laia:~$ ulimit
> unlimited
> marcelo at laia:~$
>
> In the system monitor (GNOME) I see that R reaches 1.9 GB before it dies.
>
> The R code is:
>
>> ls() ## only the todos.norm object is listed
> [1] "todos.norm"
>> dim(todos.norm)
> [1] 9600 15
>>
>> library("cluster")
>> pearson.dist <- as.dist(1-cor(t(todos.norm), method="pearson"))
> Died
>
> What could I do to solve my problem?
>
>> sessionInfo() ## after restart R
> R version 2.10.1 (2009-12-14)
> i486-pc-linux-gnu
>
> locale:
> [1] LC_CTYPE=pt_BR.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=pt_BR.UTF-8 LC_COLLATE=pt_BR.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=pt_BR.UTF-8
> [7] LC_PAPER=pt_BR.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>>
>
> My system:
>
> Linux laia 2.6.32-trunk-686 #1 SMP Sun Jan 10 06:32:16 UTC 2010 i686 GNU/Linux
>
> Thank you very much!
>
> --
> Marcelo Luiz de Laia
> Brazil
> Linux user number 487797
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>