[R] R for large data

Micheall Taylor pols1oh at bestweb.net
Wed Jul 11 20:20:54 CEST 2001

I am trying to get a sense of R's capabilities for analyzing larger data
sets.  I really like R, but the datasets I normally work with are in the
15-50 MB range, sometimes much larger.  The size owes to the number of
observations, not extraneous variables, so little can be done to "clean"
the data of unnecessary elements (i.e., database storage or external data
manipulation doesn't get me very far).

Over the past couple of years I've used Stata (prior to that SAS, etc.). I
have 2 GB of memory, but R seems quite slow to load even relatively modest
datasets of, say, 10-30 MB -- much slower than, say, Stata. For comparison,
times to load a 32 MB datafile:

R - 5.3 minutes
Stata - 31 secs
SPSS - 42 secs
SAS - 21 secs

I normally start R with the command-line switch allowing it to use 600 MB
or so; Stata is allocated 200 MB.  I've allocated 1.5 GB to Stata before,
so I assume my memory management isn't the issue.

Does anyone have any pointers to documents which discuss R limitations?

Could there be something wrong with my particular R installation? (RH 7.1
and the most recent stable R release, 2 GB memory, enterprise kernel, dual
800 MHz processors, high-performance SCSI drives.)

Thanks in advance.


r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
