[R] Performance & capacity characteristics of R?

Thomas Vogels tov at infiniti.ece.cmu.edu
Tue Aug 3 16:04:46 CEST 1999

"Brian" == Prof Brian D Ripley <ripley at stats.ox.ac.uk> writes:

Brian> Can you tell us what statistical procedures need 1 million to 100s of
Brian> millions of rows (observations)?  Some of us have doubted that there are
Brian> even datasets of 100,000 examples that are homogeneous and for which a
Brian> small subsample would not give all the statistical information. (If they
Brian> are not homogeneous, one could/should analyse homogeneous subsets and do a
Brian> meta-analysis.)

What if your problem is to find the outliers in a dataset?  It would
be nice to examine the (homogeneous part of the) dataset and then
search for the data entries "that don't quite fit in" without having
to leave R and go to Perl or other home-grown software.
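
As a rough sketch of the kind of search meant here, staying inside R:
the data below are simulated, the planted positions in `bad` and the
cut-off `k` are assumptions made up for illustration, and median/MAD is
only one of several possible robust yardsticks.

```r
# Hypothetical data: 100,000 simulated measurements with a few gross errors.
set.seed(1)
x <- rnorm(100000, mean = 5, sd = 0.1)
bad <- c(17, 4242, 99999)            # positions chosen for illustration only
x[bad] <- c(9, 0.5, 7.5)             # planted values far from the bulk

# Robust centre and spread, so the outliers do not distort the yardstick.
centre <- median(x)
spread <- mad(x)                     # by default scaled to match sd for normal data

# Flag entries more than k robust standard deviations from the centre.
k <- 6                               # cut-off is an assumption; tune per dataset
outliers <- which(abs(x - centre) > k * spread)
```

`outliers` then holds the indices of the entries "that don't quite fit
in", and `x[outliers]` can be inspected directly.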

I'm looking at data from experiments in the semiconductor industry.
It's not uncommon for us to have, e.g., parametric measurements
available for a lot of integrated circuits (even > 100,000) and it
would be nice to read them into R (maybe one set of measurements at a


mailto:tov at ece.cmu.edu (Tom Vogels)   Tel: (412) 268-6638   FAX: -3204
