[R] R/S and large datasets - Database access (also Re: SAS and S/R)

Emmanuel Charpentier charpent at bacbuc.dyndns.org
Tue Nov 27 16:11:47 CET 2001

A consensus seems to be emerging: R excels at exploratory work on
small and medium-sized datasets, while SAS can munch through much
larger datasets.

However, I see the "size" problem as a red herring. The objects that
have to stay "in core" are usually much smaller than the dataset. For
example, a fixed-effects linear model needs only a few matrices whose
size is proportional to the square of the number of *variables*
(X'X, for instance), plus the (admittedly large) vector of residuals,
whose length equals the number of observations. Other cases
(nonlinear mixed-effects models come to mind) are not as easily tamed,
since any iterative process (such as ML estimation) has to go back to
the original data. But at least the time penalty of reading the data
through a database interface is repaid by making tractable problems
that would otherwise be out of reach.
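To make the fixed-effects case concrete, here is a minimal sketch in Python of an ordinary least-squares fit that reads the data one row at a time (for example from a database cursor) and keeps only X'X (p x p) and X'y (length p) in memory. The function names are hypothetical. It solves the normal equations directly; R's own lm() uses a QR decomposition instead, which is numerically more stable, so treat this as an illustration of the memory argument, not a replacement for lm().

```python
from typing import Iterable, List, Sequence, Tuple

# One observation: (predictor values x_1..x_p, response y).
Row = Tuple[Sequence[float], float]


def accumulate_normal_equations(rows: Iterable[Row], p: int):
    """Single pass over the data; memory use is O(p^2), independent of n."""
    xtx = [[0.0] * p for _ in range(p)]  # running X'X
    xty = [0.0] * p                       # running X'y
    n = 0
    for x, y in rows:
        for j in range(p):
            xty[j] += x[j] * y
            for k in range(p):
                xtx[j][k] += x[j] * x[k]
        n += 1
    return xtx, xty, n


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small p x p system a @ beta = b by Gaussian elimination
    with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    beta = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = aug[r][m] - sum(aug[r][c] * beta[c] for c in range(r + 1, m))
        beta[r] = s / aug[r][r]
    return beta


def residual_sum_of_squares(rows: Iterable[Row], beta: Sequence[float]) -> float:
    """Second pass over the data: residuals are computed on the fly
    rather than stored as a length-n vector."""
    return sum((y - sum(b * xj for b, xj in zip(beta, x))) ** 2 for x, y in rows)
```

For instance, with rows yielding ((1.0, x), y), calling accumulate_normal_equations(rows, 2) and then solve(xtx, xty) gives the intercept and slope. Any iterable works as input, so a generator wrapped around a database cursor would do; the residuals can be recomputed in a second pass, as residual_sum_of_squares does, instead of being held in memory.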

I am aware of at least one database access package that lets you access
data without dragging a whole table into memory: the RPgSql package
offers what it calls a "proxy variable", an object that behaves, for
all practical purposes, like a data frame but is actually an interface
to database tables. I see this kind of interface as a way to avoid
overloading core memory with rarely used data.
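The proxy idea can be sketched roughly as follows. This is not RPgSql's actual interface but a hypothetical Python class over an SQLite table, chosen only because the sqlite3 module ships with Python: the object answers questions such as its length or a column mean by sending queries to the database, and streams a column in batches, so the table itself is never loaded whole.

```python
import sqlite3
from typing import Iterator, List


class TableProxy:
    """Hypothetical data-frame-like proxy: rows stay in the database and
    are fetched only when asked for."""

    def __init__(self, conn: sqlite3.Connection, table: str):
        self.conn = conn
        tables = {r[0] for r in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")}
        if table not in tables:
            raise KeyError(table)
        self.table = table
        # Column names come from the schema, so later queries only ever
        # interpolate identifiers that are known to exist.
        self.columns: List[str] = [r[1] for r in conn.execute(
            f'PRAGMA table_info("{table}")')]

    def _check(self, col: str) -> str:
        if col not in self.columns:
            raise KeyError(col)
        return col

    def __len__(self) -> int:
        # The row count is computed by the database, not by loading rows.
        (n,) = self.conn.execute(f'SELECT COUNT(*) FROM "{self.table}"').fetchone()
        return n

    def mean(self, col: str) -> float:
        # The aggregation is pushed down to the database.
        (m,) = self.conn.execute(
            f'SELECT AVG("{self._check(col)}") FROM "{self.table}"').fetchone()
        return m

    def iter_column(self, col: str, batch: int = 1000) -> Iterator:
        # Stream one column, holding at most `batch` rows in memory at once.
        cur = self.conn.execute(
            f'SELECT "{self._check(col)}" FROM "{self.table}" ORDER BY rowid')
        while True:
            chunk = cur.fetchmany(batch)
            if not chunk:
                break
            for (value,) in chunk:
                yield value
```

Only the summaries and the current batch ever cross into the client's memory; the default batch size of 1000 is an arbitrary assumption.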

Unfortunately, this package has now been officially orphaned by its
developer, who states that he now focuses on the next database access
standard: the Rdbi interface, which is currently under development and
about which I know nothing.

So the question is: does the Rdbi interface offer such a proxy to data
still residing in databases?

Or am I barking up the wrong tree and trying to (re-)invent an
oversophisticated virtual memory manager? Would a sufficiently large
swap file be enough for these "large dataset" problems?

                                        Emmanuel Charpentier

r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html