[R] large object disorientation
Thomas Lumley
thomas at biostat.washington.edu
Tue Nov 21 22:09:59 CET 2000
On Tue, 21 Nov 2000, Roger Koenker wrote:
> This is an inquiry for all those who have been working on external
> database applications. I sent an inquiry (below) to snews about
> this sort of thing a couple of years ago and eventually decided that
> I would wait to see what external database developments occurred and
> then revisit the problem. I hope that foundations are now better.
>
> Suppose for the sake of concreteness you have a large dataframe-like
> object stored in some compressed format (e.g. I have a 48 MB Stata
> dataset that is about 2.5 million observations on about 40 variables)
> and you would like to do lm() fitting. That is, you would like to
> specify that the data frame is somehow external, and using the formula
> specification in lm() generate a sequence of queries that would return
> chunks of rows of the dataframe, accumulate X'X and X'y, do Major
> Cholesky's solve, and return. All with a modest memory requirement
> and in the blink of the cpu's eye. I realize that it sounds a bit
> retrograde to be doing least squares computations like this, but if
> there were a good way to do this, then there would be good ways to
> do lots of other more interesting things too, I believe.
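For concreteness, the chunked accumulation of X'X and X'y described above
might look roughly like the sketch below. It is in Python rather than R,
uses only plain lists, and is not taken from any existing package. The
names accumulate and cholesky_solve and the chunk layout (a list of
(x_row, y) pairs per chunk) are made up for illustration; the chunks stand
in for whatever the database queries would return.

```python
import math

def accumulate(chunks, p):
    """Accumulate the lower triangle of X'X and the vector X'y chunk by chunk.

    `chunks` is any iterable of lists of (x_row, y) pairs (hypothetical layout);
    only one chunk needs to be in memory at a time.
    """
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for rows in chunks:
        for x, y in rows:
            for i in range(p):
                xty[i] += x[i] * y
                for j in range(i + 1):
                    xtx[i][j] += x[i] * x[j]
    return xtx, xty

def cholesky_solve(a, b):
    """Solve a @ beta = b for symmetric positive definite a (lower triangle used)."""
    p = len(b)
    l = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    # Forward substitution: L z = b
    z = [0.0] * p
    for i in range(p):
        z[i] = (b[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i]
    # Back substitution: L' beta = z
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (z[i] - sum(l[k][i] * beta[k] for k in range(i + 1, p))) / l[i][i]
    return beta

# Toy data: y = 1 + 2*x, with an intercept column, split into two chunks.
toy_chunks = [
    [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0)],
    [([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0)],
]
xtx, xty = accumulate(toy_chunks, 2)
beta = cholesky_solve(xtx, xty)   # close to [1.0, 2.0]
```

Memory use depends only on the number of columns p, not on the number of
rows. The cost is that forming X'X squares the condition number of the
problem, which is the usual objection to this route.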
I don't know if this is relevant/useful, but there is Fortran code as part
of the "leaps" package to do linear regression in bounded memory using a
QR decomposition (less retrograde).
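
To give an idea of what bounded-memory QR means, here is a minimal sketch
of one common way to do it: keep only the p x p triangular factor R and the
vector Q'y, and rotate each new observation into them with Givens
rotations. This is an illustration in Python, not the Fortran code in
"leaps", and I am not claiming that code is organised this way; update_qr
and back_solve are names invented for the example.

```python
import math

def update_qr(r, qty, x, y):
    """Rotate one observation (x, y) into upper-triangular r and qty, in place.

    Returns the leftover part of y; its squares sum to the residual sum of squares.
    """
    x = list(x)
    p = len(x)
    for i in range(p):
        if x[i] == 0.0:
            continue
        h = math.hypot(r[i][i], x[i])
        c, s = r[i][i] / h, x[i] / h
        for j in range(i, p):
            rij, xj = r[i][j], x[j]
            r[i][j] = c * rij + s * xj
            x[j] = -s * rij + c * xj
        qty[i], y = c * qty[i] + s * y, -s * qty[i] + c * y
    return y

def back_solve(r, qty):
    """Solve r @ beta = qty for upper-triangular r."""
    p = len(qty)
    beta = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(r[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (qty[i] - tail) / r[i][i]
    return beta

# Toy data: y = 1 + 2*x with an intercept column, fed one row at a time.
p = 2
r = [[0.0] * p for _ in range(p)]
qty = [0.0] * p
rss = 0.0
for x, y in [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0),
             ([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0)]:
    e = update_qr(r, qty, x, y)
    rss += e * e
beta = back_solve(r, qty)   # close to [1.0, 2.0]
```

Storage is again fixed by p alone, and rows can arrive one at a time or in
chunks of any size. Unlike the normal-equations route, X'X is never
formed, so the conditioning of the problem is not squared.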
-thomas