[R] very fast OLS regression?
Ravi Varadhan
rvaradhan at jhmi.edu
Wed Mar 25 23:13:24 CET 2009
Yes, Bert. Any least-squares solution that forms X'X and then inverts it is not to be recommended. If X is nearly rank-deficient, X'X is even closer to singular: its condition number is the square of X's. The QR decomposition approach in my byhand.qr() function is reliable and fast.
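
For concreteness, a minimal QR-based solver in this spirit looks like the sketch below (this shows the idea only; it is not the byhand.qr() code itself):

qr.ols <- function(y, x) {
    qx <- qr(x)        # factor X = QR once; never forms X'X
    qr.coef(qx, y)     # solve R b = Q'y by back-substitution
}
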
Ravi.
____________________________________________________________________
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University
Ph. (410) 502-2619
email: rvaradhan at jhmi.edu
----- Original Message -----
From: Bert Gunter <gunter.berton at gene.com>
Date: Wednesday, March 25, 2009 6:03 pm
Subject: Re: [R] very fast OLS regression?
To: 'ivo welch' <ivowel at gmail.com>, 'Dimitris Rizopoulos' <d.rizopoulos at erasmusmc.nl>
Cc: 'r-help' <r-help at stat.math.ethz.ch>
> lm is slow because it has to set up the design matrix (X) each time.
> See ?model.matrix and ?model.matrix.lm for how to do this once,
> separately from lm (and then use lm.fit).
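>
> Something like this sketch (the toy data and variable names are
> illustrative):
>
> d <- data.frame(x = rnorm(100), y = rnorm(100))  # toy data
> X <- model.matrix(y ~ x - 1, data = d)   # build the design matrix once
> b <- lm.fit(X, d$y)$coefficients         # fast refit, reusing X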
>
> I am far from an expert on numerical linear algebra, but I'm pretty
> sure your comments below are incorrect in the sense that "naive"
> methods are **not** universally better than efficient numerical
> linear algebra methods using, say, QR decompositions. It will depend
> very strongly on the size and specific nature of the problem. Big-O
> complexity statements (O(n), O(n^2), etc.) are, after all,
> asymptotic. There is also typically a tradeoff between computational
> accuracy (especially for ill-conditioned matrices) and speed, which
> your remarks seem to neglect.
>
> -- Bert
>
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 650-467-7374
>
> -----Original Message-----
> From: r-help-bounces at r-project.org On Behalf Of ivo welch
> Sent: Wednesday, March 25, 2009 2:30 PM
> To: Dimitris Rizopoulos
> Cc: r-help
> Subject: Re: [R] very fast OLS regression?
>
> thanks, dimitris. I also added Bill Dunlap's "solve(qr(x),y)"
> function as ols5. here is what I get in terms of speed on a Mac Pro:
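>
> (for reference, ols5 following Bill's suggestion is just:)
>
> ols5 <- function (y, x) {
>     solve(qr(x), y)   # QR-factor X, then solve for the coefficients
> }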
>
>          user  system elapsed user.child sys.child
> ols1    6.779   3.591  10.370          0         0
> ols2    0.515   0.210   0.725          0         0
> ols3    0.576   0.403   0.971          0         0
> ols4    1.143   1.251   2.395          0         0
> ols5    0.683   0.565   1.248          0         0
>
> so the naive matrix operations are fastest. I would have thought that
> alternatives to the naive stuff I learned in my linear algebra course
> would be quicker. still, ols3 and ols5 are competitive. the built-in
> lm() is really problematic. is ols3 (or perhaps even ols5) preferable
> in terms of accuracy? I think I can deal with a 20% slow-down (but
> not with a factor-of-10 slow-down).
>
> regards,
>
> /iaw
>
>
> On Wed, Mar 25, 2009 at 5:11 PM, Dimitris Rizopoulos
> <d.rizopoulos at erasmusmc.nl> wrote:
> > check the following options:
> >
> > ols1 <- function (y, x) {
> >     coef(lm(y ~ x - 1))           # full lm(): model-frame overhead each call
> > }
> >
> > ols2 <- function (y, x) {
> >     xy <- t(x) %*% y
> >     xxi <- solve(t(x) %*% x)      # explicitly inverts X'X (naive normal equations)
> >     as.vector(xxi %*% xy)
> > }
> >
> > ols3 <- function (y, x) {
> >     XtX <- crossprod(x)           # X'X without an explicit transpose
> >     Xty <- crossprod(x, y)
> >     solve(XtX, Xty)               # one linear solve; no explicit inverse
> > }
> >
> > ols4 <- function (y, x) {
> >     lm.fit(x, y)$coefficients     # lm's QR-based workhorse, no formula handling
> > }
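> >
> > a quick harness for timing these would be something like this
> > (a sketch; the problem size here is made up):
> >
> > set.seed(1)
> > x <- matrix(rnorm(1e5 * 10), ncol = 10)   # 100,000 x 10 design matrix
> > y <- rnorm(1e5)
> > system.time(for (i in 1:50) ols3(y, x))   # swap in ols1..ols5 to compare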
> >
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>