[R] finding a faster way to run lm on rows of predictor matrix

Christopher Park cypark at princeton.edu
Fri Jul 29 19:58:48 CEST 2011


Thank you so much for the help!
I got it down to about 1 sec per run.

-Chris

On Fri, Jul 29, 2011 at 1:12 PM, "Dénes TÓTH" <tdenes at cogpsyphy.hu> wrote:
>
> Hi,
>
> you can solve the task with simple matrix algebra: for a no-intercept fit
> of y on a single predictor x, the least-squares coefficient is
> crossprod(x, y) / crossprod(x), i.e. sum(x*y) / sum(x^2).
> Note that I find it really inconvenient to have a matrix with variables in
> rows and cases in columns, so I transposed your predictors matrix.
>
> # your data
> regress.y = rnorm(150)
> predictors = matrix(rnorm(6000*150), ncol=150, nrow=6000)
> tpreds <- t(predictors)
>
> # compute coefficients
> coefs <- apply(tpreds, 2, function(x)
>   solve(crossprod(x), crossprod(x, regress.y)))
>
> # compute residuals
> resids <- regress.y - sweep(tpreds,2,coefs,"*")
>
> (Note that resids will need to be transposed back if you really insist on
> your original matrix format.)
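>
> If you want to squeeze out a bit more, the apply() loop can be dropped
> entirely. This is just a sketch along the same lines (same no-intercept
> model, not benchmarked here): each coefficient is sum(x*y)/sum(x^2), so
> all 6000 of them come out of two colSums calls.
>
> # vectorised coefficients: one scalar per predictor column of tpreds
> coefs2 <- colSums(tpreds * regress.y) / colSums(tpreds^2)
> # residuals, one column per predictor
> resids2 <- regress.y - sweep(tpreds, 2, coefs2, "*")
> # sanity check against the apply() version above
> all.equal(as.vector(coefs), as.vector(coefs2))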
>
> HTH,
>  Denes
>
>
>> Hi, everyone.
>> I need to run lm with the same response vector but varying predictor
>> vectors, i.e. regressing one response vector on each of 6,000 individual
>> predictor vectors.
>> After looking through the R archives, I found roughly three methods that
>> have been suggested.
>> Unfortunately, I need to run this task many times (~5,000 times) and
>> would like something faster than the existing methods.
>> All three methods below take 13-20 seconds on my machine, timed with
>> system.time.
>>
>> The opposite direction, 6,000 response vectors against a single predictor
>> vector, is supported directly by lm and runs very fast (~0.5 seconds).
>> That is essentially the same number of fits, so I was wondering whether
>> there is a faster way before I try to code this up in C++.
>>
>> thanks!!
>>
>> ## sample data ###
>> regress.y = rnorm(150)
>> predictors = matrix(rnorm(6000*150), ncol=150, nrow=6000)
>>
>> ## method 1 ##
>> data.residuals = t(apply(predictors, 1, function(x)( lm(regress.y ~ -1 +
>> as.vector(x))$residuals)))
>>
>> user  system elapsed
>>  15.076   0.048  15.128
>>
>> ## method 2 ##
>> data.residuals = matrix(rep(0, nrow(predictors) * ncol(predictors)),
>> nrow=nrow(predictors), ncol=ncol(predictors) )
>>
>> for( i in 1:nrow(predictors)){
>>     pred = as.vector(predictors[i,])
>>     data.residuals[i, ] = lm(regress.y ~ -1 + pred )$residuals
>> }
>>
>>  user  system elapsed
>>  13.841   0.012  13.854
>>
>> ## method 3 ##
>> library(nlme)
>>
>> all.data <- data.frame( y=rep(regress.y, nrow(predictors)),
>> x=c(t(predictors)), g=gl(nrow(predictors), ncol(predictors)) )
>> all.fits <- lmList( y ~ x | g, data=all.data)
>> data.residuals = matrix( residuals(all.fits), nrow=nrow(predictors),
>> ncol=ncol(predictors))
>>
>> user  system elapsed
>>  36.407   0.484  36.892
>>
>>
>> ## the opposite direction, supported by lm ###
>> lm(t(predictors) ~ -1 + regress.y)$residuals
>>
>>  user  system elapsed
>>  0.500   0.120   0.613
>>
>
>


