[R] improve efficiency of a loop

Nelson Villoria nvilloria at gmail.com
Sat May 30 21:07:49 CEST 2009


Dear All:

I need advice about efficient looping/vectorization. I am trying to
bootstrap a regression model with one lag of the dependent variable on
the RHS. Specifically, consider the regression
y_(t) = gamma y_(t-1) + beta x_(t) + error_(t), where y_(t) is the
original dependent variable. Let error^b_(t) be the bootstrapped error
at time t, and y^b_(t) the bootstrapped y_(t) generated with the
parameter estimates gamma and beta. My basic procedure is as follows:
1. Get the first y^b value using the observed y_(1):
y^b_(2) = gamma y_(1) + beta x_(2) + error^b_(2)

2. Get the remaining y^b values recursively:

y^b_(3) = gamma y^b_(2) + beta x_(3) + error^b_(3)
 .
 .
y^b_(T) = gamma y^b_(T-1) + beta x_(T) + error^b_(T)
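Unrolling the recursion shows what kind of computation it is. Writing error^b_(s) as \varepsilon^b_s, each bootstrapped value is a geometrically weighted sum of the starting value y_(1) and the terms beta x_(s) + error^b_(s), i.e. a first-order recursive (AR(1)-type) linear filter of those terms:

```latex
y^{b}_{t} \;=\; \gamma^{\,t-1}\, y_{1} \;+\; \sum_{s=2}^{t} \gamma^{\,t-s}\left(\beta x_{s} + \varepsilon^{b}_{s}\right),
\qquad t = 2, \dots, T .
```

For t = 2 this gives gamma y_(1) + beta x_(2) + error^b_(2), matching step 1, and substituting y^b_(t-1) back in recovers step 2.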

However, my approach, which uses a loop similar to the one below, is
extremely slow. In my actual problem the observations are indexed over
both time and cross-sections, but I thought it would be simpler to ask
my question considering only one source of variation. Suppose the
dataset looks like this:

> d<-data.frame(time=seq(1,100),y=rnorm(100))
> d$y.l <- c(NA,d$y[-nrow(d)])
> d$x<-rnorm(100)
> d$res <- c(NA,lm(y~y.l+x-1,d)$residuals)
> d$y.b<-NA

The parameter estimates are:

> gamma<-coef(lm(y~y.l+x-1,d))[1]
> beta<-coef(lm(y~y.l+x-1,d))[2]
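Fitting the model once and extracting both coefficients by name avoids a redundant lm() call. unname() also keeps gamma from carrying the name "y.l" into later results. A minimal sketch that rebuilds the example data (the seed is hypothetical, only for reproducibility):

```r
set.seed(1)  # hypothetical seed, not from the original post
d <- data.frame(time = seq(1, 100), y = rnorm(100))
d$y.l <- c(NA, d$y[-nrow(d)])
d$x <- rnorm(100)

fit   <- lm(y ~ y.l + x - 1, data = d)   # a single fit instead of two
gamma <- unname(coef(fit)["y.l"])        # coefficient on the lagged y
beta  <- unname(coef(fit)["x"])          # coefficient on x
d$res <- c(NA, residuals(fit))           # row 1 is dropped by lm (NA lag)
```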

For my question to make sense, please imagine that my bootstrapped
errors are identical to 'res' above. In practice, of course, they will
not be; I just want to keep things simple here.
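In practice the bootstrapped errors would typically be drawn with replacement from the estimated residuals. A minimal sketch, assuming an i.i.d. residual bootstrap (a common choice, not necessarily the scheme intended here) and a hypothetical seed:

```r
set.seed(2)  # hypothetical seed
d <- data.frame(time = seq(1, 100), y = rnorm(100))
d$y.l <- c(NA, d$y[-nrow(d)])
d$x <- rnorm(100)
fit <- lm(y ~ y.l + x - 1, data = d)
res <- residuals(fit)  # 99 residuals, for t = 2..100

# One bootstrap draw of error^b_(t) for t = 2..100; t = 1 has no error term.
e.b <- c(NA, sample(res, size = length(res), replace = TRUE))
```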

I first compute y^b_(2) from the observed y_(1):

> d$y.b[2]<-with(d, gamma*y[1] + beta*x[2]  + res[2]) #y^b_(2) pretending that res[2] is a resampled error.

Then I fill in the remaining values with this loop:

> for (.t in 3:nrow(d)) {
+   d$y.b[.t] <- gamma*d$y.b[.t-1] + beta*d$x[.t] + d$res[.t]  # pretending that res are resampled errors
+ }
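A plausible source of much of the cost is the statement d$y.b[.t] <- ..., which goes through the data-frame replacement method on every iteration and may copy the object each time. A sketch of the same recursion on plain numeric vectors, writing into the data frame only once; boot_path_loop() is a hypothetical helper name and the seed is hypothetical:

```r
boot_path_loop <- function(y1, x, e, gamma, beta) {  # hypothetical helper
  n <- length(x)
  yb <- rep(NA_real_, n)  # preallocate; t = 1 stays NA
  prev <- y1              # the recursion starts from the observed y_(1)
  for (t in 2:n) {
    yb[t] <- gamma * prev + beta * x[t] + e[t]
    prev <- yb[t]
  }
  yb
}

# Example data as in the post (hypothetical seed for reproducibility).
set.seed(3)
d <- data.frame(time = seq(1, 100), y = rnorm(100))
d$y.l <- c(NA, d$y[-nrow(d)])
d$x <- rnorm(100)
fit <- lm(y ~ y.l + x - 1, data = d)
gamma <- unname(coef(fit)[1])
beta  <- unname(coef(fit)[2])
d$res <- c(NA, residuals(fit))

# One assignment into the data frame instead of one per time period.
d$y.b <- boot_path_loop(d$y[1], d$x, d$res, gamma, beta)
```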

My problem is that this becomes very slow (painfully slow when I am
actually bootstrapping) when I have many observations or more than one
index in the dependent variable, and hence more levels of looping. I
tried to use lapply, but it does not seem suitable for a recursive
computation such as the one above. Any hints?
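Although lapply() does not fit a recursion, base R has tools that do. With u_(t) = beta x_(t) + error^b_(t), the recursion y^b_(t) = gamma y^b_(t-1) + u_(t) is a first-order recursive filter, so stats::filter(method = "recursive") can compute a whole path in compiled code, and Reduce(accumulate = TRUE) expresses the same recursion functionally. A sketch under stated assumptions: the helper names, the inputs standing in for d, the number of replications B, and the panel data frame p are all hypothetical, and p is assumed sorted by time within each id.

```r
# filter() computes y[i] = u[i] + gamma * y[i-1], with init supplying y_(1).
boot_path_filter <- function(y1, x, e, gamma, beta) {  # hypothetical helper
  n <- length(x)
  u <- beta * x[2:n] + e[2:n]                  # terms for t = 2..n
  yb <- stats::filter(u, filter = gamma, method = "recursive", init = y1)
  c(NA, as.numeric(yb))                        # t = 1 has no bootstrapped value
}

# Same recursion with Reduce(accumulate = TRUE); clearer than lapply for
# recursions, though it still iterates in R and is slower than filter().
boot_path_reduce <- function(y1, x, e, gamma, beta) {  # hypothetical helper
  n <- length(x)
  u <- beta * x[2:n] + e[2:n]
  path <- Reduce(function(prev, ut) gamma * prev + ut, u,
                 init = y1, accumulate = TRUE)
  c(NA, path[-1])                              # drop the starting value y1
}

# Hypothetical inputs in place of d$y, d$x, d$res and the estimates.
set.seed(4)
n <- 100
y <- rnorm(n); x <- rnorm(n); e <- c(NA, rnorm(n - 1))
gamma <- 0.5; beta <- 1.2

yb <- boot_path_filter(y[1], x, e, gamma, beta)

# B bootstrap paths (one column each), residuals resampled with replacement.
B <- 200
paths <- vapply(seq_len(B), function(b) {
  eb <- c(NA, sample(e[-1], size = n - 1, replace = TRUE))
  boot_path_filter(y[1], x, eb, gamma, beta)
}, numeric(n))

# Panel case: run the recursion separately within each cross-section unit.
p <- data.frame(id = rep(1:3, each = 20), y = rnorm(60), x = rnorm(60))
p$res <- ave(rnorm(60), p$id, FUN = function(v) c(NA, v[-1]))
p$y.b <- unsplit(lapply(split(p, p$id), function(g)
  boot_path_filter(g$y[1], g$x, g$res, gamma, beta)), p$id)
```

Because each bootstrap replication and each cross-section is an independent recursion, those outer levels can be handled with vapply() or split()/lapply(), leaving only the time recursion, which filter() runs in compiled code.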

I appreciate any help,

Nelson Villoria



