[R] Using predict.lm()

Thu Jun 17 16:04:19 CEST 2004

Greetings,

Following the example in help(predict.lm):

     x <- rnorm(15)
     y <- x + rnorm(15)
     new <- data.frame(x = seq(-3, 3, 0.5))
     predict(lm(y ~ x), new)

predicts the response at each value of new$x, as can be seen with:

     plot(x,y)
     lines(new$x,predict(lm(y ~ x), new))

I am trying to extend this fitting and prediction across several factors, as 
follows:

     f<-rep(c("FIRST","SECOND"),each=15)
     f<-as.factor(f)
     x<-rep(rnorm(15),2)
     y<-x+rnorm(length(x))
     old<-data.frame(f=f,x=x,y=y)
     new<-data.frame(f=rep(levels(f),each=length(seq(-4,4,0.2))),x=seq(-4,4,0.2))

...where the frame new simply covers a different domain of x than old. When I 
fit with the bare variables x and y and predict on the frame new, I get a 
response with one element per row of new:

     predict(lm(y~x),new)

but when I refer to the same variables through the frame old, the frame new is 
ignored:

     predict(lm(old$y~old$x),new)

...results in a response the length of old$x (presumably predicting over the 
values of old$x). This behavior also precludes using something more useful, 
e.g.:

     predict(lm(old$y~old$f/(1+old$x)-1),new)

to return predictions for several factors over a redefined domain. In my 
case, I am attempting to fit 2nd order polynomials to noisy data collected 
for a large number of factors (~85). The data were collected for each factor 
at convenient (and therefore dissimilar) points within a common domain, but I 
need to compare the responses of the factors at the same points within that 
common domain.
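
The two cases above can be checked side by side in one self-contained script. 
This is only a sketch for reproducing the symptom: the fixed seed and the 
names grid, n_bare and n_dollar are my own additions, and suppressWarnings() 
is there only on the assumption that some versions of R may emit a warning 
for the second call.

```r
set.seed(1)  # assumption: fixed seed so the run is repeatable

f <- as.factor(rep(c("FIRST", "SECOND"), each = 15))
x <- rep(rnorm(15), 2)
y <- x + rnorm(length(x))
old <- data.frame(f = f, x = x, y = y)

grid <- seq(-4, 4, 0.2)  # hypothetical name for the common domain
new <- data.frame(f = rep(levels(f), each = length(grid)), x = grid)

# Formula written with the bare names x and y:
# the prediction has one element per row of new.
n_bare <- length(predict(lm(y ~ x), new))

# Formula written with old$y and old$x:
# the prediction has one element per row of old, and new is not used.
n_dollar <- length(suppressWarnings(predict(lm(old$y ~ old$x), new)))

# Compare the prediction lengths against the row counts of the two frames.
print(c(rows_new = nrow(new), bare = n_bare,
        rows_old = nrow(old), dollar = n_dollar))
```

The only difference between the two calls is how the formula names its 
variables; the data and the frame new are identical.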

I am obviously missing something here because I continue to be puzzled by the 
result. I had thought (perhaps erroneously) that lm() would return a model 
object that would permit prediction. Indeed:

     lm(old$y~old$f/(1+old$x)-1)

...results in:

Call:
lm(formula = old$y ~ old$f/(1 + old$x) - 1)

Coefficients:
       old$fFIRST        old$fSECOND   old$fFIRST:old$x  old$fSECOND:old$x
         -0.08489           -0.05839            1.15351            0.72981

which clearly provides a model fit for each factor, and identifies the factor 
from which each model coefficient was extracted, so lm() does provide the 
capability to predict over the factors. It seems however (as nearly as I can 
tell), that predict simply ignores the frame new altogether, failing even to 
provide a warning.

Is this the intended behavior? Have I missed something very simple, or do I 
have a fundamental misunderstanding of how this should work? Lastly, I'd 
appreciate any suggestions that avoid the lengthy and wholly undesirable 
"brute force" approach I am now considering.
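
One alternative to brute force, sketched under an assumption of mine that is 
not taken from the help page quoted above: if the frame is supplied through 
the data argument of lm() and the formula uses bare column names, predict() 
should be able to match the columns of new by name. The seed, the name grid, 
the quadratic term I(x^2) and the object by_level are my own additions for 
illustration only.

```r
set.seed(1)  # assumption: fixed seed so the run is repeatable

f <- as.factor(rep(c("FIRST", "SECOND"), each = 15))
x <- rep(rnorm(15), 2)
y <- x + rnorm(length(x))
old <- data.frame(f = f, x = x, y = y)

grid <- seq(-4, 4, 0.2)  # hypothetical name for the common domain
new <- data.frame(
  f = factor(rep(levels(f), each = length(grid)), levels = levels(f)),
  x = grid
)

# Separate intercept, linear and quadratic coefficient for each level of f.
# The formula names columns of old; it does not use old$...
fit <- lm(y ~ f/(x + I(x^2)) - 1, data = old)

# Predict every level of f on the same grid.
new$pred <- predict(fit, newdata = new)

# One column per level of f, one row per grid point.
by_level <- do.call(cbind, split(new$pred, new$f))
print(head(cbind(x = grid, by_level)))
```

If this behaves as assumed, by_level holds the fitted response of each factor 
at identical x values, which is the comparison I am after.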

Thanks & Best Regards,
Steve