[R] forward stepwise selection

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Jun 7 11:37:40 CEST 2000


> Date: Wed, 07 Jun 2000 09:58:57 +0100
> From: "Simon  Bond" <bond at graylab.ac.uk>
> Subject: [R] forward stepwise selection
> 
> Dear R-Help, 
> 
> My problem/bug came to light when fitting a linear model using stepwise
> selection. I'd started with the straightforward command
> 
> step(lm(y~., dataset))
> 
> This worked fine, but because this starts with all the possible
> explanatory variables, it results in a model with too many explanatory
> variables. Hence I wanted to start with just a constant and do forward
> selection, to get a new starting model for full stepwise selection again.
> But R (version 0.99.0) doesn't like this.

Please use a current version of R, and in particular please use a
non-beta version of R.  Your logic is not very sound (take another look
at your MSc notes on model selection), and I suggest a better approach
is to increase k in step or to use drop1(, test="F") repeatedly to
reduce the model.  Remember AIC is attempting good prediction, not
good explanation.
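
A minimal sketch of both suggestions, using the hills data from MASS
(the same data as in the session further down). The penalty k = log(n)
and the 0.05 cut-off in the loop are illustrative assumptions, not
values taken from the original post.

```r
library(MASS)   # hills data: record times for Scottish hill races

full <- lm(time ~ dist + climb, data = hills)

# Option 1: raise the penalty per parameter in step().
# k = 2 is the AIC default; k = log(n) is a BIC-style penalty (an assumed choice).
n <- nrow(hills)
fit_k <- step(full, k = log(n), trace = 0)

# Option 2: backward elimination by F test, one term at a time.
alpha <- 0.05   # assumed cut-off, choose your own
fit_f <- full
repeat {
  if (length(attr(terms(fit_f), "term.labels")) == 0) break  # nothing left to drop
  tab <- drop1(fit_f, test = "F")
  p <- tab[["Pr(>F)"]][-1]            # first row is <none>
  if (max(p) < alpha) break           # every remaining term passes the F test
  worst <- rownames(tab)[-1][which.max(p)]
  fit_f <- update(fit_f, as.formula(paste(". ~ . -", worst)))
}

formula(fit_k)
formula(fit_f)
```

In practice it is worth looking at each drop1() table by hand rather
than trusting the loop blindly; the loop only shows the mechanics.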

[...]

> step(lm(ANB.DIFF~1,tink4),scope=list(lower=~1,upper=fmla),direction="forward")
> Start:  AIC= 25.35 
>  ANB.DIFF ~ 1 
> 
> Error in lm.fit(X, y) : incompatible dimensions
> > 
> 
> 
> 
> I've narrowed it down to the command add1(), which uses lm.fit(), but the
> way add1() constructs X and y, is undecipherable. Any advice would be much
> appreciated.

traceback() would have told you immediately where it came from,
and running debug(add1.lm) would enable you to track this down
further.
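
As a sketch of how the call stack locates such an error: the function
f() below is hypothetical and deliberately passes mismatched dimensions
to lm.fit(); it is not the poster's bug, only a way to provoke the same
message. At an interactive prompt you would simply call traceback()
after the error; the handler here records the same information so the
example also runs non-interactively.

```r
# Hypothetical helper: 3 rows in X but 4 responses, so lm.fit() must fail.
f <- function() lm.fit(matrix(1, 3, 1), 1:4)

calls <- NULL
res <- tryCatch(
  withCallingHandlers(
    f(),
    # Record the call stack at the point where the error is signalled.
    error = function(e) calls <<- as.list(sys.calls())
  ),
  error = function(e) conditionMessage(e)
)

res                                   # the error message
fn_names <- vapply(calls, function(cl) deparse(cl[[1]])[1], "")
fn_names                              # functions on the stack, outermost first

# Interactive equivalents (commented out, they need a live session):
# f(); traceback()                    # print the stack after the error
# debug(lm.fit); f()                  # step through lm.fit() line by line
# undebug(lm.fit)
```

The same pattern applies to add1.lm: flag it with debug() and rerun the
failing step() call to see how X and y are built.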

I cannot reproduce this in 1.0.1:

> library(MASS)
> data(hills)
> step(lm(time ~ 1, hills), scope=list(lower=~1, upper=~dist+climb),
+      direction="forward")
Start:  AIC= 274.88 
 time ~ 1 

        Df Sum of Sq   RSS   AIC
+ dist   1     71997 13142   211
+ climb  1     55205 29934   240
<none>               85138   275

Step:  AIC= 211.49 
 time ~ dist 

        Df Sum of Sq     RSS     AIC
+ climb  1    6249.7  6891.9   190.9
<none>               13141.6   211.5

Step:  AIC= 190.9 
 time ~ dist + climb 


Call:
lm(formula = time ~ dist + climb, data = hills)

Coefficients:
(Intercept)         dist        climb  
   -8.99204      6.21796      0.01105  

If this still fails in 1.0.1 for you, please submit a bug report with a
reproducible example.


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
