[R] help on model selection - step()
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Aug 11 09:18:23 CEST 2008
On Mon, 11 Aug 2008, Rodrigo Gazaffi wrote:
> Dear R users,
> I'm interested in the model selection problem, and I have run into some
> problems that I would like to ask for help with.
>
> Well,
> this is a very small example with 4 variables (one of them, z, is the
> response) and 100 individuals.
> I would like to do a stepwise search for the "best" model, using the BIC
> criterion.
>
> I know that when I have a lot of variables, say 120, it is not wise to
> consider the full model, so starting from "y~1" I can stop the search with
> the steps option.
> But when the IC takes a negative value, is there any way that I can
> stop the search?
Not in the existing function. The absolute size of AIC (or BIC) has no
meaning for a linear model fit (since this is true of the log-likelihood
-- it depends on the scale of measurement).
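To illustrate the scale dependence, here is a minimal sketch on hypothetical simulated data (not the poster's data set). It uses extractAIC(), the criterion that step() works with, and k = log(n) for the BIC penalty. Re-expressing the same response in units 1000 times smaller shifts every model's value by the same constant, n * log(1000^2), so the sign can change, while the differences between models, which are all that step() compares, stay the same.

```r
set.seed(1)                       # hypothetical simulated data, not the poster's
n    <- 100
x1   <- rnorm(n)
y    <- 0.5 * x1 + rnorm(n)
y_mm <- 1000 * y                  # the same response recorded in units 1000 times smaller

# extractAIC() is the criterion step() works with; k = log(n) gives the BIC penalty
bic <- function(fit) extractAIC(fit, k = log(n))[2]

b_orig  <- c(null = bic(lm(y ~ 1)),    x1 = bic(lm(y ~ x1)))
b_scale <- c(null = bic(lm(y_mm ~ 1)), x1 = bic(lm(y_mm ~ x1)))

print(b_orig)
print(b_scale)

# Each value moves by the same constant, n * log(1000^2), so the sign is arbitrary ...
print(b_scale - b_orig)
# ... while the difference between the models, which is what step() compares, is unchanged
print(c(orig = diff(b_orig), scaled = diff(b_scale)))
```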
But R is Open Source so you can modify step() in any way you like, even to
do nonsensical things.
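One way to act on that suggestion without editing step() itself is a small forward search built from add1() and update(), with a hook for a user-supplied stopping rule. This is only a sketch under stated assumptions: forward_ic() and its stop_rule argument are hypothetical names invented here, the data are simulated, and only forward steps are taken (unlike direction = "both"). The example rule refuses any step that improves the criterion by less than 10 units; a rule based on the sign of the criterion would remain scale dependent for the reason given above.

```r
# Hypothetical helper (not part of R): forward selection on extractAIC()
# with penalty k, plus an optional user-supplied stopping rule.
# stop_rule(old_fit, new_fit) should return TRUE to refuse the proposed step.
forward_ic <- function(fit, upper, k = 2, stop_rule = NULL) {
  path <- list(fit)
  repeat {
    # terms in the upper scope that are not yet in the model
    remaining <- add.scope(fit, update.formula(fit, upper))
    if (length(remaining) == 0) break
    cand <- add1(fit, scope = remaining, k = k)   # criterion for each single addition
    best <- rownames(cand)[which.min(cand$AIC)]
    if (best == "<none>") break                   # step()'s usual rule: no addition helps
    new_fit <- update(fit, as.formula(paste(". ~ . +", best)))
    if (!is.null(stop_rule) && stop_rule(fit, new_fit)) break
    fit  <- new_fit
    path <- c(path, list(fit))
  }
  list(final = fit, path = path)
}

# Usage on hypothetical simulated data
set.seed(2)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- x1 + 0.5 * x2 + rnorm(n)

res <- forward_ic(lm(y ~ 1), upper = ~ x1 + x2 + x3, k = log(n))
print(formula(res$final))

# Same search, but refusing any step that improves the criterion by less than 10
crit  <- function(fit) extractAIC(fit, k = log(n))[2]
res10 <- forward_ic(lm(y ~ 1), upper = ~ x1 + x2 + x3, k = log(n),
                    stop_rule = function(old, new) crit(old) - crit(new) < 10)
print(formula(res10$final))
```

Without a stop_rule the search should pick the same terms as step(..., direction = "forward"), and res$path keeps every intermediate fit so an earlier model can be chosen after the fact.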
> For example, for this data set
> the first step gives AIC=3.6 and the 2nd gives -9.03. IS THERE ANY WAY that
> I could say, "stop here, the previous one is the best for me"? Here,
> my model would have no variables.
> I know the example looks silly, but I have bigger data where this
> happens in the thirtieth iteration; that's why I would like some help.
>
> I used step(); is there another function besides step() that could stop
> the search here?
>
> cheers,
> Rodrigo Gazaffi
>
>
>
> x1 <- c( 0.3718, 0.3718, 0.3718, 0.3718, 0.3718, 0.3718, 0.3718,
> 0.3718, -1.0000, 0.3718, 0.3718, 0.3718, 0.3718, 0.3718, 0.3718,
> 0.3718, 0.3718, 0.3718, 0.3718, 0.3718, 0.3718, 0.3718, 0.0713,
> 0.1774, 0.3570, 0.3718, 0.3718, 0.3718, -1.0000, 0.3718, -1.0000,
> 0.1774, 0.3718, 0.3718, 0.0709, 0.1774, -1.0000, -1.0000, 0.3718,
> 0.3718, 0.0713, 0.0709, 0.3718, 0.3718, 0.3718, 0.3718, 0.2614,
> 0.2614, -0.9995, -1.0000, 0.1774, 0.3718, -1.0000, -1.0000, 0.1774,
> 0.3718, 0.1774, 0.3718, 0.3718, -1.0000, 0.3718, 0.3718, 0.3718,
> 0.3718, 0.3718, -1.0000, 0.3718, 0.3718, 0.3718, 0.3718, 0.0709,
> 0.0710, 0.3718, 0.3718, 0.3718, 0.3718, 0.3718, 0.0709, 0.3718,
> 0.0709, 0.0709, 0.3718, 0.0709, 0.3570, 0.3718, 0.3718, 0.3718,
> 0.0709, 0.3718, 0.3718, 0.3718, -1.0000, 0.3718, 0.3718, 0.3718,
> -1.0000, 0.3718, 0.3718, 0.3718, 0.3718)
>
> x2 <- c( 0.3898, -0.9995, 0.3898, 0.3898, 0.3898, 0.1978, 0.3898,
> -0.9997, -1.0000, -1.0000, 0.3898, 0.3898, 0.3898, 0.3898, -1.0000,
> 0.1978, -1.0000, 0.3898, 0.3898, -1.0000, 0.1978, 0.3898, 0.3898,
> 0.3898, 0.1978, -0.9995, 0.3792, -1.0000, -1.0000, 0.3898, 0.0837,
> 0.0837, 0.0837, 0.3898, 0.0837, 0.3898, 0.3898, 0.0837, 0.3898,
> 0.0837, 0.0837, -1.0000, -1.0000, 0.3898, 0.0841, 0.1976, -1.0000,
> 0.2467, 0.1978, 0.3842, 0.3898, 0.3848, 0.2766, 0.3898, 0.3898,
> 0.3898, -1.0000, -0.9995, 0.3898, 0.3898, 0.0837, 0.3898, -1.0000,
> 0.1978, 0.3898, 0.2766, 0.3898, 0.3898, 0.3898, 0.2766, 0.3898,
> 0.3866, 0.1978, 0.3898, -1.0000, -1.0000, 0.3898, 0.3898, 0.3898,
> 0.3898, 0.3898, 0.1978, 0.0841, -1.0000, 0.0837, 0.3898, 0.3898,
> -1.0000, 0.3898, 0.3898, -1.0000, 0.3898, 0.3898, 0.0837, 0.3898,
> 0.3898, 0.1976, 0.3898, 0.3898, 0.3898)
>
> x3 <- c( 0.9999, 0.9999, 0.9999, 1.0000, -0.9999, 0.9999, -0.9999,
> 0.9999, -0.9999, -1.0000, -1.0000, -0.9999, -0.9980, -0.9999, -0.9999,
> -1.0000, -0.9999, -0.9999, -0.9999, 1.0000, -1.0000, 1.0000, -1.0000,
> -1.0000, -1.0000, -0.9980, 1.0000, -0.9999, -1.0000, -1.0000, -0.9999,
> -0.9999, 0.9999, 1.0000, -0.9999, -1.0000, 1.0000, 0.9999, 1.0000,
> -0.9999, 0.9999, -1.0000, -1.0000, -0.9999, 0.8356, 0.8356, -0.3241,
> 0.8356, 0.8353, 0.8356, 1.0000, -1.0000, -1.0000, -1.0000, -1.0000,
> -1.0000, -0.9999, 0.9999, 1.0000, -0.9980, 0.9999, 1.0000, -1.0000,
> 1.0000, -0.9999, 1.0000, 0.9999, -1.0000, 1.0000, -1.0000, 0.9999,
> 0.9999, -1.0000, -1.0000, 1.0000, -1.0000, -1.0000, 1.0000, 1.0000,
> 1.0000, -0.9999, 1.0000, -1.0000, 1.0000, -1.0000, 1.0000, -1.0000,
> 1.0000, 1.0000, 1.0000, -1.0000, -0.9999, -0.8547, -1.0000, -0.7851,
> 0.8356, -1.0000, -0.9999, -0.9999, 1.0000)
>
> z <- c( -0.006548414, -1.035584950, -0.006548414, 0.180549138,
> 0.741841793, 1.770878329, -0.848487398, -1.035584950, -2.251719037,
> 0.461195465, 2.051524656, 1.116036897, -0.193645966, 0.274097913,
> 0.180549138, 0.274097913, 0.274097913, 0.835390569, 0.928939345,
> -1.316231277, 0.087000362, 0.741841793, 1.116036897, 0.180549138,
> -0.193645966, 0.274097913, 0.274097913, 1.490232001, -1.222682502,
> 1.303134449, 0.367646689, -0.100097190, -0.006548414, -1.035584950,
> 1.490232001, 0.648293017, -2.064621485, -2.625914141, 1.022488121,
> -0.006548414, -1.222682502, -0.567841070, -0.942036174, 0.461195465,
> 1.770878329, 0.461195465, -1.503328829, -1.035584950, -0.848487398,
> -0.567841070, 1.396683225, 2.051524656, -0.942036174, -0.754938622,
> -1.596877605, 0.648293017, -0.287194742, -0.567841070, 0.461195465,
> -0.474292294, -0.100097190, 0.287194742, 0.554744241, -0.006548414,
> 1.209585673, -1.409780053, 0.928939345, 0.928939345, -0.006548414,
> 1.396683225, -0.380743518, 0.928939345, 1.490232001, 1.770878329,
> -1.129133726, -0.848487398, -0.380743518, 0.274097913, -1.409780053,
> -0.100097190, 0.367646689, -0.474292294, 0.554744241, -2.251719037,
> 0.087000362, -0.848487398, 0.741841793, -2.064621485, -0.006548414,
> 0.461195465, -0.100097190, -0.006548414, 0.648293017, -0.287194742,
> 0.928939345, -0.193645966, -0.474292294, -0.006548414, -1.035584950,
> 0.461195465)
>
> step(lm(z ~ 1), scope = list(lower = ~1, upper = ~x1 + x2 + x3),
>      direction = "both", k = log(length(z)))
> #########
> Start: AIC=3.6
> z ~ 1
>
>        Df Sum of Sq    RSS    AIC
> + x1    1    15.671 83.329 -9.028
> + x2    1    12.390 86.610 -5.165
> + x3    1     7.403 91.597  0.433
> <none>              99.000  3.600
>
> Step: AIC=-9.03
> z ~ x1
>
>        Df Sum of Sq    RSS     AIC
> + x2    1    13.675 69.654 -22.348
> + x3    1     7.078 76.251 -13.299
> <none>              83.329  -9.028
> - x1    1    15.671 99.000   3.600
>
> Step: AIC=-22.35
> z ~ x1 + x2
>
>        Df Sum of Sq    RSS     AIC
> + x3    1     8.930 60.723 -31.463
> <none>              69.654 -22.348
> - x2    1    13.675 83.329  -9.028
> - x1    1    16.956 86.610  -5.165
>
> Step: AIC=-31.46
> z ~ x1 + x2 + x3
>
>        Df Sum of Sq    RSS     AIC
> <none>              60.723 -31.463
> - x3    1     8.930 69.654 -22.348
> - x2    1    15.527 76.251 -13.299
> - x1    1    16.669 77.392 -11.813
>
> Call:
> lm(formula = z ~ x1 + x2 + x3)
>
> Coefficients:
> (Intercept)           x1           x2           x3
>     -0.2015       0.9000       0.7269      -0.3083
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
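As a rough check on the trace quoted above, the "AIC" column that step() prints for lm fits is extractAIC()'s n * log(RSS / n) + k * edf, with constants common to all models dropped; step() keeps the label "AIC" even when k = log(n) turns it into a BIC-type value. The sketch below simply recomputes the first table from its printed RSS values. Here the null RSS is 99 with n = 100, so z has sample variance 1 and RSS / n is below 1 for every model; the log term is therefore negative throughout, and the criterion turns negative as soon as it outweighs the penalty, which says nothing about model quality.

```r
# Numbers copied from the first table of the step() trace quoted above
n   <- 100                        # length(z)
k   <- log(n)                     # the BIC penalty passed to step() via k = log(length(z))
rss <- c("<none>" = 99.000, "+ x1" = 83.329, "+ x2" = 86.610, "+ x3" = 91.597)
edf <- c(1, 2, 2, 2)              # intercept only, or intercept plus one slope

# For lm fits extractAIC() (and hence step()) reports n * log(RSS / n) + k * edf,
# dropping constants that are the same for every model
crit <- n * log(rss / n) + k * edf
print(round(crit, 3))
```

Because the printed RSS values are rounded to three decimals, the recomputed values agree with the quoted AIC column (3.600, -9.028, -5.165, 0.433) only to within about 0.001.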
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595