[R] RE: step, leaps, lasso, LSE or what?
Frank, Murray
murray.frank at commerce.ubc.ca
Fri Mar 1 19:57:38 CET 2002
Thanks for the most informative and helpful feedback.
Professor Ripley wrote:
(most of his message has been edited out)
>There are big differences between regression with only continuous variates,
>and regression involving hierarchies of factors. step/stepAIC include the
>latter, the rest do not.
In much of Venables and Ripley, bootstrapping keeps popping up. Is there a
reason not to run step/stepAIC repeatedly on bootstrapped samples from the
original data? On the face of it, bootstrapping seems intuitively appealing
in this context. (Would some form of cross-validation on subsamples be
better?)
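To make the question concrete, here is a minimal sketch of the bootstrap idea. It is written in Python rather than R, and it uses a hand-rolled forward selection by AIC as a stand-in for step/stepAIC; it is not their implementation and handles only continuous predictors. The function names (ols_rss, aic, forward_select, bootstrap_inclusion, simulate) and the simulated data are assumptions made purely for illustration.

```python
import math
import random


def ols_rss(X, y):
    """Residual sum of squares from a least-squares fit of y on the columns of X."""
    n, p = len(X), len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
         + [sum(X[i][j] * y[i] for i in range(n))] for j in range(p)]
    # Gauss-Jordan elimination with partial pivoting (assumes X'X is nonsingular).
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[j][p] / A[j][j] for j in range(p)]
    return sum((y[i] - sum(X[i][j] * beta[j] for j in range(p))) ** 2 for i in range(n))


def aic(rows, y, cols):
    """Gaussian AIC, up to an additive constant, for intercept plus the given columns."""
    X = [[1.0] + [row[j] for j in cols] for row in rows]
    n = len(y)
    return n * math.log(ols_rss(X, y) / n) + 2 * (len(cols) + 1)


def forward_select(rows, y):
    """Add one variable at a time while doing so lowers the AIC."""
    chosen, remaining = [], list(range(len(rows[0])))
    best = aic(rows, y, chosen)
    while remaining:
        score, j = min((aic(rows, y, chosen + [j]), j) for j in remaining)
        if score >= best:
            break
        best = score
        chosen.append(j)
        remaining.remove(j)
    return sorted(chosen)


def bootstrap_inclusion(rows, y, n_boot=200, seed=1):
    """Fraction of bootstrap resamples in which each variable is selected."""
    rng = random.Random(seed)
    n, p = len(y), len(rows[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        for j in forward_select([rows[i] for i in idx], [y[i] for i in idx]):
            counts[j] += 1
    return [c / n_boot for c in counts]


def simulate(n=60, seed=0):
    """Hypothetical data: y depends on the first of three predictors only."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    y = [2.0 * r[0] + rng.gauss(0, 1) for r in rows]
    return rows, y


if __name__ == "__main__":
    rows, y = simulate()
    print("selected on full data:", forward_select(rows, y))
    print("bootstrap inclusion frequencies:", bootstrap_inclusion(rows, y))
```

A variable that is selected in nearly every resample looks stable, while one that comes and goes suggests the selection is fragile. Whether such frequencies are the right summary is part of what I am asking.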
>But generally model
>averaging (which you have not mentioned and is for regression a form of
>shrinkage) seems to have most support for prediction.
What do you mean by model averaging? It does not seem to match the
discussion of model selection that I found in Venables and Ripley
(i.e. pages 186-188).
>Lots of hyperbolic claims, no references. But I suspect this is `ex-LSE'
>methodology, associated with Hendry's group (as PcGive and Ox are), and
>there is a link to Hendry (who is in Oxford).
Quite right. It is the Hendry group. As far as I can figure out, the main
specific references are to:
Hoover, K. D., and Perez, S. J. (1999). Data mining reconsidered:
Encompassing and the general-to-specific approach to specification search.
Econometrics Journal, 2, 167-191.
Hoover, K. D., and Perez, S. J. (2001). Truth and robustness in
cross-country growth regressions. Unpublished paper, Economics Department,
University of California, Davis.
>It has a different aim, I believe. Certainly `effectiveness' has to be
>assessed relative to a clear aim, and simulation studies with true models
>don't seem to me to have the right aim.
As suggested, the Hoover and Perez papers are basically simulation studies
where finding a true model was the aim. The working paper on growth
regressions tries to go further, and seems to have reasonable-sounding
economic conclusions.
>Statisticians of the Box/Cox/Tukey
>generation would say that effectiveness in deriving scientific insights
>was the real test (and I recall hearing that from those I named).
It is hard to argue with that claim. But it is equally hard to see it as
complete. How do we define "scientific insight"? Or is it one of those cases
of: "I don't know how to define it, but I know it when I see it"?
Murray Z. Frank
B.I. Ghert Family Foundation Professor
Strategy & Business Economics
Faculty of Commerce
University of British Columbia
Vancouver, B.C.
Canada V6T 1Z2
phone: 604-822-8480
fax: 604-822-8477
e-mail: Murray.Frank at commerce.ubc.ca
> -----Original Message-----
> From: Frank, Murray
> Sent: Thursday, February 28, 2002 4:12 PM
> To:
> Subject: step, leaps, lasso, LSE or what?
>
> Hi,
>
> I am trying to understand the alternative methods that are available for
> selecting variables in a regression without simply imposing my own bias
> (having "good judgement"). The methods implemented in leaps and step and
> stepAIC seem to fall into the general class of stepwise procedures. But
> these are commonly condemned for inducing overfitting.
>
> In Hastie, Tibshirani and Friedman "The Elements of Statistical Learning"
> chapter 3, they describe a number of procedures that seem better. The use
> of cross-validation in the training stage presumably helps guard against
> overfitting. They seem particularly favorable to shrinkage through ridge
> regressions, and to the "lasso". This may not be too surprising, given the
> authorship. Is the lasso "generally accepted" as being a pretty good
> approach? Has it proved its worth on a variety of problems? Or is it at the
> "interesting idea" stage? What, if anything, would be widely accepted as
> being sensible -- apart from having "good judgement"?
>
> In econometrics there is a school (the "LSE methodology") which argues for
> what amounts to stepwise regressions combined with repeated tests of the
> properties of the error terms. (It is actually a bit more complex than
> that.) This has been coded in the program PCGets:
> (http://www.pcgive.com/pcgets/index.html?content=/pcgets/main.html)
> If anyone knows how this compares in terms of effectiveness to the methods
> discussed in Hastie et al., I would really be very interested.
>
> Cheers,
> Murray