[R] RE: step, leaps, lasso, LSE or what?

Fri Mar 1 19:57:38 CET 2002

Thanks for the most informative and helpful feedback.

Professor Ripley wrote:
(most of his message has been edited out)
>There are big differences between regression with only continuous variates,
>and regression involving hierarchies of factors. step/stepAIC include the
>latter, the rest do not.

In much of Venables and Ripley, bootstrapping keeps popping up. Is there a
reason not to run step/stepAIC repeatedly on bootstrapped samples from the
original data? On the face of it, bootstrapping seems intuitively appealing
in this context. (Would some form of cross-validation on subsamples be
better?)
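To make the bootstrap idea concrete: rerun the selection on each resampled data set and record how often each predictor is kept. The sketch below is in Python rather than R and is only a stand-in for step/stepAIC under stated assumptions: it does a forward-only search from the intercept-only model (a simplification of what step/stepAIC do), scores models with n*log(RSS/n) + 2k (the form R's extractAIC uses for linear models), and the helper names ols_rss, forward_aic and bootstrap_inclusion, like the simulated data, are hypothetical.

```python
import math
import random


def ols_rss(X, y, cols):
    """Residual sum of squares from regressing y on an intercept plus the columns in cols."""
    n = len(y)
    # Design matrix: a column of ones followed by the selected predictors.
    D = [[1.0] + [X[i][j] for j in cols] for i in range(n)]
    p = len(D[0])
    # Augmented normal equations [D'D | D'y].
    A = [[sum(D[i][r] * D[i][c] for i in range(n)) for c in range(p)]
         + [sum(D[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda row: abs(A[row][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * c for a, c in zip(A[r], A[k])]
    beta = [A[k][p] / A[k][k] for k in range(p)]
    return sum((y[i] - sum(b * d for b, d in zip(beta, D[i]))) ** 2 for i in range(n))


def forward_aic(X, y):
    """Forward selection from the intercept-only model, adding the predictor that lowers AIC most."""
    n, m = len(y), len(X[0])

    def aic(cols):
        # n*log(RSS/n) + 2*(number of coefficients, intercept included)
        return n * math.log(ols_rss(X, y, cols) / n) + 2 * (len(cols) + 1)

    chosen, best = [], aic([])
    while len(chosen) < m:
        score, j = min((aic(chosen + [c]), c) for c in range(m) if c not in chosen)
        if score >= best:  # no single addition improves AIC: stop
            break
        chosen, best = chosen + [j], score
    return sorted(chosen)


def bootstrap_inclusion(X, y, n_boot=100, seed=1):
    """Fraction of bootstrap resamples in which forward_aic keeps each predictor."""
    rng = random.Random(seed)
    n, m = len(y), len(X[0])
    counts = [0] * m
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        for j in forward_aic([X[i] for i in idx], [y[i] for i in idx]):
            counts[j] += 1
    return [c / n_boot for c in counts]


# Hypothetical simulated data: only predictor 0 truly matters.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
y = [2.0 * row[0] + rng.gauss(0, 1) for row in X]
freq = bootstrap_inclusion(X, y)
print([round(f, 2) for f in freq])
```

Predictors kept in nearly every resample are stable choices; ones that come and go suggest that the selected model depends heavily on the particular sample.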

>But generally model
>averaging (which you have not mentioned and is for regression a form of
>shrinkage) seems to have most support for prediction.

What do you mean by model averaging? It does not seem to match the
discussion of model selection that I found in Venables and Ripley
(pages 186-188).
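One common form of model averaging, sketched here under the assumption that each candidate model is weighted by its Akaike weight, w_i proportional to exp(-AIC_i/2) (Bayesian model averaging uses posterior model probabilities instead), also shows why averaging acts as shrinkage in regression: a coefficient counts as zero in the models that leave its variable out, so its averaged value is pulled toward zero. The function names and the AIC values and estimate in the example are hypothetical.

```python
import math


def akaike_weights(aics):
    """Akaike weights: w_i proportional to exp(-(AIC_i - min AIC) / 2), normalised to sum to 1."""
    best = min(aics)
    raw = [math.exp(-(a - best) / 2.0) for a in aics]
    total = sum(raw)
    return [r / total for r in raw]


def averaged_coefficient(aics, estimates):
    """Weighted average of one coefficient across models.

    estimates[i] is the coefficient in model i, or 0.0 if model i leaves the
    variable out; those zeros are what pull the average toward zero.
    """
    weights = akaike_weights(aics)
    return sum(w * b for w, b in zip(weights, estimates))


# Hypothetical example: model A (AIC 100) includes x with estimate 1.5,
# model B (AIC 102) excludes x.
weights = akaike_weights([100.0, 102.0])
avg = averaged_coefficient([100.0, 102.0], [1.5, 0.0])
print(weights, avg)  # weights about [0.731, 0.269], avg about 1.097
```

Selecting model A alone would report 1.5; averaging reports a smaller value, shrunk in proportion to the support for the model without x.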

>Lots of hyperbolic claims, no references.  But I suspect this is `ex-LSE'
>methodology, associated with Hendry's group (as PcGive and Ox are), and
>there is a link to Hendry (who is in Oxford).

Quite right. It is the Hendry group. As far as I can figure out, the main
specific references are:

Hoover, K. D., and Perez, S. J. (1999). Data mining reconsidered:
encompassing and the general-to-specific approach to specification search.
Econometrics Journal, 2, 167-191.

Hoover, K. D., and Perez, S. J. (2001). Truth and robustness in
cross-country growth regressions. Unpublished paper, Economics Department,
University of California, Davis.

>It has a different aim, I believe.  Certainly `effectiveness' has to be
>assessed relative to a clear aim, and simulation studies with true models
>don't seem to me to have the right aim.  

As you suggested, the Hoover and Perez papers are basically simulation
studies in which finding the true model was the aim. The working paper on
growth regressions tries to go further, and its economic conclusions sound
reasonable.

>Statisticians of the Box/Cox/Tukey
>generation would say that effectiveness in deriving scientific insights
>was the real test (and I recall hearing that from those I named).

It is hard to argue with that claim, but it is equally hard to see it as
complete. How do we define "scientific insight"? Or is it one of those cases
of "I don't know how to define it, but I know it when I see it"?

Murray Z. Frank
B.I. Ghert Family Foundation Professor
Strategy & Business Economics
Faculty of Commerce
University of British Columbia
Vancouver, B.C.
Canada V6T 1Z2

phone: 604-822-8480
fax: 604-822-8477
e-mail: Murray.Frank at commerce.ubc.ca  

>  -----Original Message-----
> From: 	Frank, Murray  
> Sent:	Thursday, February 28, 2002 4:12 PM
> To:	
> Subject:	step, leaps, lasso, LSE or what?
> 
> Hi,
> 
> I am trying to understand the alternative methods that are available for
> selecting variables in a regression without simply imposing my own bias
> (having "good judgement"). The methods implemented in leaps, step and
> stepAIC seem to fall into the general class of stepwise procedures. But
> these are commonly condemned for inducing overfitting.
> 
> In Hastie, Tibshirani and Friedman, "The Elements of Statistical Learning",
> chapter 3, they describe a number of procedures that seem better. The use
> of cross-validation in the training stage presumably helps guard against
> overfitting. They seem particularly favorable to shrinkage through ridge
> regression and to the "lasso". This may not be too surprising, given the
> authorship. Is the lasso "generally accepted" as being a pretty good
> approach? Has it proved its worth on a variety of problems? Or is it still
> at the "interesting idea" stage? What, if anything, would be widely
> accepted as being sensible, apart from having "good judgement"?
> 
> In econometrics there is a school (the "LSE methodology") which argues for
> what amounts to stepwise regressions combined with repeated tests of the
> properties of the error terms. (It is actually a bit more complex than
> that.) This has been coded in the program PCGets:
> (http://www.pcgive.com/pcgets/index.html?content=/pcgets/main.html)
> If anyone knows how this compares in terms of effectiveness to the methods
> discussed in Hastie et al., I would really be very interested.
> 
> Cheers,
> Murray
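For readers unfamiliar with the lasso mentioned in the quoted question, a minimal sketch of what it does: it minimises (1/(2n))*RSS + lambda*sum(|b_j|), so it both shrinks coefficients (as ridge regression does) and sets some of them exactly to zero (as variable selection does). The code uses cyclic coordinate descent with soft-thresholding, which is one way to fit the lasso and not necessarily the algorithm in any particular R package. The penalty scaling, the hand-picked lambda (Hastie, Tibshirani and Friedman choose the penalty by cross-validation), the lack of predictor standardisation, the function names and the simulated data are all assumptions for illustration.

```python
import random


def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Lasso by cyclic coordinate descent.

    Minimises (1/(2n)) * sum of squared residuals + lam * sum(|b_j|).
    The intercept is handled by centring, so it is not penalised.
    """
    n, m = len(y), len(X[0])
    ybar = sum(y) / n
    means = [sum(row[j] for row in X) / n for j in range(m)]
    Xc = [[row[j] - means[j] for j in range(m)] for row in X]
    r = [yi - ybar for yi in y]  # residuals for b = 0
    sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(m)]
    b = [0.0] * m
    for _ in range(n_sweeps):
        for j in range(m):
            # Inner product of x_j with the partial residual that leaves out x_j's own term.
            rho = sum(Xc[i][j] * r[i] for i in range(n)) / n + sq[j] * b[j]
            new = soft_threshold(rho, lam) / sq[j]
            delta = new - b[j]
            if delta != 0.0:
                # Keep the residuals in step with the updated coefficient.
                for i in range(n):
                    r[i] -= Xc[i][j] * delta
                b[j] = new
    return b


# Hypothetical simulated data: y depends only on the first of three predictors.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(100)]
y = [3.0 * row[0] + rng.gauss(0, 1) for row in X]
b = lasso_cd(X, y, lam=0.7)
print([round(v, 3) for v in b])
```

With this penalty the two irrelevant coefficients come out exactly zero, while the real one is shrunk below its least-squares value; setting lam to 0 gives ordinary least squares.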
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._