[R] building a formula for glm() with 30,000 independent variables

Ben Liblit liblit at eecs.berkeley.edu
Mon Nov 11 08:47:18 CET 2002

John Aitchison <jaitchis at hwy.com.au> wrote:

> I am interested in the current thinking on this issue (of the 
> acceptability of methodological questions)

I had the same concern, which is why my first posting tried to avoid 
giving many details about the analysis itself and focused instead on R 
usage questions.  But I guess the prospect of 30K variables got folks' 
attention, mostly of the "why in the world are you doing that" variety.
My methodology was so clearly absurd that many list members couldn't 
resist getting involved.  :-)

> Why you would want to fit an "additive" model in this context is 
> beyond me .. do you believe that x variable contributes something and 
> z variable something else and that if you add those effects together 
> you are likely to get a better prediction (of a crash)?.

Your question is well put.  Certainly my system is more of a 
single-unknown-cause scenario, rather than multiple-contributing-causes.
The *more* bad things you do, the more *likely* you are to crash, but 
ultimately it's only one of those bad things that kills you in any given 

I have to plead ignorance here.  I set out after a logistic regression 
because that's what the statisticians I consulted told me to use.  I 
wish I knew enough to be a more critical consumer of such advice.  But 
I'm a compiler guy, not a statistician, and as such I may be too easily 
led astray in this domain.

> So why not "screen" your predictors for "significance" in the first
> instance?  [...]  Use a simple t test or some such and throw out all 
> those that appear to have no influence

Others on this list have offered similar advice, and I am presently 
working on implementing precisely this approach.  Thank you for echoing 
this advice, and even more so for posing questions to challenge my 
assumptions and thereby educate me in the best Socratic tradition.
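
For readers following the thread, here is a minimal sketch of what such a 
screening step could look like in R.  It is not the code I am actually 
writing: the simulated data, the names `X`, `crashed`, `pvals` and `keep`, 
and the 0.01 cutoff are all assumptions made up for illustration.

```r
# Hypothetical data: `crashed` is a 0/1 outcome per run, `X` holds one
# column per candidate predictor.  Both are simulated here.
set.seed(1)
n <- 200
p <- 50
X <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("v", seq_len(p))))
# By construction, only the first predictor influences the outcome.
crashed <- rbinom(n, 1, plogis(2 * X[, 1]))

# One Welch two-sample t test per column: crashed runs vs. the rest.
pvals <- apply(X, 2, function(x)
  t.test(x[crashed == 1], x[crashed == 0])$p.value)

# Throw out predictors that show no apparent influence.
# The 0.01 threshold is arbitrary and only for illustration.
keep <- names(pvals)[pvals < 0.01]

# Fit the logistic regression on the surviving predictors only.
screened <- data.frame(crashed = crashed, X[, keep, drop = FALSE])
fit <- glm(crashed ~ ., data = screened, family = binomial)
```

Using `crashed ~ .` on a data frame that contains only the survivors avoids 
pasting together a long formula by hand.  With thousands of tests some 
predictors will pass the cutoff by chance alone, so the threshold probably 
needs more thought than it gets here.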


