pulling items out of an lm() call

Andrew Gelman gelman at stat.columbia.edu
Mon May 1 12:46:59 CEST 2006

I want to write a function to standardize regression predictors, which 
will require me to do some character-string manipulation to parse the 
variables in a call to lm() or glm().

For example, consider the call

  lm(y ~ female + I(age^2) + female:black + (age + education)*female)

I want to be able to parse this to pick out the input variables 
("female", "age", "black", "education").  Then I can transform these as 
appropriate (to get "z.female", "z.age", etc), feed them back into the 
lm() function, and go from there.
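
To make the transform-and-refit step concrete, here is a rough sketch. It assumes the input names are already in hand, and everything in it beyond the example formula is invented for illustration: the simulated data frame dat, the plain z-score scaling (subtract the mean, divide by one sd), and the names subs, z.f, and z.fit. It rewrites the formula with substitute() rather than by editing the character string.

```r
# Hypothetical data; only the column names come from the example formula.
set.seed(1)
n <- 200
dat <- data.frame(
  female    = rbinom(n, 1, 0.5),
  black     = rbinom(n, 1, 0.3),
  age       = round(runif(n, 18, 80)),
  education = sample(1:4, n, replace = TRUE)
)
dat$y <- 1 + 0.5 * dat$female + 0.02 * dat$age + rnorm(n)

f    <- y ~ female + I(age^2) + female:black + (age + education) * female
vars <- c("female", "age", "black", "education")  # assumed already extracted

# Add a standardized copy of each input, named "z.<name>".
# Assumption: plain z-scores; any other rescaling could be swapped in here.
for (v in vars) {
  dat[[paste0("z.", v)]] <- (dat[[v]] - mean(dat[[v]])) / sd(dat[[v]])
}

# Build a named list mapping each input to the symbol of its "z." version,
# then replace those symbols everywhere in the formula, including inside I().
subs <- lapply(setNames(paste0("z.", vars), vars), as.name)
z.f  <- as.formula(do.call("substitute", list(f, subs)), env = environment(f))

# Refit with the standardized inputs.
z.fit <- lm(z.f, data = dat)
```

Because the substitution works on the parsed formula, the response y is left alone and I(age^2) becomes I(z.age^2) without any special handling.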

Does anyone know an easy way to pull out the variables?  Doing it by 
hand means splitting on the operators "+", ":", and "*" and on 
whitespace, and also handling parentheses and the I() function.
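
One possible way to avoid the string manipulation, offered as a sketch rather than a complete answer: a formula is already a parsed R expression, so base R's all.vars() can walk it and return the variable names, skipping operators, parentheses, and function names such as I. The object names f and inputs below are just for illustration.

```r
f <- y ~ female + I(age^2) + female:black + (age + education) * female

# Every variable in the formula, response included.
all.vars(f)

# For a two-sided formula, element 3 is the right-hand side,
# so this keeps only the input variables.
inputs <- all.vars(f[[3]])
inputs
# [1] "female"    "age"       "black"     "education"
```

For a model that has already been fitted, the same idea should apply to formula(fit). One caveat: all.vars() returns every symbol that is not used as a function name, so a constant written as a name (pi, say) would show up in the result too.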


