[R] Questions regarding the lasso and glmnet
Patrick Breheny
patrick.breheny at uky.edu
Sun May 29 12:41:36 CEST 2011
On 05/28/2011 12:54 PM, Ben Haller wrote:
> 1. Is my choice of glmnet() ok? On what basis should I choose
> glmnet() vs. lars()?
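LARS is for linear regression; your outcome is binary, so glmnet() with
family = "binomial" (lasso-penalized logistic regression) is the
appropriate choice.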
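A minimal sketch of such a binomial lasso fit with glmnet. The data are simulated, and the names `x`, `y`, `fit` and `cvfit` are hypothetical stand-ins for your own predictor matrix and 0/1 outcome. `family = "binomial"` selects the penalized logistic likelihood, which lars() does not offer.

```r
library(glmnet)

set.seed(1)
n <- 200
# Hypothetical data: five numeric predictors and a binary (0/1) outcome
x <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- rbinom(n, 1, plogis(x[, 1] - 0.5 * x[, 2]))

# Lasso-penalized logistic regression over a whole path of lambda values
fit <- glmnet(x, y, family = "binomial")

# Cross-validation to choose lambda (binomial deviance is the default loss)
cvfit <- cv.glmnet(x, y, family = "binomial")

# Coefficients at the largest lambda within one SE of the CV minimum
coef(cvfit, s = "lambda.1se")
```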
> 2. Is the way I'm scaling the variables before calling glmnet()
> correct? Or should the squares themselves be centered and scaled?
> 3. Is my model matrix correct, or do I have a problem with the scale
> of the interaction variables?
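glmnet centers and scales the variables itself (standardize = TRUE is the
default), and it does so for every column of the model matrix you pass in,
including the squares and interactions. You do not need to do this yourself.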
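A rough check of this point, assuming two hypothetical raw predictors `x1` and `x2` on very different scales (simulated here). The model matrix carries the squares and the interaction as ordinary columns; fitting it raw and after `scale()` should give the same linear predictor, because glmnet standardizes internally either way and reports coefficients on the scale of the matrix it was given.

```r
library(glmnet)

set.seed(2)
n <- 200
# Hypothetical data frame with two unscaled predictors
d <- data.frame(x1 = runif(n, 0, 100), x2 = rnorm(n, 50, 10))
d$y <- rbinom(n, 1, plogis((d$x1 - 50) / 25))

# Main effects, their interaction and squares; drop the intercept column
# because glmnet fits its own intercept
x <- model.matrix(~ (x1 + x2)^2 + I(x1^2) + I(x2^2), data = d)[, -1]

# Default standardize = TRUE: each column (squares and interaction included)
# is standardized internally before fitting
fit_raw <- glmnet(x, d$y, family = "binomial")
fit_scaled <- glmnet(scale(x), d$y, family = "binomial")

# Fitted linear predictors at the same lambda agree up to numerical error
eta_raw <- predict(fit_raw, newx = x, s = 0.01)
eta_scaled <- predict(fit_scaled, newx = scale(x), s = 0.01)
max(abs(eta_raw - eta_scaled))
```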
> 4. Is it a problem that the lasso fit gives non-zero coefficients for
> interactions whose underlying terms have zero coefficients?
This is going to occur with any automated model selection procedure
unless specifically disallowed.
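One crude way to disallow it with glmnet alone, offered only as a sketch and not as a recommendation from this thread: leave the main effects unpenalized through `penalty.factor`, so they are never dropped and only the interactions are subject to selection. The data and the variable names are hypothetical.

```r
library(glmnet)

set.seed(3)
n <- 200
# Hypothetical data with three predictors and one true interaction
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(d$x1 + d$x1 * d$x2))

# All main effects plus all pairwise interactions, without intercept column
x <- model.matrix(~ (x1 + x2 + x3)^2, data = d)[, -1]

# Main-effect columns are the ones without ":" in their names
is_main <- !grepl(":", colnames(x))

# penalty.factor = 0 leaves a column unpenalized; interactions keep factor 1
fit <- glmnet(x, d$y, family = "binomial",
              penalty.factor = ifelse(is_main, 0, 1))

b <- coef(fit, s = 0.05)
b
```

This guarantees that any selected interaction has its main effects in the model, at the cost of never removing a main effect.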
> 5. Is there any way to choose a simple explanatory model, smaller
> than the best predictive model supported by the data, that is less
> arbitrary / subjective?
You have 5 variables. Variable selection is not your goal. What you
are trying to do is fit a curve (as opposed to a line) through your
data, possibly along with interactions. I would suggest looking into
splines, such as those provided by the mgcv package.
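A minimal sketch of that approach, assuming a data frame with a binary outcome `y` and two of your predictors, here called `x1` and `x2` (names and data are hypothetical). Each `s()` term is a penalized smooth main effect and `ti()` adds a smooth interaction on top of them; with all 5 variables you would add further `s()` and `ti()` terms the same way.

```r
library(mgcv)

set.seed(4)
n <- 400
# Hypothetical data: binary outcome depending non-linearly on x1 and x2
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- rbinom(n, 1, plogis(sin(2 * pi * d$x1) + 2 * (d$x2 - 0.5)^2))

# Logistic GAM: smooth main effects plus a tensor-product interaction,
# with the amount of smoothing chosen by REML
fit <- gam(y ~ s(x1) + s(x2) + ti(x1, x2),
           family = binomial, data = d, method = "REML")

summary(fit)
```

`plot(fit)` then shows the estimated curves, which is often a more direct answer to "what does the relationship look like" than a list of selected terms.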
--
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky
More information about the R-help mailing list