[R] How to deal with multicollinearity in mixed models (with lmer)?

Daniel Malter daniel at umd.edu
Sun Aug 16 19:46:00 CEST 2009

Hi, more generally you might be overfitting your model by interacting all of
the kidc polynomials with all of the year polynomials. Have a look at the
following example:


#kids and year are correlated

#simulate error term

#compute an arbitrary dependent variable

#true model

#dummies for year

#"your" model with all sorts of interactions

#assess variance inflation
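
A minimal sketch of the kind of simulation the comments above outline, in
base R. The seed, sample size, coefficients, polynomial order (up to four),
and the hand-rolled vif_manual() helper are all assumptions made for
illustration, not the values or functions of the original example; only the
model names reg1 to reg3 follow the text.

```r
set.seed(1)                                   # arbitrary seed (assumption)
n <- 1000                                     # assumed sample size

# kids and year are correlated (about 0.5 by construction)
year <- sample(-5:5, n, replace = TRUE)       # centered cohort year
kids <- 0.5 * year / sqrt(10) + sqrt(0.75) * rnorm(n)

# simulate error term
e <- rnorm(n)

# compute an arbitrary dependent variable (coefficients are made up)
y <- 1 + 0.5 * kids + 0.3 * year + e

# true model
reg1 <- lm(y ~ kids + year)

# dummies for year
reg2 <- lm(y ~ kids + factor(year))

# "your" model with all sorts of interactions
reg3 <- lm(y ~ (kids + I(kids^2) + I(kids^3) + I(kids^4)) *
               (year + I(year^2) + I(year^3) + I(year^4)))

# assess variance inflation: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing column j of the design matrix on all the other columns
vif_manual <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]  # drop the intercept
  v <- sapply(seq_len(ncol(X)), function(j) {
    1 / (1 - summary(lm(X[, j] ~ X[, -j, drop = FALSE]))$r.squared)
  })
  setNames(v, colnames(X))
}

cor(kids, year)
vif_manual(reg1)
vif_manual(reg2)["kids"]
round(vif_manual(reg3), 1)
```

Computing the VIF by hand keeps the sketch free of package dependencies; a
packaged VIF function would serve the same purpose.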

Note first that in the first two models the correlation between kids and
year is essentially harmless, even though it is about 0.5. Note, however,
how the variance is inflated in model reg3 once you include the
polynomials, the interactions, and the interacted polynomials of the two
correlated variables (the first- and third-order polynomials, and the
second- and fourth-order polynomials, are by necessity always highly
correlated). The estimates of the true effects in reg3 are still pretty
good. However, it can easily happen that effects that are not part of the
"true" model come out significant because of overfitting, and/or that true
effects come out insignificant because of variance inflation.
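
The remark about polynomial terms can be checked numerically. A small base R
sketch, assuming a standard normal predictor (distribution, seed, and sample
size are illustrative assumptions): for a symmetric, centered predictor like
this one, x and x^2 are nearly uncorrelated, while x and x^3, and x^2 and
x^4, remain strongly correlated.

```r
set.seed(2)                    # arbitrary seed (assumption)
x <- rnorm(10000)              # hypothetical predictor
x <- x - mean(x)               # center it

# correlation matrix of the first four powers of x
pc <- cor(cbind(x1 = x, x2 = x^2, x3 = x^3, x4 = x^4))
round(pc, 2)
```

Centering therefore helps with the linear and quadratic pair, but not with
the odd-odd and even-even pairs.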

Thus, try a simpler model. Do you really need all the interactions, and
what for? (If your previous post relates to the same data, collinearity
between the two variables themselves should be a minor issue, as the
correlation is moderate at -0.25. The vif you computed there indicates the
same. But again, creating and interacting all the higher-order polynomials
makes things worse.)

Further, is it reasonable to assume a "functional" relationship between
mortality and years? If not, you should fit year effects using either dummy
variables (year fixed effects) or a random effect. The random effects model
will only be unbiased if the random effects are uncorrelated with the Xs,
which is unlikely here given the correlation of kidc and year. The nice
thing about the year fixed effects model is that it is unbiased in your
case and spares you from including polynomials for the year.
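
A minimal base R sketch of the year fixed effects idea, under made-up
assumptions: 30 hypothetical years with arbitrary (non-functional) year
effects that are correlated with kids, and a true kids slope of 0.5. The
pooled fit ignores year altogether and only shows the direction of the bias;
a random effects fit would need lme4 and is not shown here.

```r
set.seed(3)                                 # arbitrary seed (assumption)
n_years  <- 30                              # hypothetical number of years
per_year <- 100                             # hypothetical mothers per year
year  <- rep(seq_len(n_years), each = per_year)
alpha <- rnorm(n_years)                     # arbitrary year effects, no trend imposed

# kids is correlated with the year effects (an assumption of this sketch)
kids <- 0.8 * alpha[year] + rnorm(n_years * per_year)
y    <- 0.5 * kids + alpha[year] + rnorm(n_years * per_year)

pooled <- lm(y ~ kids)                      # ignores year entirely
fe     <- lm(y ~ kids + factor(year))       # one dummy per year (fixed effects)

coef(pooled)["kids"]                        # absorbs part of the year effects
coef(fe)["kids"]                            # should be close to the true 0.5
```

No polynomial in year is needed in the dummy-variable fit, because each year
gets its own intercept shift.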


PS: If you want to model survival, you may want to consider using hazard
models instead.

cuncta stricte discussurus

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of willow1980
Sent: Sunday, August 16, 2009 11:27 AM
To: r-help at r-project.org
Subject: [R] How to deal with multicollinearity in mixed models (with lmer)?

Dear R users,
I have a problem with multicollinearity in mixed models, and I am using lmer
in package lme4. In an earlier post on this mailing list,
"http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg38537.html", I
found a reply stating that multicollinearity does not matter much if the
model is used only for prediction and not for interpretation. However, I am
using a mixed model for interpretation, so I am wondering whether there is a
suitable method to deal with this problem in lmer.
My model is:
This is the maximal model, and I have not begun to simplify it. The model is
used to interpret how a mother's cohort year and her total number of
children affect the average survival rate of her children. Kids and byear_c
have been centered, so the problem of correlation between the linear term
and the polynomial terms (quadratic, cubic, etc.) has been solved to some
degree. A still serious problem with this model is that the number of
children is correlated with cohort year, since the number of children
declines over time.
So, would you please give a suggestion to deal with collinearity between
kids and byear?
Thank you very much for helping!
Best regards,
