[R] random effects in mixed model not that 'random'

Sun Dec 13 16:41:02 CET 2009

I think what you are finding is that calling a grouping variable a 
"random effect" is not the same thing as it actually being a random effect.

An effect is really only random when its levels are chosen randomly. 
Just because you don't want to deal with it as a fixed effect (e.g., 
too many levels) doesn't mean it qualifies as a random effect. This 
sloppiness is common in mixed modeling.

In your example of student scores, you mentioned the schools were a 
random effect, because they were a grouping variable. This is not 
true. Schools have a strong fixed effect. They are also not chosen 
randomly in your study.

How to resolve your problem? Two methods: 1) Stop modeling the 
grouping variable as a random effect when it is not one: model it as a 
fixed effect. 2) Do the experiment right: a) List the schools in 
the population. b) Choose the schools to be used by random sampling 
from that population. Then you will find that schools really are a 
random effect.
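
Steps a) and b) of the second method can be sketched in a few lines. 
This is only an illustration in Python (the same idea applies in R); 
the school names, the population size of 200, the sample size of 20 
and the helper name sample_schools are all hypothetical.

```python
import random

def sample_schools(population, n, seed=None):
    """Draw n distinct schools by simple random sampling (no replacement)."""
    rng = random.Random(seed)  # private generator, so the draw is reproducible
    return rng.sample(population, n)

# a) List the schools in the population (hypothetical names).
population = [f"school_{i:03d}" for i in range(1, 201)]

# b) Choose the schools to be studied by random sampling from that list.
chosen = sample_schools(population, n=20, seed=2009)
print(chosen)
```

Fixing the seed only makes the draw repeatable for documentation; the 
point is that every school in the list has the same chance of selection.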

What you have discovered is called "selection bias". It is common in 
unrandomized studies.
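
A small simulation may make the point concrete. It is a sketch under 
assumed conditions, not an analysis of your data: the 1000 school 
effects, the N(0, 1) population and the selection mechanism (only 
schools with above-zero effects end up in the study) are hypothetical.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: 1000 school effects drawn from N(0, 1).
effects = [rng.gauss(0.0, 1.0) for _ in range(1000)]
population_mean = statistics.mean(effects)

# Randomized design: 50 schools chosen by simple random sampling.
random_pick = rng.sample(effects, 50)

# Unrandomized design (assumed mechanism): only schools whose effect is
# above zero are available, and 50 of those are studied.
available = [e for e in effects if e > 0]
convenient_pick = rng.sample(available, 50)

print("population mean:      ", round(population_mean, 3))
print("random sample mean:   ", round(statistics.mean(random_pick), 3))
print("selected sample mean: ", round(statistics.mean(convenient_pick), 3))
```

Under this mechanism the selected schools cannot look like draws from 
a distribution centered at zero, whatever the model assumes.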

At 09:12 AM 12/13/2009, Thomas Mang wrote:
>Hi,
>
>Thanks for your response; yes you are right it's not fully on topic, 
>but I chose this list not only because I am using R for all my stats 
>and so read it anyway, but also because here many statisticians read too.
>Do you know another list where my question is more appropriate ?
>For what it's worth, haven't found a local statistician yet to 
>really answer the question, but I'll continue searching ...
>
>thanks,
>Thomas
>
>On 12/13/2009 11:07 AM, Daniel Malter wrote:
>>Hi, you are unlikely to (or lucky if you) get a response to your question
>>from the list. This is a question that you should ask your local
>>statistician with knowledge in stats and, optimally, your area of inquiry.
>>The list is (mostly) concerned with solving R rather than statistical
>>problems.
>>
>>Best of luck,
>>Daniel
>>
>>-------------------------
>>cuncta stricte discussurus
>>-------------------------
>>-----Original Message-----
>>From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
>>Behalf Of Thomas Mang
>>Sent: Friday, December 11, 2009 6:19 PM
>>To: r-help at stat.math.ethz.ch
>>Subject: [R] random effects in mixed model not that 'random'
>>
>>Hi,
>>
>>I have the following conceptual / interpretative question regarding
>>random effects:
>>
>>A mixed effects model was fit on biological data, with observations
>>coming from different species. There is a clear overall effect of
>>certain predictors (entering the model as fixed effect), but as
>>different species react slightly differently, the predictor also enters
>>the model as random effect and with species as grouping variable. The
>>resulting model is very fine.
>>
>>Now comes the tricky part however: I can inspect not only the variance
>>parameter estimate for the random effect, but also the 'coefficients'
>>for each species. If I do this, suppose I find out that they make
>>biological sense, and maybe actually more sense than they should:
>>For each species vast biological knowledge is available, regarding
>>traits etc. So I can link the random effect coefficients to that
>>knowledge, see the deviation from the generic predictor impact (the
>>fixed effect) and relate it to the traits of my species.
>>However I see the following problem with that approach: If I have no
>>knowledge of the species traits, or the species names are anonymous to
>>me, it makes sense to treat the species-specific deviations as
>>realizations of a random variable (principle of exchangeability). Once I
>>know however the species used in the study and have the biological
>>knowledge at hand, it does not make so much sense any more; I can
>>predict whether for that particular species the generic predictor impact
>>will be amplified, or not. That is, I can predict if more likely the
>>draw from the assumed normal distribution of the random effects will be > 0
>>or < 0 - which is of course completely contradictory and nonsense if
>>I assume I have a random draw from a N(0, sigma) distribution.
>>Integrating the biological knowledge as fixed effect however might be
>>tremendously difficult, as species traits can sometimes not readily be
>>quantified in a numeric way.
>>I could defer the issue to the species traits and say, once the species
>>evolved their traits were drawn randomly from a population. This however
>>causes problems with ideas of evolution and phylogenetic relationships
>>among the species.
>>
>>Maybe my question can be rephrased the following way:
>>Does it ever make sense to _interpret_ the coefficients of the random
>>effects for each group and link it to properties of the grouping
>>variable? The assumption of a realization of a random variable seems to
>>render that quite problematic. However, this means that the more
>>ignorant I am, and the less knowledge I have, the more the random
>>realization seems to become realistic - which is at odds with scientific
>>investigations.
>>Suppose the mixed model is one of the famous social sciences studies
>>analysing pupil results on tests at different schools, with schools
>>acting as grouping variable for a random effect intercept. If I have no
>>knowledge about the schools, the random effect assumption makes sense.
>>If I however investigate the schools in detail (either a priori or a
>>posteriori), say teaching quality of the teachers, socio-economic status
>>of the school area etc, it will probably make sense to predict which
>>ones will have pupils performing above average, and which below average.
>>However then probably these factors leading me to the predictions should
>>enter the model as fixed effects, and maybe I don't need any school
>>random effect any more at all. But this means actually the school
>>deviation from the global mean is not the realization of a random
>>variable, but instead the result of something quite deterministic, but
>>which is usually just unknown, or can only be measured with extreme,
>>impractical efforts.  So the process might not be random; just because
>>so little is known about the process, the results appear as if they
>>were randomly drawn (from a larger population distribution). Again,
>>is ignorance / lack of deeper knowledge the key to using random effects
>>- and the more knowledge I have, the less ?
>>
>>many thanks,
>>Thomas
>>
>>______________________________________________
>>R-help at r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"