[R] random effects in mixed model not that 'random'
Thomas Mang
thomasmang.ng at gmail.com
Sun Dec 13 17:20:22 CET 2009
Hi,
Thanks for your input; see below
On 12/13/2009 4:41 PM, Robert A LaBudde wrote:
> I think what you are finding is that calling a grouping variable a
> "random effect" is not the same thing as it actually being a random effect.
>
> An effect is really only random when it is chosen randomly. Just because
> you don't want to deal with it as a fixed effect (e.g., too many levels)
> doesn't mean it qualifies as a random effect. This sloppiness is common
> in mixed modeling.
Well, to some degree the species were chosen randomly, so there isn't a
big selection bias in there. I would also argue they don't qualify as a fixed
effect (they might as a stand-alone fixed-effect factor, but definitely
not in interactions with other predictors: there is no reason to believe
the impact of predictors is totally independent across species).
Sample size isn't the problem; based on expert knowledge, I truly wouldn't
want to include them as a fixed effect.
>
> In your example of student scores, you mentioned the schools were a
> random effect, because they were a grouping variable. This is not true.
> Schools have a strong fixed effect. They are also not chosen randomly in
> your study.
>
> How to resolve your problem? Two methods: 1) Stop modeling the grouping
> variable as a random effect, when it's not: Model it as a fixed effect;
> 2) Do the experiment right: a) List the schools in their population. b)
> Choose the schools to be used by random sampling from that population.
> Then you will find that school really is a random effect.
1) does not seem to be the right solution.
2) is more interesting in terms of understanding:
Are you saying that it is only the random selection of what gets
included in the sample that makes it qualify as a random effect? I
thought it was the fact that it is the realization of a random variable
(drawn from a N(0, sigma) distribution). These are two different things.
Suppose I list all the schools in the population and randomly pick 15.
IIUC, you would argue that now it qualifies as a random effect. However, once I
have chosen my schools I could still examine the estimated random-effect
coefficients, investigate the schools a posteriori and try to
find out what distinguishes those with students above average from those
below average. Odds are, if I had the resources for a thorough
investigation, I would find something. In other words, because there
is something deterministic behind it, I would have said they are not
random realizations from a normal distribution. That was my
understanding of the properties of random effects so far, but it might be
wrong and hence the source of the problem (although, due to the complexity
of this deterministic process, they might in practice appear as random
realizations). If I were to pick a 16th school and then apply my knowledge
from the investigations, I could probably say whether it will be above or
below average. In my understanding of random effects, this is exactly what
would disqualify it as a random effect, whereas according to you
it would qualify, as long as the school was chosen randomly. Is that correct?
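A toy simulation can separate the two ideas in this paragraph: randomness in the selection versus randomness in the schools themselves. It is a hypothetical sketch in plain Python (not R, and not based on any real data). It uses a made-up population of 2000 schools whose effects are fixed: mostly 2 * x for an assumed school property x, plus an unmeasured fixed part. When 15 schools are drawn at random many times over, the effect of a drawn school behaves like a draw from the population distribution of effects, while x still predicts fairly reliably whether that school is above or below average.

```python
import random
import statistics

# Hypothetical finite population of schools. Each school's effect is fixed:
# mostly a deterministic function of an assumed property x, plus a fixed
# part that nobody has measured. Nothing about a given school is random.
pop_rng = random.Random(1)
N_POP = 2000
xs = [pop_rng.uniform(-1.0, 1.0) for _ in range(N_POP)]
effects = [2.0 * x + 0.3 * pop_rng.gauss(0.0, 1.0) for x in xs]
pop_mean = statistics.fmean(effects)
pop_sd = statistics.pstdev(effects)

# Repeat the "study" many times: each time pick 15 schools at random
# (without replacement) and record the effect and x of the first one drawn.
study_rng = random.Random(2)
drawn_effects, drawn_xs = [], []
for _ in range(5000):
    chosen = study_rng.sample(range(N_POP), 15)
    drawn_effects.append(effects[chosen[0]])
    drawn_xs.append(xs[chosen[0]])

# Across repetitions, a drawn school's effect has the population's mean and spread.
mean_gap = statistics.fmean(drawn_effects) - pop_mean
sd_ratio = statistics.pstdev(drawn_effects) / pop_sd

# Yet, once a school is drawn, the sign of x mostly predicts the sign of its effect.
sign_hits = sum((e > 0) == (x > 0) for e, x in zip(drawn_effects, drawn_xs))
sign_accuracy = sign_hits / len(drawn_effects)

print(f"mean gap {mean_gap:.3f}, sd ratio {sd_ratio:.3f}, "
      f"sign accuracy {sign_accuracy:.2f}")
```

In this toy setup the randomness sits entirely in the selection step. The sketch only illustrates the two notions side by side; it does not settle which one a random-effect model requires.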
Suppose I have chosen the schools randomly: does it make sense to investigate
a posteriori why the estimates for the random effects are the way they are
and gain insights into the system, or would that not make sense because they
are assumed to be completely random realizations of a random variable and
can therefore be anything?
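The a-posteriori question can be made concrete with the predicted random effects (BLUPs, i.e. the conditional modes that `ranef()` reports for an lme4 fit). What follows is a minimal sketch under stated assumptions. It uses plain Python with simulated data and a balanced design of 30 hypothetical schools with 20 pupils each. The school effects depend on a school property x that is deliberately left out of the model, and the balanced one-way ANOVA (method-of-moments) estimators stand in for the REML fit that lme4 would perform. The sketch shows that the predictions are deviations shrunk toward zero, and that they can still correlate strongly with the omitted x.

```python
import random
import statistics

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(3)
J, N_PER = 30, 20   # hypothetical balanced design: 30 schools, 20 pupils each

# School effects driven mostly by a school property x that the model ignores.
xs = [rng.uniform(-1.0, 1.0) for _ in range(J)]
true_u = [0.8 * x + 0.2 * rng.gauss(0.0, 1.0) for x in xs]
scores = [[50.0 + u + rng.gauss(0.0, 1.0) for _ in range(N_PER)] for u in true_u]

# Random-intercept model WITHOUT x, fitted by balanced one-way ANOVA
# moment estimators (a stand-in for REML).
means = [statistics.fmean(g) for g in scores]
grand = statistics.fmean(means)
msw = statistics.fmean([statistics.variance(g) for g in scores])  # within-school mean square
msb = N_PER * statistics.variance(means)                          # between-school mean square
sigma_b2 = max((msb - msw) / N_PER, 0.0)                          # between-school variance

# Predicted random effects: each school's deviation shrunk toward zero.
shrink = sigma_b2 / (sigma_b2 + msw / N_PER)
blups = [shrink * (m - grand) for m in means]

# A posteriori inspection: do the predictions line up with the omitted x?
r = pearson(blups, xs)
print(f"shrinkage factor {shrink:.2f}, corr(predicted effect, x) {r:.2f}")
```

A strong correlation of this kind mainly flags x as a candidate school-level fixed effect.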
To some degree I think the issue can also be seen the following way:
Conditional on my extensive knowledge of the school properties, the
schools are probably not distributed iid. I could let this knowledge
enter as a fixed effect. But since this knowledge is usually not available,
the unconditional distribution might well make them iid N(0, sigma), and
hence make the schools qualify as a grouping variable for random effects
(where of course it is assumed that sampling was now done randomly from
the population).
But what shall I do if I have a bit of that extensive knowledge available:
maybe too much to stick to the completely unconditional iid
assumption, but not enough for a sensible conditional distribution
that would allow the specification of a fixed effect?
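The conditional versus unconditional view, including the in-between case of partial knowledge, can be sketched numerically. Everything here is hypothetical. The simulated design is balanced, with 60 schools and 20 pupils each. School effects are mostly 0.8 * x for an assumed school property x, and a noisy proxy for x stands in for "a bit of the extensive knowledge". Simple moment estimators take the place of a REML fit. The sketch compares the between-school variance left for the random intercept in three cases: with no school-level covariate, with the proxy, and with x itself.

```python
import random
import statistics

def between_school_variance(means, msw, n_per, covariate=None):
    """Moment estimate of the between-school variance (balanced design).

    Without a covariate: variance of the school means minus msw / n_per.
    With a school-level covariate: residual variance of the school means
    after a simple OLS fit on that covariate (J - 2 df), minus msw / n_per."""
    if covariate is None:
        spread = statistics.variance(means)
    else:
        mx, my = statistics.fmean(covariate), statistics.fmean(means)
        sxx = sum((x - mx) ** 2 for x in covariate)
        slope = sum((x - mx) * (y - my) for x, y in zip(covariate, means)) / sxx
        resid = [y - my - slope * (x - mx) for x, y in zip(covariate, means)]
        spread = sum(r * r for r in resid) / (len(means) - 2)
    return max(spread - msw / n_per, 0.0)

rng = random.Random(4)
J, N_PER = 60, 20                                      # hypothetical design
xs = [rng.uniform(-1.0, 1.0) for _ in range(J)]        # full knowledge of the schools
proxy = [x + rng.gauss(0.0, 0.7) for x in xs]          # partial, noisy knowledge
true_u = [0.8 * x + 0.2 * rng.gauss(0.0, 1.0) for x in xs]
scores = [[50.0 + u + rng.gauss(0.0, 1.0) for _ in range(N_PER)] for u in true_u]

means = [statistics.fmean(g) for g in scores]
msw = statistics.fmean([statistics.variance(g) for g in scores])  # within-school mean square

unconditional = between_school_variance(means, msw, N_PER)
with_proxy = between_school_variance(means, msw, N_PER, proxy)
with_x = between_school_variance(means, msw, N_PER, xs)
print(f"between-school variance: none {unconditional:.3f}, "
      f"proxy {with_proxy:.3f}, exact x {with_x:.3f}")
```

In a setup like this, the variance left to the random intercept shrinks as more of x enters as a fixed effect. A noisy proxy typically removes only part of it.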
thanks
Thomas
>
> What you have discovered is called "selection bias". It is common in
> unrandomized studies.
>
>
> At 09:12 AM 12/13/2009, Thomas Mang wrote:
>> Hi,
>>
>> Thanks for your response; yes, you are right, it's not fully on topic,
>> but I chose this list not only because I use R for all my stats
>> and so read it anyway, but also because many statisticians read it too.
>> Do you know another list where my question would be more appropriate?
>> For what it's worth, I haven't found a local statistician yet who can
>> really answer the question, but I'll keep searching ...
>>
>> thanks,
>> Thomas
>>
>> On 12/13/2009 11:07 AM, Daniel Malter wrote:
>>> Hi, you are unlikely to (or lucky if you) get a response to your
>>> question
>>> from the list. This is a question that you should ask your local
>>> statistician with knowledge in stats and, optimally, your area of
>>> inquiry.
>>> The list is (mostly) concerned with solving R rather than statistical
>>> problems.
>>>
>>> Best of luck,
>>> Daniel
>>>
>>> -------------------------
>>> cuncta stricte discussurus
>>> -------------------------
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org
>>> [mailto:r-help-bounces at r-project.org] On
>>> Behalf Of Thomas Mang
>>> Sent: Friday, December 11, 2009 6:19 PM
>>> To: r-help at stat.math.ethz.ch
>>> Subject: [R] random effects in mixed model not that 'random'
>>>
>>> Hi,
>>>
>>> I have the following conceptual / interpretative question regarding
>>> random effects:
>>>
>>> A mixed-effects model was fit to biological data, with observations
>>> coming from different species. There is a clear overall effect of
>>> certain predictors (entering the model as fixed effects), but as
>>> different species react slightly differently, the predictor also enters
>>> the model as a random effect, with species as the grouping variable. The
>>> resulting model works very well.
>>>
>>> Now comes the tricky part, however: I can inspect not only the variance
>>> parameter estimate for the random effect, but also the 'coefficients'
>>> for each species. Suppose that, when I do this, I find that they make
>>> biological sense, maybe even more sense than they should:
>>> for each species, vast biological knowledge is available regarding
>>> traits etc. So I can link the random-effect coefficients to that
>>> knowledge, see the deviation from the generic predictor impact (the
>>> fixed effect) and relate it to the traits of my species.
>>> However, I see the following problem with that approach: if I have no
>>> knowledge of the species traits, or the species names are anonymous to
>>> me, it makes sense to treat the species-specific deviations as
>>> realizations of a random variable (principle of exchangeability). Once I
>>> know the species used in the study, however, and have the biological
>>> knowledge at hand, it does not make so much sense any more; I can
>>> predict whether for that particular species the generic predictor impact
>>> will be amplified or not. That is, I can predict whether the draw from
>>> the assumed normal distribution of the random effects is more likely to
>>> be greater or less than 0, which is of course completely contradictory
>>> and nonsensical if I assume a random draw from a N(0, sigma) distribution.
>>> Integrating the biological knowledge as a fixed effect, however, might be
>>> tremendously difficult, as species traits sometimes cannot readily be
>>> quantified numerically.
>>> I could defer the issue to the species traits and say that, when the
>>> species evolved, their traits were drawn randomly from a population. This,
>>> however, causes problems with ideas of evolution and phylogenetic
>>> relationships among the species.
>>>
>>> Maybe my question can be rephrased the following way:
>>> Does it ever make sense to _interpret_ the random-effect coefficients
>>> for each group and link them to properties of the grouping
>>> variable? The assumption of a realization of a random variable seems to
>>> render that quite problematic. However, this means that the more
>>> ignorant I am, and the less knowledge I have, the more realistic the
>>> random realization seems to become, which is at odds with scientific
>>> investigation.
>>> Suppose the mixed model is one of the famous social-science studies
>>> analysing pupils' test results at different schools, with schools
>>> acting as the grouping variable for a random intercept. If I have no
>>> knowledge about the schools, the random-effect assumption makes sense.
>>> If, however, I investigate the schools in detail (either a priori or a
>>> posteriori), say the teaching quality of the teachers, the socio-economic
>>> status of the school area etc., it will probably be possible to predict
>>> which ones will have pupils performing above average, and which below
>>> average. But then these factors leading me to the predictions should
>>> probably enter the model as fixed effects, and maybe I don't need any
>>> school random effect at all any more. This means that the school's
>>> deviation from the global mean is actually not the realization of a random
>>> variable, but the result of something quite deterministic that is
>>> usually just unknown, or can only be measured with extreme, impractical
>>> effort. So the process might not be random; it is just that, because so
>>> little is known about it, the results appear as if they were randomly
>>> drawn (from a larger population distribution). Again, is ignorance /
>>> lack of deeper knowledge the key to using random effects, and the more
>>> knowledge I have, the less I should use them?
>>>
>>> many thanks,
>>> Thomas
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ================================================================
> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: ral at lcfltd.com
> Least Cost Formulations, Ltd. URL: http://lcfltd.com/
> 824 Timberlake Drive Tel: 757-467-0954
> Virginia Beach, VA 23464-3239 Fax: 757-467-2947
>
> "Vere scire est per causas scire"
>