[R] [R-sig-ME] mixed model testing

Thu Nov 8 10:00:27 CET 2007

Dear John,

Forgive me for sticking my nose in; I hope I'm not being rude, but I am a bit bewildered by your mail (and by statistical modelling).

I agree that if your model is:

Lawndepression~lawn.roller.weight +(1|lawn.id),

when, in fact, it *should be* (because you simulated the data or you're God):

Lawndepression~lawn.roller.weight +(lawn.roller.weight|lawn.id),

then you might erroneously fail to reject the null hypothesis that the random effect for slope is zero (i.e. has zero variance). But in real life one does not know the true model, and there are infinitely many (functional forms for) random effects for which such tests may fail. What should one do? Why stop at the linear term? Why not saturate the model with random effects? (And even then, you don't know whether you have the right model.)
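To make the comparison concrete, here is a minimal sketch in R, assuming the lme4 package and simulated data in which the random slope really exists. The number of lawns, the roller weights and all the standard deviations are made-up values for illustration only, not anything from John's data.

```r
library(lme4)

set.seed(1)
n_lawns <- 10                      # hypothetical number of lawns
weights <- c(2, 4, 6, 8, 10)       # hypothetical roller weights

# One row per lawn x weight combination
d <- expand.grid(lawn.roller.weight = weights,
                 lawn.id = factor(seq_len(n_lawns)))
i  <- as.integer(d$lawn.id)
b0 <- rnorm(n_lawns, sd = 1)       # assumed lawn-specific intercept deviations
b1 <- rnorm(n_lawns, sd = 0.3)     # assumed lawn-specific slope deviations
d$Lawndepression <- 1 + b0[i] + (2 + b1[i]) * d$lawn.roller.weight +
  rnorm(nrow(d), sd = 0.5)

# Fit both models by maximum likelihood so the likelihoods are comparable
m_int   <- lmer(Lawndepression ~ lawn.roller.weight + (1 | lawn.id),
                data = d, REML = FALSE)
m_slope <- lmer(Lawndepression ~ lawn.roller.weight +
                  (lawn.roller.weight | lawn.id),
                data = d, REML = FALSE)

# Likelihood ratio test (the table also reports AIC and BIC)
cmp <- anova(m_int, m_slope)
print(cmp)
```

Because the null value of a variance lies on the boundary of the parameter space, the chi-square p-value from this comparison tends to be conservative, which is one more reason not to read a non-rejection as evidence that the slope variance is zero.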

In your example one would perhaps like to think that VERY light lawn rollers do not cause any depression at all, and that there is a maximum depression that a lawn roller can cause, so there should at least be a nonlinear f(lawn.roller.weight). Or we do as Venables suggests in http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf : we centre our lawn roller weights, stay in the middle of the interval where things look linear, and fit a linear, Gaussian model.

Then I don't quite get your point with Box's quote, since that concerned heterogeneous variances, right? Using a preliminary test for heterogeneous variances that is more sensitive than the ANOVA to departures from homogeneity is akin to sending out a rowing boat on a rough sea where the ocean liner (ANOVA) would fare safely. Here the situation is quite the opposite of rejecting the null: we failed to reject the null hypothesis of zero slope.

So what is your suggestion for a sensible modelling strategy, one that is not biased by what you see in your dataset? I used to like inspecting the data, fitting lmList if possible, then fitting a rather complex model, removing insignificant terms, and finally checking assumptions. After having read Harrell's book (Regression Modeling Strategies) I'm a bit uncertain what to do when people ask me to analyze their data, since they don't like to think too much about it. Harrell's suggestion that one could check the literature for a sensible model seems pernicious to me, since these old results are based on the very same modelling strategy that he rejects. Should one use the Bayesian framework with flat priors?
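For what it is worth, the lmList inspection step I mean looks roughly like the sketch below. It assumes the lmList function from lme4 (nlme has one of the same name), and the data are simulated with made-up values, so the numbers mean nothing in themselves.

```r
library(lme4)

set.seed(1)
n_lawns <- 8                       # hypothetical number of lawns
weights <- c(2, 4, 6, 8, 10)       # hypothetical roller weights

d <- expand.grid(lawn.roller.weight = weights,
                 lawn.id = factor(seq_len(n_lawns)))
i <- as.integer(d$lawn.id)
# Assumed lawn-specific intercepts and slopes, plus residual noise
d$Lawndepression <- 1 + rnorm(n_lawns, sd = 1)[i] +
  (2 + rnorm(n_lawns, sd = 0.3)[i]) * d$lawn.roller.weight +
  rnorm(nrow(d), sd = 0.5)

# Separate ordinary least-squares line for each lawn (no pooling)
fits <- lme4::lmList(Lawndepression ~ lawn.roller.weight | lawn.id, data = d)

# One row per lawn: intercept and slope
per_lawn <- coef(fits)
print(per_lawn)

# Spread of the per-lawn slopes, as a first informal look
print(range(per_lawn[, "lawn.roller.weight"]))
```

Looking at the spread of the per-lawn slopes is of course exactly the kind of data-driven peeking that Harrell warns about, which is why I am unsure about it.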

Best regards,

Fredrik Nilsson

-----Original message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of John Maindonald
Sent: 7 November 2007 22:47
To: Irene Mantzouni
Cc: r-sig-mixed-models at r-project.org; r-help at stat.math.ethz.ch
Subject: Re: [R-sig-ME] mixed model testing

Whether or not you need a mixed model, e.g. random versus
fixed slopes, depends on how you intend to use results.

Suppose you have lines of depression vs lawn roller weight
calculated for a number of lawns. If the data will always be
used to make predictions for one of those same lawns, a
fixed slopes model is fine.

If you want to use the data to make a prediction for another
lawn from the same "population" (the population from which
this lawn is a random sample, right?), you need to model
the slope as a random effect.

Now for a more subtle point:

In the prediction-for-another-lawn situation, it is possible that
the slope random effect is zero, and analysts do very
commonly make this sort of assumption, maybe without
realizing that this is what they are doing.  You can test whether
the slope random effect is zero but, especially if you have data
from a few lawns only, failure to reject the null (zero random
effect) is not a secure basis for inferences that assume that
the slope random effect is indeed zero. The "test for zero random
effect, then infer" strategy is open to Box's pithy objection that
"... to make preliminary tests on variances is rather like putting to
sea in a rowing boat to find out whether conditions are sufficiently
calm for an ocean liner to leave port".
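The few-lawns point can be illustrated with a small simulation. This is only a sketch, assuming the lme4 package; the helper one_p, the roller weights, the standard deviations and the lawn counts (4 versus 30) are all hypothetical choices. The data are generated with a small but nonzero slope variance, and the code records how often a likelihood ratio test at the 5% level fails to reject the zero-variance null.

```r
library(lme4)

set.seed(42)
weights <- c(2, 4, 6, 8, 10)       # hypothetical roller weights

# Simulate one dataset with a genuinely nonzero slope variance and
# return the LRT p-value for random intercept vs random slope.
one_p <- function(n_lawns) {
  d <- expand.grid(lawn.roller.weight = weights,
                   lawn.id = factor(seq_len(n_lawns)))
  i  <- as.integer(d$lawn.id)
  b0 <- rnorm(n_lawns, sd = 1)     # assumed intercept SD
  b1 <- rnorm(n_lawns, sd = 0.1)   # assumed small but nonzero slope SD
  d$Lawndepression <- 1 + b0[i] + (2 + b1[i]) * d$lawn.roller.weight +
    rnorm(nrow(d), sd = 0.5)
  # ML fits; singular-fit messages and warnings are silenced for brevity
  fit <- function(f) suppressMessages(suppressWarnings(
    lmer(f, data = d, REML = FALSE)))
  m0 <- fit(Lawndepression ~ lawn.roller.weight + (1 | lawn.id))
  m1 <- fit(Lawndepression ~ lawn.roller.weight +
              (lawn.roller.weight | lawn.id))
  lr <- 2 * (as.numeric(logLik(m1)) - as.numeric(logLik(m0)))
  # Naive chi-square reference with 2 df (slope variance and covariance)
  pchisq(lr, df = 2, lower.tail = FALSE)
}

n_sim <- 100
miss_few  <- mean(replicate(n_sim, one_p(4))  > 0.05)  # 4 lawns
miss_many <- mean(replicate(n_sim, one_p(30)) > 0.05)  # 30 lawns
print(c(few = miss_few, many = miss_many))
```

Each printed value is the proportion of simulated datasets in which the test failed to reject even though the slope variance was not zero. One would expect that proportion to be larger with few lawns, though the exact figures depend entirely on the assumed values.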

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

On 8 Nov 2007, at 1:55 AM, Irene Mantzouni wrote:

> Is there a formal way to prove the need of a mixed model, apart from  
> e.g. comparing the intervals estimated by lmList fit?
> For example, should I compare (with AIC ML?) a model with separately  
> (unpooled) estimated fixed slopes (i.e.using an index for each  
> group) with a model that treats this parameter as a random effect  
> (both models treat the remaining parameters as random)?
>
> Thank you!
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
