[R] nested mixed-effect model: variance components

Spencer Graves spencer.graves at pdf.com
Sat Jun 10 20:24:10 CEST 2006

<see inline>  	

Eric Pante wrote:
 > Hi Spencer,
 > First, thank you very much for taking the time to write this detailed
 > reply !
 > I did try exactly the formula you suggested:
 >       fit <- lme(fixed=COVER ~ HABITAT, random = ~1|LAGOON/HABITAT,
 > data=cov)
 > before writing my post, and indeed, neither summary() or anova()
 > returned the sums of squares to assess variance components.

	  Do you want sums of squares or estimates of variance components?  I 
believe that for over half a century, there is been a substantial 
consensus among leading experts in statistical methods that maximum 
likelihood (or Bayesian posterior where an adequate prior exists) is 
theoretically the best method for most statistical questions.   With 
mixed effects, this is interpreted as requiring the maximization of the 
marginal likelihood (often after integrating out fixed effects to obtain 
the "restricted likelihood" maximized with method = "REML").

	  However, until the last couple of decades, ML or REML was not 
computationally feasible for many cases lacking balance.  This led to an 
extensive literature on alternative methods like MINQUE.  These methods 
are now substantially obsolete as far as I know except in cases with 
appropriate balance where they produce exactly the same answers as ML or 

	  If you want estimates of the variance components, then use 'lme': 
With appropriately balanced data sets, these would be exactly what you 
would get by writing out the expected mean squares and solving for the 
variance components.  Where the balance is lacking, the other methods 
will generally produce less efficient estimates of what you really want. 
  If you want sums of squares for testing, do your testing as described 
in Pinheiro and Bates.  Except in the saturated case I mentioned, you 
should get identical or superior estimates from lme and lmer than from 

	  If you want sums of squares for something other than an intermediate 
step for estimation or testing, please explain.

	  Be careful what you ask for, because you might get it -- and it might 
not be what you want.

 > how do you specify in the formula that you want a nested approach (I
 > will check the Pinheiro and Bates book) ? For these two reasons, I had
 > the feeling that aov might be the way to go ...

	  I understood you to say that you thought habitat should be nested 
within lagoon, but you also want to know "how much variation is due to 
lagoons? habitats? lagoons*habitat?".  That sounds to me like you want 
to know how to clean the spark plugs on your bicycle.

	  Have you studied the "Oats" example in my post?  (See also 

	  Bon Chance!
	  J’espère que ceci vous aide.
	  Spencer Graves
 > I will try your suggestions concerning the lme4 package.
 > Thanks again !
 > eric pante
 > Eric Pante

Spencer Graves wrote:
>       I have seen no reply to this, so I will offer a couple of comments 
> in spite of the fact that I know very little about "aov" other than it 
> is old and has largely been superceded by "lme" in the "nlme" package. 
> I've replied to many posts on random and mixed effects over the past few 
> years, and I only remember one case where 'aov' returned an answer that 
> could not have been obtained easily from 'lme'.  That one application 
> involved estimating a saturated model from perfectly balanced 
> experimental data.  'lme' refused to return an answer, at least as I 
> specified the model, and 'aov' returned numbers from which anyone 
> familiar with the theory for that special case could compute the desired 
> F ratios and p-values.  In all other cases that I remember, 'aov' would 
> either return the same answer, or much more commonly, a substantively 
> inferior answer -- if it were feasible to use 'aov' at all.
>       I mention this, because the simplest way I can think of to check 
> your answer is to suggest you try to work it using 'lme'.  To learn how 
> to do this, I strongly encourage you to consult Pinheiro and Bates 
> (2000) Mixed-Effects Models in S and S-PLUS (Springer).  Bates has been 
> for many years one of the leading contributors in this area and is the 
> primary author of the 'nlme', 'lme4' and 'Matrix" packages to support 
> working with these kinds of models.  This book discusses both how to fit 
> these kinds of models as well as how to produce plots of various kinds 
> that can be very helpful for explaining the experimental results to 
> others as well as diagnosing potential problem.  Moreover, the R scripts 
> discussed in the book are available in files called "ch01.R", "ch02.R", 
> ..., "ch06.R", "ch08.R" in a subfolder '~library\nlme\scripts' of the 
> directory in which R is installed on your hard drive.  This makes it 
> much easier to work through the examples in the book one line at a time, 
> experimenting at with modifications.  In addition, there is one point I 
> know where the R syntax differs from that in the book:  S-Plus will 
> accept x^2 in a formula as meaning the square of a numeric variable;  R 
> will not.  To specify a square in R, you need something like I(x^2). 
> When I copied the commands out of the book, I had serious trouble 
> getting answers like those in the book until I identified and corrected 
> this difference in syntax.  If you use the script files, they provide 
> the R syntax.
>       I'm not certain, but I believe the following should estimate the 
> model you wrote below:
>       fit <- lme(fixed=COVER ~ HABITAT,
>             random = ~1|LAGOON/HABITAT,
>             data=cov).
>       For comparison, I refer you to Pinheiro and Bates, p. 47, where I 
> find the following:
> fm1Oats <- lme( yield ~ ordered(nitro) * Variety, data = Oats,
>   random = ~ 1 | Block/Variety )
>       There are 3 Varieties and 6 Blocks in this Oats data.frame.  The 
> fixed effect of Variety has therefore 2 degrees of freedom.  However, 
> the random effect of Variety occurs in 18 levels, being all the 6*3 
> Block:Variety combinations.  You can see this by examining 'str(fm1Oats)'.
>       If you want to know "how much variation is due to lagoons? 
> habitats? lagoons*habitat? transects?", this model will NOT tell you 
> that, and I don't know how to answer that question using 'lme'.  I was 
> able to answer a question like that using 'lmer' associated with the 
> 'lme4' and 'Matrix' packages.  Unfortunately, these packages have some 
> names conflicts with 'nlme', and the simplest way I know to change from 
> one to the other is to "q()" and restart R  Before I did, however, I 
> made a local copy of the "Oats" data.frame.  After I did that, I seemed 
> to get sensible numbers from the following:
> library(lme4)
> fm1Oats4 <- lmer(yield~ ordered(nitro) * Variety
>         +(1|Block)+(1|Variety)+(1|Block:Variety),
>                  data=Oats)
>       For both "lme" and "lmer", the default "method" is "REML" 
> (restricted maximum likelihood).  This is what you want for estimation.  
> For testing random effects, you still want "REML", but you should adjust 
> the degrees of freedom of the reference "F" distribution as discussed in 
> section 2.4 of Pinheiro and Bates;  this also applies to confidence 
> intervals for the random effects.  For testing fixed effects, you should 
> use "method = 'ML'".
>       "lme4" is newer than "nlme" and does not currently have available 
> the complete set of helper functions for plotting, etc.  Thus, you will 
> likely want to use both.  For documentation on 'lmer', you should still 
> start with Pinheiro and Bates for the general theory, then refer to the 
> vignettes associated with the "mlmRev" and "lme4" packages;  if you 
> don't know about vignettes RSiteSearch("graves vignette") will lead you 
> to "http://finzi.psych.upenn.edu/R/Rhelp02a/archive/67006.html" and 
> other replies to r-help where I've described how to use them.  You will 
> also want the relevant article by Doug Bates in R News.  To find that, 
> go "www.r-project.org" -> Documentation:  Newsletter -> Download: "Table 
> of Contents", then search for Bates until you find "Douglas Bates. 
> Fitting linear mixed models in R. R News, 5(1):27-30, May 2005."
> That says you want vol. 5(1).
>       Hope this helps,
>       Spencer Graves
> Eric Pante wrote:
>> Dear listers,
>> I am trying to assess variance components for a nested, mixed-effects 
>> model. I think I got an answer that make sense from R, but I have a 
>> warning message and I wanted to check that what I am looking at is 
>> actually what I need:
>> my data are organized as transects within stations, stations within 
>> habitats, habitats within lagoons.
>> lagoons: random, habitats: fixed
>> the question is: how much variation is due to lagoons? habitats? 
>> lagoons*habitat? transects?
>> Here is my code:
>> data=cov)
>> summary(res)
>> and I get Sum Sq for each to calculate variance components:
>> Error: STRATE
>>         Df Sum Sq Mean Sq
>> STRATE  5 4493.1   898.6
>> Error: ATOLL
>>            Df Sum Sq Mean Sq F value Pr(>F)
>> Residuals  5 3340.5   668.1
>>            Df  Sum Sq Mean Sq F value Pr(>F)
>> Residuals 18 2442.71  135.71
>> Error: Within
>>             Df Sum Sq Mean Sq F value Pr(>F)
>> Residuals 145 6422.0    44.3
>> My error message seems to come from the LAGOON/HABITAT, the Error is 
>> computed.
>> Warning message: Error() model is singular in: aov(COVER ~ HABITAT + 
>> THANKS !!!
>> eric
>> Eric Pante
>> ----------------------------------------------------------------
>> College of Charleston, Grice Marine Laboratory
>> 205 Fort Johnson Road, Charleston SC 29412
>> Phone: 843-953-9190 (lab)  -9200 (main office)
>> ----------------------------------------------------------------
>>     "On ne force pas la curiosite, on l'eveille ..."
>>     Daniel Pennac
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide! 
>> http://www.R-project.org/posting-guide.html

More information about the R-help mailing list