[R] lmer, p-values and all that

Wed Apr 3 15:56:50 CEST 2013

My apologies for coming late into this conversation, but I'm curious about something in your response.

You use the following code to perform a likelihood ratio test between an lm object and a mer object:

fm0 <- lm(distance~age,data=Orthodont)
fm2 <- lmer(distance~age+(1|Subject),data=Orthodont,REML=FALSE)

ddiff <- -2*c(logLik(fm0)-logLik(fm2))
pchisq(ddiff,1,lower.tail=FALSE)

It seems like this would be simple to roll up into a function, such as

lrtestGeneric <- function(fit1, fit2){
  ## likelihood ratio statistic; fit1 is taken to be the simpler model
  chisq <- -2 * c(logLik(fit1) - logLik(fit2))
  ## difference in parameter counts as reported by logLik()
  df <- abs(attr(logLik(fit1), "df") - attr(logLik(fit2), "df"))
  p <- pchisq(chisq, df, lower.tail=FALSE)
  lrtest <- data.frame("L.R.Chisq"=chisq, "d.f."=df, "P"=p)
  return(lrtest)
}

lrtestGeneric(fm0, fm2)

My question is whether it is appropriate to use the degrees of freedom returned by logLik, or whether I should just use 1 degree of freedom when comparing a model without the random effect to one with it. For instance, in the example below logLik reports a difference of 3 degrees of freedom between the two models. Should I use 3 degrees of freedom in the likelihood ratio test, or is it better to go with 1?

fit0 <- lm(Reaction ~ Days, sleepstudy)
fit1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy, REML=FALSE) ## ML fit, so logLik is comparable to lm()

lrtestGeneric(fit0, fit1)
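
For what it is worth, the difference of 3 seems to come from the three extra variance parameters in the random-slope model (intercept variance, slope variance, and their covariance). Here is a minimal sketch of that count, assuming nlme's lme() and the Orthodont data as stand-ins for lmer() and sleepstudy so that it runs on a default R install; the object names are only illustrative:

```r
library(nlme)  # recommended package shipped with R; provides lme() and Orthodont

# Plain linear model: intercept, slope, residual variance
m_lm <- lm(distance ~ age, data = Orthodont)
# A random intercept adds one variance parameter
m_int <- lme(distance ~ age, random = ~ 1 | Subject,
             data = Orthodont, method = "ML")
# A random intercept and slope add three: two variances and one covariance
m_slope <- lme(distance ~ age, random = ~ age | Subject,
               data = Orthodont, method = "ML")

# Parameter counts as reported by logLik()
npar <- sapply(list(lm = m_lm, intercept = m_int, slope = m_slope),
               function(m) attr(logLik(m), "df"))
npar
npar - npar[["lm"]]  # extra parameters relative to lm()
```

This only shows where logLik()'s count comes from; it does not by itself settle which reference distribution is appropriate for the test.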

Any education you can provide would be greatly appreciated.

Thanks

  Benjamin Nutter |  Biostatistician     |  Quantitative Health Sciences
  Cleveland Clinic    |  9500 Euclid Ave.  |  Cleveland, OH 44195  | (216) 445-1365

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Ben Bolker
Sent: Wednesday, March 27, 2013 10:34 PM
To: David Winsemius
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] lmer, p-values and all that

On 13-03-27 10:10 PM, David Winsemius wrote:
> 
> On Mar 27, 2013, at 7:00 PM, Ben Bolker wrote:
> 
>> Michael Grant <michael.grant <at> colorado.edu> writes:
>>> Dear Help:
>>
>>> I am trying to follow Professor Bates' recommendation, quoted by 
>>> Professor Crawley in The R Book, p629, to determine whether I should 
>>> model data using the 'plain old' lm function or the mixed model 
>>> function lmer by using the syntax anova(lmModel,lmerModel).
>>> Apparently I've not understood the recommendation or the proper 
>>> likelihood ratio test in question (or both) for I get this error
>>> message: Error: $ operator not defined for this S4 class.
>>
>>  I don't have the R Book handy (some more context would be extremely 
>> useful!  I would think it would count as "fair use" to quote the 
>> passage you're referring to ...)
> 
> This is the quoted Rhelp entry:
> 
> http://tolstoy.newcastle.edu.au/R/help/05/01/10006.html
> 
> (I'm unable to determine whether it applies to the question at hand.)

  OK, I misspoke -- sorry.  I think the lmer()/lme() likelihoods are actually comparable; it's GLMMs (glmer(), with no analogue in lme()-land except for MASS::glmmPQL, which doesn't give you log-likelihoods at all) where the problem arises.

  You can (1) use lme(), (2)  look at http://glmm.wikidot.com/faq for suggestions about testing random-effects terms (including "perhaps don't do it at all"), or (3) construct the likelihood ratio test yourself as follows:

library("nlme")
data(Orthodont)
fm1 <- lme(distance~age,random=~1|Subject,data=Orthodont)
fm0 <- lm(distance~age,data=Orthodont)
anova(fm1,fm0)[["p-value"]]
detach("package:nlme",unload=TRUE)
library("lme4.0") ## stable version of lme4
fm2 <- lmer(distance~age+(1|Subject),data=Orthodont,REML=FALSE)
anova(fm2,fm0) ## fails
ddiff <- -2*c(logLik(fm0)-logLik(fm2))
pchisq(ddiff,1,lower.tail=FALSE)
## not identical to above but close enough
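
The hand-built test above can also be sketched without switching packages. This is a hedged variant, not the code from the original message: it assumes lme() is refitted with method="ML" (its default is REML) so that its log-likelihood is on the same footing as lm()'s, and the name fm1ml is made up for illustration.

```r
library(nlme)  # recommended package shipped with R; provides lme() and Orthodont

fm0 <- lm(distance ~ age, data = Orthodont)
# method = "ML" is an assumption made here so both log-likelihoods are ML
fm1ml <- lme(distance ~ age, random = ~ 1 | Subject,
             data = Orthodont, method = "ML")

# Likelihood ratio statistic, computed as in the lmer() version
ddiff <- -2 * (as.numeric(logLik(fm0)) - as.numeric(logLik(fm1ml)))
# One extra parameter: the random-intercept variance
pval <- pchisq(ddiff, df = 1, lower.tail = FALSE)
c(statistic = ddiff, p.value = pval)
```

Because both fits here use maximum likelihood, the result should be close to the lmer(..., REML=FALSE) calculation rather than to the REML-based anova(fm1,fm0) line.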

> 
>>
>>> Would someone be kind enough to point out my blunder?
>>
>>  You should probably repost this to the 
>> r-sig-mixed-models at r-project.org mailing list.
>>
>>  My short answer would be: (1) I don't think you can actually use 
>> anova() to compare likelihoods between lm() and lme()/lmer() fits in 
>> the way that you want: *maybe* for lme() [don't recall], but almost 
>> certainly not for lmer().  See http://glmm.wikidot.com/faq for 
>> methods for testing significance/inclusion of random factors (short 
>> answer: you should *generally* try to make the decision whether to 
>> include random factors or not on _a priori_ grounds, not on the basis 
>> of statistical tests ...)
>>
>>  Ben Bolker
>>
>

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
