[R] Likelihood ratio test between glm and glmer fits
Rune Haubo
rhbc at imm.dtu.dk
Thu Jul 17 09:50:14 CEST 2008
2008/7/16 Dimitris Rizopoulos <Dimitris.Rizopoulos at med.kuleuven.be>:
> Well, for computing the p-value you need to use pchisq(), not dchisq()
> (check ?pchisq for more info). For model fits with a logLik method you
> can directly use the following simple function:
>
> lrt <- function (obj1, obj2) {
>     L0 <- logLik(obj1)
>     L1 <- logLik(obj2)
>     L01 <- as.vector(-2 * (L0 - L1))
>     df <- attr(L1, "df") - attr(L0, "df")
>     list(L01 = L01, df = df,
>          "p-value" = pchisq(L01, df, lower.tail = FALSE))
> }
>
> library(lme4)
> gm0 <- glm(cbind(incidence, size - incidence) ~ period,
>            family = binomial, data = cbpp)
> gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
>              family = binomial, data = cbpp)
>
> lrt(gm0, gm1)
Yes, that seems quite natural, but then try to compare with the deviance:
logLik(gm0)
logLik(gm1)
(d0 <- deviance(gm0))
(d1 <- deviance(gm1))
(LR <- d0 - d1)
pchisq(LR, df = 1, lower.tail = FALSE)
Obviously the deviance from glm is *not* twice the negative
log-likelihood, as it is for glmer. The question remains which of these
two quantities is appropriate for the comparison. I am not sure exactly
how the deviance and/or the log-likelihood are calculated in glmer, but
my feeling is that one should trust the deviances rather than the
log-likelihoods for this purpose. This is supported by the following
comparison: add an arbitrary random effect with a close-to-zero
variance and note the deviance:
tmp <- rep(1:4, each = nrow(cbpp)/4)
gm2 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | tmp),
             family = binomial, data = cbpp)
(d2 <- deviance(gm2))
This deviance is very close to that obtained from the glm model.
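The gap between deviance(gm0) and -2*logLik(gm0) is the constant
contribution of the saturated model; here is a quick check of my own
(not part of the original exchange), reusing the gm0 fit from above:

p.sat <- with(cbpp, incidence / size)     # saturated-model fitted probabilities
ll.sat <- with(cbpp, sum(dbinom(incidence, size, p.sat, log = TRUE)))
deviance(gm0)                             # residual deviance reported by glm
2 * (ll.sat - as.numeric(logLik(gm0)))    # should match, up to rounding

This constant cancels when two glm fits to the same data are compared,
but it matters as soon as a glm deviance is set against a quantity that
is defined directly as minus twice a log-likelihood.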
I have included the mixed-models mailing list in the hope that someone
can explain how the deviance is computed in glmer and why the deviances,
but not the log-likelihoods, are comparable to glm fits.
Best
Rune
>
>
> However, there are some issues regarding this likelihood ratio test.
>
> 1) The null hypothesis is on the boundary of the parameter space, i.e., you
> test whether the variance of the random effect is zero. In this case the
> assumed chi-squared distribution for the LRT may *not* be entirely
> appropriate and may produce conservative p-values. There is some theory on
> this issue, which shows that the reference distribution of the LRT is then
> a mixture of a chi-squared(df = 0) and a chi-squared(df = 1) distribution
> (a short sketch of this correction follows after point 2 below). Another
> option is a simulation-based approach, in which you approximate the
> reference distribution of the LRT under the null by simulation. You may
> check below for an illustration of this procedure (not tested):
>
> X <- model.matrix(gm0)
> coefs <- coef(gm0)
> pr <- plogis(c(X %*% coefs))
> n <- length(pr)
> new.dat <- cbpp
> Tobs <- lrt(gm0, gm1)$L01
> B <- 200
> out.T <- numeric(B)
> for (b in 1:B) {
>     y <- rbinom(n, cbpp$size, pr)
>     new.dat$incidence <- y
>     fit0 <- glm(formula(gm0), family = binomial, data = new.dat)
>     fit1 <- glmer(formula(gm1), family = binomial, data = new.dat)
>     out.T[b] <- lrt(fit0, fit1)$L01
> }
> # estimate p-value
> (sum(out.T >= Tobs) + 1) / (B + 1)
>
>
> 2) For the glmer fit, note that you are working with an approximation to
> the log-likelihood (obtained by numerical integration), not the exact
> log-likelihood.
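
If the mixture in point 1 is taken as the usual 50:50 mixture of
chi-squared(0) and chi-squared(1) for a single variance component, the
corrected p-value is simply half the ordinary chi-squared(1) tail
probability. A minimal sketch, added here and not part of the original
message, reusing the lrt() helper and the fits above:

lr.obs <- lrt(gm0, gm1)$L01
0.5 * pchisq(lr.obs, df = 1, lower.tail = FALSE)   # 50:50 mixture p-value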
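
Regarding point 2, the quality of the approximation can be checked by
increasing the number of quadrature points. A sketch, assuming a version
of lme4 in which glmer() accepts the nAGQ argument (available for a
single scalar random effect in current releases, and possibly not in the
version discussed here):

gm1.agq <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                 family = binomial, data = cbpp, nAGQ = 9)
logLik(gm1)       # Laplace approximation (nAGQ = 1)
logLik(gm1.agq)   # 9-point adaptive Gauss-Hermite approximation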
>
> I hope it helps.
>
> Best,
> Dimitris
>
> ----
> Dimitris Rizopoulos
> Biostatistical Centre
> School of Public Health
> Catholic University of Leuven
>
> Address: Kapucijnenvoer 35, Leuven, Belgium
> Tel: +32/(0)16/336899
> Fax: +32/(0)16/337015
> Web: http://med.kuleuven.be/biostat/
> http://www.student.kuleuven.be/~m0390867/dimitris.htm
>
>
> Quoting COREY SPARKS <corey.sparks at UTSA.EDU>:
>
>> Dear list,
>> I am fitting a multi-level logistic regression model and need to test the
>> difference between the ordinary logistic regression from a glm() fit and
>> the mixed-effects fit from glmer(); basically, I want to do a likelihood
>> ratio test between the two fits.
>>
>>
>> The data are like this:
>> My outcome is a (1,0) indicator of health status; I have several (1,0)
>> dummy variables, RURAL, SMOKE, DRINK, EMPLOYED, highereduc, INDIG, male,
>> divorced, SINGLE, chronic, vigor_d and moderat_d, and AGE is continuous
>> (20 to 100).
>> My higher level is called munid and has 581 levels.
>> The data have 45243 observations.
>>
>> Here are my program statements:
>>
>> #GLM fit
>>
>> ph.fit.2 <- glm(poorhealth ~ RURAL + SMOKE + DRINK + EMPLOYED + highereduc +
>>                   INDIG + AGE + male + divorced + SINGLE + chronic +
>>                   vigor_d + moderat_d,
>>                 family = binomial(), data = mx.merge)
>>
>> #GLMER fit
>>
>> ph.fit.3 <- glmer(poorhealth ~ RURAL + SMOKE + DRINK + EMPLOYED + INSURANCE +
>>                     highereduc + INDIG + AGE + male + divorced + SINGLE +
>>                     chronic + vigor_d + moderat_d + (1 | munid),
>>                   family = binomial(), data = mx.merge)
>>
>> I cannot find a method in R that will do the LR test between a glm and a
>> glmer fit, so I try to do it using the likelihoods from both models:
>>
>> #form the likelihood ratio test between the glm and glmer fits
>> x2 <- -2 * (logLik(ph.fit.2) - logLik(ph.fit.3))
>>
>> ML
>>
>> 79.60454
>> attr(,"nobs")
>> n
>> 45243
>> attr(,"nall")
>> n
>> 45243
>> attr(,"df")
>> [1] 14
>> attr(,"REML")
>> [1] FALSE
>> attr(,"class")
>> [1] "logLik"
>>
>> #Get the associated p-value
>> dchisq(x2, 14)
>> ML
>> 5.94849e-15
>>
>> This looks like an improvement in model fit to me. Am I seeing this
>> correctly, and can the two models even be compared? They are both
>> estimated via maximum likelihood, so they should be, I think.
>> Any help would be appreciated.
>>
>> Corey
>>
>> Corey S. Sparks, Ph.D.
>>
>> Assistant Professor
>> Department of Demography and Organization Studies
>> University of Texas San Antonio
>> One UTSA Circle
>> San Antonio, TX 78249
>> email:corey.sparks at utsa.edu
>> web: https://rowdyspace.utsa.edu/users/ozd504/www/index.htm
>>
>>
>>
>>
>
--
Rune Haubo Bojesen Christensen
Master Student, M.Sc. Eng.
Phone: (+45) 30 26 45 54
Mail: rhbc at imm.dtu.dk, rune.haubo at gmail.com
DTU Informatics, Section for Statistics
Technical University of Denmark, Build.321, DK-2800 Kgs. Lyngby, Denmark