[R-sig-ME] Differences in standard and mixed effects logistic regression - rounding error or conceptual mistake?
mio nome
n.pubblic at gmail.com
Wed Oct 26 18:16:48 CEST 2016
I'm a bit confused. To my understanding, a standard logistic regression
should be equivalent to a mixed-effects logistic regression in which the
statistical unit (the subject) is included as a random effect - but I found
that the results of the two analyses differ.
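To make the comparison concrete, the two models as I understand them (and
this may be exactly where I go wrong) are, writing p_i for subject i's
probability of a right answer and p_ij for its trial-level counterpart:

$$\mathrm{GLM:}\quad \mathrm{logit}(p_i) = \beta_0 + \beta_1 c_i$$

$$\mathrm{GLMM:}\quad \mathrm{logit}(p_{ij}) = \beta_0 + \beta_1 c_i + u_i, \qquad u_i \sim N(0, \sigma_u^2)$$

so the mixed model estimates one extra parameter, the between-subject
variance sigma_u^2.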
As an example, let's say that I have an experiment with 20 participants. I
ask each participant a number of questions, and each answer is scored as
either "right" or "wrong".
I simulated the dataset with:
data0 = cbind(data.frame(matrix(sample(1:10, 40, replace = TRUE),
                                nrow = 20, byrow = TRUE)),
              rep(c('a', 'b'), each = 10))
names(data0) = c('s', 'f', 'c')
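(In hindsight I should have fixed the random seed. For anyone re-running
this, a line like the one below - with an arbitrary placeholder value,
since my reported output came from an unseeded draw - makes the simulation
reproducible:)

set.seed(123)  # placeholder seed; the numbers reported below came from an unseeded run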
Let's say that the column *s* holds the number of right answers, the column
*f* the number of wrong answers, and the column *c* a between-subjects
experimental condition. The subjects are organized by rows.
To test the effect of *c* I can write:
anova(glm(cbind(s, f) ~ 1, data = data0, family = binomial),
      glm(cbind(s, f) ~ 1 + c, data = data0, family = binomial),
      test = "Chisq")
which gives me:
Model 1: cbind(s, f) ~ 1
Model 2: cbind(s, f) ~ 1 + c
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1        19     49.144
2        18     42.978  1   6.1659  0.01302 *
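(Equivalently, if I recall drop1 correctly, the same likelihood-ratio test
can be obtained from the single full fit:)

# should report the same 6.1659 deviance change on 1 Df for dropping c
drop1(glm(cbind(s, f) ~ c, data = data0, family = binomial), test = "Chisq")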
So I tried to model the same data with a logistic mixed-effects model.
First I converted it to a long format with:
data1 = NULL
for (i in 1:nrow(data0)) {
  tt = c(rep(1, data0$s[i]), rep(0, data0$f[i]))
  t = cbind(tt, rep(i, length(tt)), rep(data0$c[i], length(tt)))
  data1 = rbind(data1, t)
}
data1 = data.frame(data1)
names(data1) = c("r", "s", "c")
data1$s = factor(data1$s)
data1$c = factor(data1$c)
Here *s* is the subject ID. Then I tested the effect of *c* with:
library(lme4)
anova(glmer(r ~ 1 + (1 | s), data = data1, family = binomial),
      glmer(r ~ 1 + c + (1 | s), data = data1, family = binomial),
      test = "Chisq")
Now the same analysis gives me a different result!
Model 1: r ~ 1 + (1 | s)
Model 2: r ~ 1 + c + (1 | s)
  Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
1  2 290.72 297.45 -143.36   286.72
2  3 289.97 300.07 -141.98   283.97 2.7548      1    0.09696 .
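One check I can think of: if I'm not mistaken, the aggregated binomial
likelihood and the long-format Bernoulli likelihood differ only by a
constant (the binomial coefficients), so a plain glm on data1 should
reproduce the deviance change of the aggregated fit exactly:

anova(glm(r ~ 1, data = data1, family = binomial),
      glm(r ~ 1 + c, data = data1, family = binomial),
      test = "Chisq")
# should show the same Deviance (6.1659 on 1 Df) and p-value (0.01302)
# as the aggregated fit, even though the residual deviances differ

If that holds, the discrepancy must come from the (1 | s) term itself, not
from the reshaping.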
Is it because of rounding errors in the implementation of the underlying
numerical routines, or is there some fundamental concept I'm missing here?
PS: the results of the two analyses can be closer than in the case reported
here - I cherry-picked an extreme example for the sake of a clear
explanation.