[R] fixed effect significance with lmer() vs. t-test

Sat Jul 19 17:14:56 CEST 2008

I am looking at data of the following structure:

  n <- 100
  dataset <- data.frame()
  for (i in 1:n) {
    gender  <- c(rep("m", 5), rep("f", 5))
    subject <- letters[1:10]
    outcome <- c(rbinom(5, 1, 0.6), rbinom(5, 1, 0.4))
    # data.frame() rather than cbind(), so that outcome stays numeric 0/1
    # instead of being coerced to character along with gender and subject
    dataset <- rbind(dataset, data.frame(gender, subject, outcome))
  }

I am interested in the significance of the fixed effect, gender. So I 
compare:

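  # In current lme4 releases, binomial models like these are fitted with
  # glmer(); lmer() with a family argument is the older interface.
  one <- lmer(outcome ~ (1|subject), data = dataset, family = binomial)
  two <- lmer(outcome ~ gender + (1|subject), data = dataset, family = binomial)
  anova(one, two)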

I inspect the p-value given under anova(one,two). 

Note that usually lmer() -- correctly, since the only difference between 
subjects comes from the gender effect -- estimates zero variance for the 
random effect here. I am only asking about cases where this variance is 
zero!
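
For reference, here is a minimal sketch of how the variance estimate and the
anova() p-value could be pulled out programmatically. It assumes a current
lme4, where these models are fitted with glmer(), and a numeric 0/1 outcome.
The seed and the names sim, vc and p.lrt are only for illustration.

```r
library(lme4)

set.seed(1)  # arbitrary seed, only for reproducibility
n <- 100
# Same design as above: subjects a-e male (p = 0.6), f-j female (p = 0.4)
sim <- data.frame(
  gender  = rep(rep(c("m", "f"), each = 5), times = n),
  subject = rep(letters[1:10], times = n),
  outcome = as.vector(replicate(n, c(rbinom(5, 1, 0.6), rbinom(5, 1, 0.4))))
)

one <- glmer(outcome ~ (1 | subject), data = sim, family = binomial)
two <- glmer(outcome ~ gender + (1 | subject), data = sim, family = binomial)

# Estimated between-subject variance in the model that includes gender
vc <- as.data.frame(VarCorr(two))
print(vc[, c("grp", "vcov")])

# Likelihood-ratio test p-value from comparing the two models
p.lrt <- anova(one, two)[["Pr(>Chisq)"]][2]
print(p.lrt)
```

A vcov value of zero (or numerically close to it) is the case I am asking
about.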

To my way of thinking, the observations are grouped under ten subjects, 
five male and five female. So a reasonable p-value would come from a t-test 
of the two groups of five subject scores, viz.:

  scores <- xtabs(~outcome+subject,dataset)[2,]/n
  male.scores <- scores[1:5]
  female.scores <- scores[6:10]
  t.test(male.scores,female.scores)
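
The same subject scores can also be obtained with tapply(). The sketch below
simulates a fresh dataset so that it runs on its own. It assumes outcome is
stored as numeric 0/1, so that a subject's score is simply its mean; the
seed and the names sim and tt are illustrative.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 100
sim <- data.frame(
  gender  = rep(rep(c("m", "f"), each = 5), times = n),
  subject = rep(letters[1:10], times = n),
  outcome = as.vector(replicate(n, c(rbinom(5, 1, 0.6), rbinom(5, 1, 0.4))))
)

# Proportion of 1s per subject, named by subject (a..j)
scores <- tapply(sim$outcome, sim$subject, mean)
male.scores   <- scores[letters[1:5]]
female.scores <- scores[letters[6:10]]

# t.test() defaults to the Welch test (unequal variances), 5 scores per group
tt <- t.test(male.scores, female.scores)
print(tt$p.value)
```

Indexing scores by subject name rather than by position avoids relying on
the order of the factor levels.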

When I run these two, I get results like the following:

lmer(): 1.950e-06
t-test: 1.688e-05

lmer(): 2.042e-07
t-test: 4.606e-05

lmer(): 0.0001934
t-test: 0.004178

lmer(): 0.0001447
t-test: 0.001961

lmer(): 9.168e-07
t-test: 7.807e-07

As we can see, the anova() p-value on the lmer() models is usually, but not 
always, anti-conservative with respect to the t-test: in four of the five 
runs above it is the smaller of the two, by roughly 1 to 2 orders of 
magnitude.
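
To put a number on the gap, the log10 ratio of the two p-values can be
computed for each of the five runs listed above. This is only arithmetic on
the reported figures; the variable names are illustrative.

```r
# p-values copied from the five runs listed above
p.lmer  <- c(1.950e-06, 2.042e-07, 0.0001934, 0.0001447, 9.168e-07)
p.ttest <- c(1.688e-05, 4.606e-05, 0.004178,  0.001961,  7.807e-07)

# Positive values: the anova()/lmer() p-value is the smaller of the two
log10.ratio <- log10(p.ttest / p.lmer)
print(round(log10.ratio, 2))
# 0.94  2.35  1.33  1.13 -0.07
```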

Can someone please explain why I'm not getting closer agreement between 
these two numbers? It seems that both approaches are asking the same 
question - what is the significance of the gender effect in the data?

In both approaches, it's the only effect (since subject variance is zero) 
and both approaches take into account the non-independence/grouping 
structure of the data, but in different ways - the t-test by working with 
subject average scores, and the lmer() by...

Am I misunderstanding something here?

Thanks very much,
Daniel