[R] testing two-factor anova effects using model comparison approach with lm() and anova()

Greg Snow Greg.Snow at imail.org
Fri Feb 27 19:30:46 CET 2009


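Notice the degrees of freedom in the different models as well: the residual df is 24 in both fits, and in the restricted model the interaction term has 4 df instead of 2.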

With factors A and B, the two models

A + B + A:B

and

A + A:B

are actually the same overall model, just parameterized differently (your factorB + factorA:factorB is a third parameterization of that same model). You can also see this by using x=TRUE in the call to lm() and looking at the x matrix used.
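
A minimal sketch of that check, on simulated data because the original DV values were not posted. The data frame d, the seed, and the names full and alt are assumptions made up for illustration:

```r
# Simulated balanced 3x2 design, 5 observations per cell (hypothetical data)
set.seed(1)
d <- expand.grid(rep = 1:5, A = factor(1:3), B = factor(1:2))
d$DV <- rnorm(nrow(d))

# x = TRUE stores the design matrix in the fitted object as $x
full <- lm(DV ~ A + B + A:B, data = d, x = TRUE)
alt  <- lm(DV ~ A + A:B,     data = d, x = TRUE)

# Same number of columns, but different columns
colnames(full$x)
colnames(alt$x)

# The two matrices span the same space, so the fits agree
same_fit <- all.equal(unname(fitted(full)), unname(fitted(alt)))
same_rss <- all.equal(deviance(full), deviance(alt))
same_df  <- df.residual(full) == df.residual(alt)
```

If the two formulas really describe one model, same_fit, same_rss and same_df should all come out TRUE.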

Testing whether the main effect A should be in the model, given that the interaction is in the model, does not make sense in most cases. The formula notation therefore gives a different parameterization rather than the generally uninteresting test.
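
To see the reparameterization directly, one can list the design-matrix columns and the per-term degrees of freedom for the restricted formula from the original question. This is only a sketch on simulated data; the data frame d, the seed, and the short names A, B, full and rest are assumptions, since the original DV values were not posted:

```r
# Simulated balanced 3x2 design, 5 observations per cell (hypothetical data)
set.seed(1)
d <- expand.grid(rep = 1:5, A = factor(1:3), B = factor(1:2))
d$DV <- rnorm(nrow(d))

full <- lm(DV ~ A + B + A:B, data = d)
rest <- lm(DV ~ B + A:B,     data = d)   # main effect A left out of the formula

# Both design matrices have 6 columns; with A missing, the
# interaction term is expanded to 4 columns instead of 2
colnames(model.matrix(full))
colnames(model.matrix(rest))

# Per-term degrees of freedom, last entry is the residual df
df_full <- anova(full)$Df   # 2 1 2 24
df_rest <- anova(rest)$Df   # 1 4 24

# The comparison therefore has 0 df and no F test
cmp <- anova(full, rest)
```

The 4 df on the interaction in the second fit are the 2 df of A plus the 2 df of A:B from the first fit, which is why the residual line does not change.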

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Paul Gribble
> Sent: Friday, February 27, 2009 11:01 AM
> To: r-help at r-project.org
> Subject: [R] testing two-factor anova effects using model comparison approach with lm() and anova()
> 
> I wonder if someone could explain the behavior of the anova() and lm()
> functions in the following situation:
> 
> I have a standard 3x2 factorial design, factorA has 3 levels, factorB
> has 2 levels, they are fully crossed. I have a dependent variable DV.
> 
> Of course I can do the following to get the usual anova table:
> 
> > anova(lm(DV~factorA+factorB+factorA:factorB))
> Analysis of Variance Table
> 
> Response: DV
>                 Df  Sum Sq Mean Sq F value   Pr(>F)
> factorA          2  7.4667  3.7333  4.9778 0.015546 *
> factorB          1  2.1333  2.1333  2.8444 0.104648
> factorA:factorB  2  9.8667  4.9333  6.5778 0.005275 **
> Residuals       24 18.0000  0.7500
> 
> This is perfectly satisfactory for my situation, but as a pedagogical
> exercise, I wanted to demonstrate the model comparison approach to
> analysis of variance by using anova() to compare a full model that
> contains all effects, to restricted models that contain all effects
> save for the effect of interest.
> 
> The test of the interaction effect seems to be as I expected:
> 
> > fullmodel<-lm(DV~factorA+factorB+factorA:factorB)
> > restmodel<-lm(DV~factorA+factorB)
> > anova(fullmodel,restmodel)
> Analysis of Variance Table
> 
> Model 1: DV ~ factorA + factorB + factorA:factorB
> Model 2: DV ~ factorA + factorB
>   Res.Df     RSS Df Sum of Sq      F   Pr(>F)
> 1     24 18.0000
> 2     26 27.8667 -2   -9.8667 6.5778 0.005275 **
> 
> As you can see the value of F (6.5778) is the same as in the anova
> table above. All is well.
> 
> However, if I try to test a main effect, e.g. factorA, by testing the
> full model against a restricted model that doesn't contain the main
> effect factorA, I get something strange:
> 
> > restmodel<-lm(DV~factorB+factorA:factorB)
> > anova(fullmodel,restmodel)
> Analysis of Variance Table
> 
> Model 1: DV ~ factorA + factorB + factorA:factorB
> Model 2: DV ~ factorB + factorA:factorB
>   Res.Df RSS Df Sum of Sq F Pr(>F)
> 1     24  18
> 2     24  18  0         0
> 
> Upon inspection of each model I see that the Residuals are identical,
> which is not what I was expecting:
> 
> > anova(fullmodel)
> Analysis of Variance Table
> 
> Response: DV
>                 Df  Sum Sq Mean Sq F value   Pr(>F)
> factorA          2  7.4667  3.7333  4.9778 0.015546 *
> factorB          1  2.1333  2.1333  2.8444 0.104648
> factorA:factorB  2  9.8667  4.9333  6.5778 0.005275 **
> Residuals       24 18.0000  0.7500
> 
> This looks fine, but then the restricted model is where things are not
> as I expected:
> 
> > anova(restmodel)
> Analysis of Variance Table
> 
> Response: DV
>                 Df  Sum Sq Mean Sq F value   Pr(>F)
> factorB          1  2.1333  2.1333  2.8444 0.104648
> factorB:factorA  4 17.3333  4.3333  5.7778 0.002104 **
> Residuals       24 18.0000  0.7500
> 
> I was expecting the Residuals in the restricted model (the one not
> containing main effect of factorA) to be larger than in the full model
> containing all three effects. In other words, the variance accounted
> for by the main effect factorA should be added to the Residuals.
> Instead, it looks like the variance accounted for by the main effect of
> factorA is being soaked up by the factorA:factorB interaction term.
> Strangely, the degrees of freedom are also affected.
> 
> I must be misunderstanding something here. Can someone point out what
> is happening?
> 
> Thanks,
> 
> -Paul
> 
> --
> Paul L. Gribble, Ph.D.
> Associate Professor
> Dept. Psychology
> The University of Western Ontario
> London, Ontario
> Canada N6A 5C2
> Tel. +1 519 661 2111 x82237
> Fax. +1 519 661 3961
> pgribble at uwo.ca
> http://gribblelab.org
> 



