[R] slow computation of mixed ANOVA using aov
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Sat Mar 19 19:20:49 CET 2005
Peter Dalgaard <p.dalgaard at biostat.ku.dk> writes:
> - (not too sure of this, but R 2.1.0 will have some new code for
> multivariate lm, with intra-individual designs, and
> tests under the sphericity assumptions; it is possible that
> your models can be reformulated in that framework. You'd have to
> set it up as a model with 160 responses on 56 subjects, though, and
> the code wasn't really designed with that sort of intrasubject
> dimensionality in mind.)
OK, I've tried this now and it does seem to work, and much faster than
the aov() approach. I had to fix a buglet in the code (eigenvalues
coming up with small imaginary parts due to numerics), so currently
available versions won't quite work, but it should be in the alpha
releases that are supposed to start Monday.
Here's how it works:
### orig setup, edited so as to actually work...
f1<-gl(40,1,8960,ordered=T)
f2<-gl(7,1280,8960,ordered=T)
f3<-gl(4,40,8960,ordered=T)
subject<-gl(56,160,8960,ordered=T)
dv<-rnorm(8960,mean=500,sd=50)
d <- data.frame(f1,f2,f3,subject,dv)
### Regroup to 56x160 matrix response
f2a <- f2[seq(1,8801,160)]
idata <- d[1:160,] # intrasubj. design, actually need only f1,f3 from this
Y <- matrix(dv,56,byrow=T)
### now fit models with effect of f2a, constant, and empty
fit <- lm(Y~f2a)
fit2 <- lm(Y~1)
fit3 <- lm(Y~0)
## The main trick is to do anova on within-subject effects. First look
## at the interactions, alias residuals from an additive model ~f1+f3.
## The reduction Model 1-> Model 2 corresponds to testing the f1:f2:f3
## interaction in aov (tests whether the f1:f3 interaction depends on
## f2a) and Model 2 -> Model 3 is the test for the f1:f3 interaction
## being zero.
## I'm not quite sure what to make of the correction terms when the
## estimated SSD matrix is singular (it has to be, the dimension is
## 117, but the df is only 49). Probably the G-G epsilon is bogus, but
## the H-F one seems to fare rather well.
> anova(fit,fit2,fit3,test="Spherical",X=~f1+f3,idata=idata)
Analysis of Variance Table
Model 1: Y ~ f2a
Model 2: Y ~ 1
Model 3: Y ~ 0
Contrasts orthogonal to
~f1 + f3
Greenhouse-Geisser epsilon: 0.2903
Huynh-Feldt epsilon: 0.9639
Res.Df Df Gen.var. F num Df den Df Pr(>F) G-G Pr H-F Pr
1 49 0.00000
2 55 6 0.00000 0.9036 702 5733 0.96034 0.82268 0.95748
3 56 1 0.00000 1.0460 117 5733 0.35003 0.39624 0.35206
## Next, we can look at the f1 effects. This is basically the
## difference between two projections, on ~f1+f3 and ~f3
## This gives us the tests for f2:f1 and f1
> anova(fit,fit2,fit3,test="Spherical",M=~f1+f3,X=~f3,idata=idata)
Analysis of Variance Table
Model 1: Y ~ f2a
Model 2: Y ~ 1
Model 3: Y ~ 0
Contrasts orthogonal to
~f3
Contrasts spanned by
~f1 + f3
Greenhouse-Geisser epsilon: 0.5598
Huynh-Feldt epsilon: 1.0284
Res.Df Df Gen.var. F num Df den Df Pr(>F) G-G Pr H-F Pr
1 49 315.74
2 55 6 344.98 0.9883 234 1911 0.54 0.52 0.54
3 56 1 347.15 0.8393 39 1911 0.75 0.68 0.75
## Same thing with f3
> anova(fit,fit2,fit3,test="Spherical",M=~f1+f3,X=~f1,idata=idata)
Analysis of Variance Table
Model 1: Y ~ f2a
Model 2: Y ~ 1
Model 3: Y ~ 0
Contrasts orthogonal to
~f1
Contrasts spanned by
~f1 + f3
Greenhouse-Geisser epsilon: 0.9482
Huynh-Feldt epsilon: 1.0128
Res.Df Df Gen.var. F num Df den Df Pr(>F) G-G Pr H-F Pr
1 49 35.171
2 55 6 34.679 0.9010 18 147 0.578 0.574 0.578
3 56 1 34.658 1.0039 3 147 0.393 0.390 0.393
## Actually, because of the complete design, we can ignore f1 and get
## the same analysis:
> anova(fit,fit2,fit3,test="Spherical",M=~f3,X=~1,idata=idata)
Analysis of Variance Table
Model 1: Y ~ f2a
Model 2: Y ~ 1
Model 3: Y ~ 0
Contrasts orthogonal to
~1
Contrasts spanned by
~f3
Greenhouse-Geisser epsilon: 0.9482
Huynh-Feldt epsilon: 1.0128
Res.Df Df Gen.var. F num Df den Df Pr(>F) G-G Pr H-F Pr
1 49 35.171
2 55 6 34.679 0.9010 18 147 0.578 0.574 0.578
3 56 1 34.658 1.0039 3 147 0.393 0.390 0.393
## Finally we get the main effect of f2a as follows. Notice that the
## Model 2 -> Model 3 reduction is the test for zero overall mean,
## which you might (and aov does) want to omit.
> anova(fit,fit2,fit3,test="Spherical",M=~1,X=~0,idata=idata)
Analysis of Variance Table
Model 1: Y ~ f2a
Model 2: Y ~ 1
Model 3: Y ~ 0
Contrasts orthogonal to
~0
Contrasts spanned by
~1
Greenhouse-Geisser epsilon: 1
Huynh-Feldt epsilon: 1
Res.Df Df Gen.var. F num Df den Df Pr(>F) G-G Pr
H-F Pr
1 49 11
2 55 6 12 1.5010e+00 6 49 1.976e-01 1.976e-01
1.976e-01
3 56 1 249956 1.2699e+06 1 49 8.346e-110 8.346e-110
8.346e-110
## Finally aov() results for comparison:
> system.time(aa <- aov(dv~f1*f2*f3+Error(subject/(f1+f3)),data=d))
[1] 526.50 9.27 561.29 0.00 0.00
> summary(aa)
Error: subject
Df Sum Sq Mean Sq F value Pr(>F)
f2 6 15883 2647 1.501 0.1976
Residuals 49 86411 1763
Error: subject:f1
Df Sum Sq Mean Sq F value Pr(>F)
f1 39 81906 2100 0.8393 0.7487
f1:f2 234 578666 2473 0.9883 0.5376
Residuals 1911 4781665 2502
Error: subject:f3
Df Sum Sq Mean Sq F value Pr(>F)
f3 3 6899 2300 1.0039 0.3930
f2:f3 18 37153 2064 0.9010 0.5782
Residuals 147 336732 2291
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
f1:f3 117 308658 2638 1.0460 0.3500
f1:f2:f3 702 1599743 2279 0.9036 0.9603
Residuals 5733 14458856 2522
>
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
More information about the R-help
mailing list