[R] repost: problems with lm for nested fixed-factor Anova (ANOVA I)
Richard M. Heiberger
rmh at temple.edu
Thu Feb 12 18:18:53 CET 2009
tmp <- data.frame(y=rnorm(15000),
                  x1=factor(sample(48, 15000, replace=TRUE)),
                  z1=factor(sample(242, 15000, replace=TRUE)))
system.time(
tmp.aov <- aov(y ~ x1/z1, data=tmp)
)
## exceeds memory
tmp2 <- data.frame(y=rnorm(15000),
                   x1=factor(sample(48, 15000, replace=TRUE)),
                   z1=factor(sample(5, 15000, replace=TRUE)))
system.time(
tmp2.aov <- aov(y ~ x1/z1, data=tmp2)
)
anova(tmp2.aov)
## about 5 seconds
Use data.frames. They make the code easier to read.
Use aov() instead of lm(). It is the same arithmetic,
but the unneeded columns of X are handled more gracefully.
My guess is that your data has 100s of distinct values for z1.
The nested formula x1/z1 then allocates a column of X for every
combination of x1 and z1 levels, so excess space is allocated.
Distinct values of z1 are easier to understand, but as you see
they are costly in computer resources.
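To make the space cost concrete, here is a minimal sketch on toy data. The
data frame d and its sizes (4 levels of x1, 10 of z1) are made up for
illustration and are not from the original question. It counts the columns
that model.matrix() builds for the nested formula, then repeats the
arithmetic for the 48 and 242 levels used in this post.

```r
## Toy data: 4 levels of x1, 10 levels of z1 (sizes chosen for illustration).
set.seed(1)
d <- data.frame(y  = rnorm(200),
                x1 = factor(sample(4, 200, replace = TRUE), levels = 1:4),
                z1 = factor(sample(10, 200, replace = TRUE), levels = 1:10))

## y ~ x1/z1 expands to y ~ x1 + x1:z1.
X <- model.matrix(y ~ x1/z1, data = d)
n_cols  <- ncol(X)                          # intercept + 3 + 4*9 columns
n_cells <- nlevels(d$x1) * nlevels(d$z1)    # number of (x1, z1) combinations

## Same count for the sizes in this post, and the storage for X as doubles.
p_post  <- 48 * 242                         # 11616 columns
gb_post <- 15000 * p_post * 8 / 1e9         # about 1.39 GB for X alone
```

The column count equals the number of level combinations, so it grows with
the product of the two factor sizes, not with the number of rows.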
You can force the actual numerical values of the second term to be
distinct across levels of x1 with the interaction() function. Then
use the simpler model and let the linear dependencies work in your
favor.
system.time(
tmp.aov <- aov(y ~ x1 + interaction(x1, z1), data=tmp)
)
anova(tmp.aov)
## about 6 seconds
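As a small check of the interaction() approach, the sketch below uses toy
data (the data frame d and its sizes are assumptions made for illustration).
It shows that interaction() gives each (x1, z1) cell its own level, that
drop = TRUE keeps only the cells that actually occur, and that the nested
and the interaction formulas give the same fitted values.

```r
## Toy data (sizes chosen for illustration).
set.seed(2)
d <- data.frame(y  = rnorm(120),
                x1 = factor(sample(3, 120, replace = TRUE), levels = 1:3),
                z1 = factor(sample(4, 120, replace = TRUE), levels = 1:4))

## interaction() labels each (x1, z1) cell with its own level.
cell <- interaction(d$x1, d$z1)
n_cell_levels <- nlevels(cell)              # 3 * 4 possible cells

## drop = TRUE keeps only the cells that occur in the data.
small <- interaction(factor(c("a", "a", "b")), factor(c(1, 2, 1)),
                     drop = TRUE)
n_small <- nlevels(small)                   # 3 observed cells out of 4

## Both formulas describe the same cell means, so the fits agree.
fit_nested <- aov(y ~ x1/z1, data = d)
fit_inter  <- aov(y ~ x1 + interaction(x1, z1), data = d)
same_fit <- isTRUE(all.equal(fitted(fit_nested), fitted(fit_inter)))
```

Whether drop = TRUE saves enough space to matter depends on how many cells
are empty in the real data.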
Rich