[R] car::Anova - Can it be used for ANCOVA with repeated-measures factors.

peter dalgaard pdalgd at gmail.com
Mon Jul 23 14:24:55 CEST 2012

On Jul 23, 2012, at 02:48 , John Fox wrote:

[snip long discussion which I admit not to have studied in every detail...]

>> Unfortunately, my involvement with this issue has led me to another question. Winer and Kirk both discuss a split-plot ANCOVA in which one has measured a covariate for each observation. That is, a second matrix like the original data matrix, e.g. the body temperature of each person at each measurement for the OBrienKaiser dataset:
>> OBK.cov <- OBrienKaiser
>> OBK.cov[,-(1:2)] <- runif(16*15, 36, 41)
>> Would it be possible to fit the data using this temperature matrix as a covariate using car::Anova (I thought about this but couldn't find any idea of how to specify the imatrix)?
> I'm afraid that Anova() won't handle repeated measures on covariates. I agree that it would be desirable to do so, and this capability is on my list of features to add to Anova(), but I can't promise when, or if, I'll get to it.

"Here There Be Tygers"... These models very easily get into territory that does not fall within the realm of standard (multivariate) linear modeling, and I'm not sure you really want it to be handled by a tool like Anova().

There is some risk that I will find myself writing half a treatise in email, but let's look at a simple example: a simple randomized block design with treatments (say, Variety) and a covariate (say, Eelworm). In much of the ANCOVA ideology there is an assumption that the covariate is independent of treatment, typically a pre-randomization measurement. Now, using standard univariate theory, you can easily fit a model like

Yield ~ Variety + Eelworm + Block

in which there is a single regression coefficient on Eelworm, and the Variety effects are said to be "adjusted for differences in eelworm count". 
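
A minimal sketch of that fit, on simulated data. The data frame `d`, the number of blocks, and all effect sizes are made up for illustration and are not from any real trial:

```r
# Simulated randomized block design: 3 varieties in each of 8 blocks (assumed sizes).
set.seed(1)
nb <- 8
d <- expand.grid(Variety = factor(c("A", "B", "C")),
                 Block   = factor(seq_len(nb)))
d$Eelworm <- rnorm(nrow(d), mean = 10, sd = 2)          # covariate, unrelated to Variety
block_eff <- rnorm(nb)[as.integer(d$Block)]             # additive block effects
var_eff   <- unname(c(A = 0, B = 1, C = 2)[as.character(d$Variety)])
d$Yield   <- 20 + var_eff - 0.5 * d$Eelworm + block_eff + rnorm(nrow(d), sd = 0.5)

# Fixed-block ANCOVA: one common slope on Eelworm.
fit <- lm(Yield ~ Variety + Eelworm + Block, data = d)
coef(fit)["Eelworm"]                   # the single regression coefficient
coef(fit)[c("VarietyB", "VarietyC")]   # variety contrasts adjusted for Eelworm
```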

You can do this with lm(), or with aov() as you please. However, in the latter case, you might formulate the model with a random Block effect, i.e.

Yield ~ Variety + Eelworm + Error(Block)

In that case, you will find that you get two estimates of the Eelworm effect, one from each stratum. This comes about via interblock information: If there's a high average Yield in blocks where the average Eelworm is low, then this says something about the effect of Eelworm. The estimate from the within-Block stratum will be the same as in the model with non-random Block effects. 
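
The two estimates can be inspected along these lines. This is only a sketch on simulated data (the data frame `d` and all effect sizes are assumptions), with the covariate given a block-level component so that there is some interblock information to recover:

```r
# Simulated data: Eelworm varies both between and within blocks (assumed setup).
set.seed(1)
nb <- 8
d <- expand.grid(Variety = factor(c("A", "B", "C")),
                 Block   = factor(seq_len(nb)))
d$Eelworm <- 10 + rnorm(nb, sd = 2)[as.integer(d$Block)] + rnorm(nrow(d))
var_eff   <- unname(c(A = 0, B = 1, C = 2)[as.character(d$Variety)])
d$Yield   <- 20 + var_eff - 0.5 * d$Eelworm +
             rnorm(nb)[as.integer(d$Block)] + rnorm(nrow(d), sd = 0.5)

fit_fixed  <- lm(Yield ~ Variety + Eelworm + Block, data = d)
fit_strata <- aov(Yield ~ Variety + Eelworm + Error(Block), data = d)

# One Eelworm coefficient per error stratum.
b_between <- fit_strata$Block$coefficients["Eelworm"]    # interblock estimate
b_within  <- fit_strata$Within$coefficients["Eelworm"]   # within-block estimate

# The within-Block estimate should agree with the fixed-block fit.
all.equal(unname(b_within), unname(coef(fit_fixed)["Eelworm"]))
```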

If you believe in a mechanistic explanation for the Eelworm effect, you would likely believe that the two regression coefficients estimate the same quantity and you could try combining the estimates into one (recovery of interblock information). However, this messes up all standard theory and since the interblock estimate is usually quite inaccurate, one often decides to discard it. (Mixed-effects software happily fits such models, at the expense of precise "degrees of freedom"-theory.)

There's an alternative interpretation in the form of a two-dimensional model, 

cbind(Yield, Eelworm) ~ Variety + Error(Block)

In that model, you get two-dimensional contrasts, and covariance matrices for each stratum. Then you can utilize the fact that if it is known that the contrasts for the covariate are zero, then the mean of the response (i.e. Yield) is the same as the conditional mean given the covariate equals zero, which is the intercept in the conditional regression model, which is the adjusted ANCOVA contrast yet again.
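
For the within-Block stratum, this correspondence can be checked numerically. The sketch below uses simulated data (all names and effect sizes are assumptions) and fits the two-dimensional response with fixed blocks rather than with Error(Block), which is enough to reproduce the within-stratum quantities:

```r
# Simulated randomized block data (assumed sizes and effects).
set.seed(1)
nb <- 8
d <- expand.grid(Variety = factor(c("A", "B", "C")),
                 Block   = factor(seq_len(nb)))
d$Eelworm <- rnorm(nrow(d), mean = 10, sd = 2)
var_eff   <- unname(c(A = 0, B = 1, C = 2)[as.character(d$Variety)])
d$Yield   <- 20 + var_eff - 0.5 * d$Eelworm +
             rnorm(nb)[as.integer(d$Block)] + rnorm(nrow(d), sd = 0.5)

# Two-dimensional response; residuals give the within-block SSP matrix.
fit2   <- lm(cbind(Yield, Eelworm) ~ Variety + Block, data = d)
S      <- crossprod(resid(fit2))
beta_w <- S["Yield", "Eelworm"] / S["Eelworm", "Eelworm"]   # conditional slope

# Yield contrasts conditional on Eelworm: subtract slope times Eelworm contrasts.
B   <- coef(fit2)
vc  <- c("VarietyB", "VarietyC")
adj <- B[vc, "Yield"] - beta_w * B[vc, "Eelworm"]

# Both should match the univariate ANCOVA fit.
ancova <- lm(Yield ~ Variety + Eelworm + Block, data = d)
all.equal(beta_w, unname(coef(ancova)["Eelworm"]))
all.equal(adj, coef(ancova)[vc])
```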

One difference is that in the two-dimensional response model, it is not obvious that the "between" and "within" covariance matrices need to share a common regression coefficient. If you think about this with a view to potential measurement errors in the covariate, it becomes clear that the two regression coefficients could well be different.

In the repeated measurements setting, we make a shift from an additive Block effect to a multidimensional Yield-response (corresponding to a reshape from long to wide). Let us say, for convenience, that there are 3 Varieties; then we are looking at a 3-dimensional response. If we want to study Variety effects, we can decompose the response into a set of contrasts and an average, discard the latter, and use multivariate tests for zero mean of the contrasts.
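
As a rough sketch of that procedure, the code below simulates a wide-format response (the matrix `Y`, the sample size, and the effects are assumptions), forms two contrasts, and computes a one-sample Hotelling T^2 test by hand. The particular contrast matrix is one arbitrary choice; any full-rank set of contrasts gives the same test:

```r
# Wide format: one row per block, one column per variety (simulated, assumed sizes).
set.seed(2)
n <- 12
Y <- matrix(rnorm(n * 3, sd = 0.5), n, 3, dimnames = list(NULL, c("A", "B", "C")))
Y <- Y + rnorm(n)                      # shared block effect, same value across a row
Y <- sweep(Y, 2, c(0, 1, 2), "+")      # variety means

# Two contrasts (B - A, C - A); the average is discarded.
C <- cbind(BvsA = c(-1, 1, 0), CvsA = c(-1, 0, 1))
Z <- Y %*% C

# Hotelling's one-sample T^2 for H0: mean of the contrasts is zero.
p     <- ncol(Z)
m     <- colMeans(Z)
S     <- cov(Z)
T2    <- n * drop(t(m) %*% solve(S, m))
Fstat <- (n - p) / (p * (n - 1)) * T2
pval  <- pf(Fstat, p, n - p, lower.tail = FALSE)
c(T2 = T2, F = Fstat, p.value = pval)
```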

To introduce a covariate at this point gets tricky because the standard linear model assumes the same design matrix for all responses, so you cannot have Yield1 depend on Eelworm1 only, Yield2 on Eelworm2, etc., although you could potentially have all responses depend on all covariates, leaving you with 9 regression coefficients. However, it is not at all clear that you can compare the intercepts between varieties in such a model.
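
To make the "all responses on all covariates" model concrete, here is a small sketch on simulated wide-format data (the matrices `Y` and `E` and their column names are hypothetical). The coefficient matrix has an intercept row plus a 3 x 3 block of slopes:

```r
# Simulated wide data: 3 yields and 3 eelworm counts per block (assumed).
set.seed(3)
n <- 20
E <- matrix(rnorm(n * 3, mean = 10, sd = 2), n, 3,
            dimnames = list(NULL, paste0("Eelworm", 1:3)))
Y <- 20 - 0.5 * E + rnorm(n) + matrix(rnorm(n * 3, sd = 0.5), n, 3)
colnames(Y) <- paste0("Yield", 1:3)

# Same design matrix for every response: each Yield depends on all three Eelworms.
fit <- lm(Y ~ E)
dim(coef(fit))       # intercept row + 3 slope rows, one column per response
coef(fit)[-1, ]      # the 9 regression coefficients
```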

One viewpoint is that this is really a (2x3=6)-dimensional response problem if we consider Yield and Eelworm simultaneously. However, it is possible to transform both the Yield and the Eelworm variables to contrasts, for a 4-dimensional response, consisting of two Yield contrasts and two Eelworm contrasts. If the latter are known to have mean zero, we can condition on them and look at the intercepts. That'll be a 2-d regression analysis with 2 covariates (4 regression coefficients) and I think the results should make OK sense. The annoying thing is that in the general case of p varieties, you get (p-1)^2 regression coefficients, but I suspect that it is not really possible to impose simplifying restrictions on them without losing simplicity of analysis.
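
A sketch of the contrast-based version, again on simulated data. The matrices `Y` and `E`, the contrast matrix `C`, and the effect sizes are all assumptions, and the simulation gives every Eelworm column the same mean so that the covariate contrasts have mean zero by construction:

```r
# Simulated wide data with variety effects 0, 1, 2 on Yield (assumed).
set.seed(4)
n <- 20
E <- matrix(rnorm(n * 3, mean = 10, sd = 2), n, 3)
Y <- 20 - 0.5 * E + rnorm(n) + matrix(rnorm(n * 3, sd = 0.5), n, 3)
Y <- sweep(Y, 2, c(0, 1, 2), "+")

# Transform both Yield and Eelworm to the same two contrasts.
C  <- cbind(BvsA = c(-1, 1, 0), CvsA = c(-1, 0, 1))
Yc <- Y %*% C
Ec <- E %*% C

# 2-d response regressed on 2 covariates: (p - 1)^2 = 4 slopes plus 2 intercepts.
fit <- lm(Yc ~ Ec)
coef(fit)["(Intercept)", ]   # adjusted variety contrasts
coef(fit)[-1, ]              # the 4 regression coefficients
```

Testing whether both intercepts are zero would then be a multivariate test on the intercept row of this fit.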

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
