[R] ANCOVA post-hoc test
David Winsemius
dwinsemius at comcast.net
Mon Feb 13 04:52:16 CET 2012
On Feb 12, 2012, at 7:39 AM, Evagelopoulos Thanasis wrote:
> Could you please help me on the following ANCOVA issue?
>
> This is a part of my dataset:
>
> sampling dist h
> 1 wi 200 0.8687212
> 2 wi 200 0.8812909
> 3 wi 200 0.8267464
> 4 wi 0 0.8554508
> 5 wi 0 0.9506721
> 6 wi 0 0.8112781
> 7 wi 400 0.8687212
> 8 wi 400 0.8414646
> 9 wi 400 0.7601675
> 10 wi 900 0.6577048
> 11 wi 900 0.6098403
> 12 wi 900 0.5574382
> 13 sp 200 0.9149264
> 14 sp 200 0.9149264
> 15 sp 200 0.9248187
> 16 sp 0 0.9974016
> 17 sp 0 0.9997114
> 18 sp 0 0.8812909
> ...
>
> h is the dependent variable, distance the covariate and sampling the
> factor.
>
> The slopes for the h ~ distance linear regressions are significantly
> different from 0 for all samplings.
>
> In order to compare the regression slopes for each sampling, I did
> an ANCOVA with the ancova() function of the HH package:
>
>> mod<-ancova(h~sampling*dist,data)
>
> There was a significant interaction term:
>
Barely. A borderline interaction may be of trivial importance when
interpreted in the context of a model with much stronger "main
effects", and looking only at the anova table hides the magnitude of
the interaction effects. Was this an effect that was predicted by
theory, or one that has an interpretation in an experimental setting
leading to actionable conclusions?
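One way to keep those magnitudes in view is to look at the
coefficient table rather than just the F tests. A quick sketch
(untested; it assumes your data frame really is called `data`, as in
your ancova() call, and refits with plain lm() just to keep the
example in base R):
fit <- lm(h ~ sampling * dist, data = data)
## the sampling:dist rows are the slope differences from the
## reference level's dist slope
summary(fit)$coefficients
## per-coefficient confidence intervals give a sense of magnitude
confint(fit)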
> Analysis of Variance Table
>
> Response: h
> Df Sum Sq Mean Sq F value Pr(>F)
> sampling 3 0.22822 0.07607 13.7476 2.624e-06 ***
> dist 1 0.51291 0.51291 92.6908 5.703e-12 ***
> sampling:dist 3 0.05112 0.01704 3.0792 0.03822 *
> Residuals 40 0.22134 0.00553
>
> Because the regression slopes are significantly different, I did a
> post hoc test with glht() to find out between which samplings:
>
>> summary(glht(mod, linfct=mcp(sampling="Tukey")))
>
> The results seem to say that there are no significantly different
> slopes for any of the pair-wise comparisons of factor levels:
>
> Simultaneous Tests for General Linear Hypotheses
>
> Multiple Comparisons of Means: Tukey Contrasts
>
>
> Fit: aov(formula = h ~ sampling * dist, data = data)
>
> Linear Hypotheses:
> Estimate Std. Error z value Pr(>|z|)
> sp - au == 0 0.06696 0.04562 1.468 0.457
> su - au == 0 -0.02238 0.04562 -0.491 0.961
> wi - au == 0 0.01203 0.04562 0.264 0.994
> su - sp == 0 -0.08934 0.04562 -1.958 0.204
> wi - sp == 0 -0.05493 0.04562 -1.204 0.624
> wi - su == 0 0.03441 0.04562 0.754 0.875
> (Adjusted p values reported -- single-step method)
>
> Warning message:
> In mcp2matrix(model, linfct = linfct) :
> covariate interactions found -- default contrast might be
> inappropriate
>
>
>
> My questions are:
>
> - Did I make a mistake somewhere? (I probably did!)
(I searched for prior questions of this sort, post-hoc testing in
models with interactions, and found far fewer than I would have
expected; many of those I did find had no answers offered, which
surprised me.)
You called the multiple comparisons function without specifying the
"level" at which the comparisons should be conducted, and the
function warned you that it might not be returning the results you
expect under such conditions. I've looked at multcomp::glht's help
page, and it was not immediately evident to me how one would request
comparisons just among the second-order interaction terms. It appears
there are both matrix and "symbolic" options for linfct, but the
exact syntax for the symbolic option is, to my reading, not well
described; the description in the help page text seems at odds with
the example on the same page.
A vignette is cited on the help page. Looking through that vignette,
"Simultaneous Inference in General Parametric Models", I see that
there is an example of applying `glht` to an interaction model
(section 6.3). It uses the matrix formulation but unfortunately does
not include the code used to create the "K" matrix for that model. I
tried using the code offered earlier in the vignette for that
purpose, but it does not reproduce the K matrix shown in that
section. So I entered it by hand:
K <- read.table(text = "(Icpt) s<10 s10-20 s>20 gMale s<10:gMale s10-20:gMale s>20:gMale
None:Female 1 0 0 0 0 0 0 0
<10:Female 1 1 0 0 0 0 0 0
10-20:Female 1 0 1 0 0 0 0 0
>20:Female 1 0 0 1 0 0 0 0
None:Male 1 0 0 0 1 0 0 0
<10:Male 1 1 0 0 1 1 0 0
10-20:Male 1 0 1 0 1 0 1 0
>20:Male 1 0 0 1 1 0 0 1", header = TRUE, check.names = FALSE)
K <- as.matrix(K)
(And with that I get the same output as shown in the vignette.
Unfortunately, the mechanical process of constructing that matrix did
not leave me with a theoretical understanding of how it was created.)
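What I can see, though, is that each row of that K is just a set of
weights on the fitted coefficients (one row per cell of what looks
like a smoking-by-gender table). The same hand construction can be
pointed directly at your question, the differences among the dist
slopes. Here is an untested sketch; it assumes treatment contrasts
and the level order au, sp, su, wi that appears in your glht output,
and it refits with plain lm() so that coef() returns (Intercept),
samplingsp, samplingsu, samplingwi, dist, samplingsp:dist,
samplingsu:dist, samplingwi:dist:
library(multcomp)
fit <- lm(h ~ sampling * dist, data = data)
## Each row is a difference of two sampling-specific slopes for dist.
## The slope for "au" (the reference level) is the dist coefficient
## alone; the slope for "sp" is dist + samplingsp:dist, so "sp - au"
## is just the samplingsp:dist coefficient, and so on.
Kslope <- rbind("slope sp - au" = c(0, 0, 0, 0, 0,  1,  0,  0),
                "slope su - au" = c(0, 0, 0, 0, 0,  0,  1,  0),
                "slope wi - au" = c(0, 0, 0, 0, 0,  0,  0,  1),
                "slope su - sp" = c(0, 0, 0, 0, 0, -1,  1,  0),
                "slope wi - sp" = c(0, 0, 0, 0, 0, -1,  0,  1),
                "slope wi - su" = c(0, 0, 0, 0, 0,  0, -1,  1))
colnames(Kslope) <- names(coef(fit))
summary(glht(fit, linfct = Kslope))
If I understand the warning you received, mcp(sampling = "Tukey") was
instead building contrasts of the sampling main-effect coefficients,
which with treatment contrasts are group differences at dist = 0
(intercepts), not differences in slope; that, I think, is what
"covariate interactions found -- default contrast might be
inappropriate" is trying to tell you.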
##-----------
Another option might be simply to use TukeyHSD, specifying its second
argument as "sampling:dist". When I try the same approach with the
interaction example from the TukeyHSD help page, I get what seem to
me to be interpretable results:
fm2 <- aov(breaks ~ wool * tension, data = warpbreaks)
TukeyHSD(fm2, "wool:tension")
It appears that all of the nominally significant comparisons involve
the "A" wool type and "L" tension, so an industrially meaningful
response would be to tighten up the loom tension only when using type
"A". That is not a conclusion that could have been reached using just
the simpler additive model:
breaks ~ wool + tension
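For contrast, a sketch of what the additive fit offers: TukeyHSD on
it returns only separate marginal tables for wool and for tension, so
the wool-specific tension pattern above would be invisible.
fm1 <- aov(breaks ~ wool + tension, data = warpbreaks)
TukeyHSD(fm1)  # marginal "wool" and "tension" comparisons only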
I'm hoping that some of the real statisticians will comment on these
ideas. I'm just a suburban physician with hopes of learning as I get
older.
> - Could I do pairwise ANCOVAs and thus have just two factor levels
> (=two regression slopes) to compare each time?
> What does the warning message "covariate interactions found --
> default contrast might be inappropriate" mean?
>
> Thank you!
> Athanasios Evagelopoulos
David Winsemius, MD
West Hartford, CT