[R] Parameterisation of interaction terms in lm

John Fox jfox at mcmaster.ca
Thu Oct 3 02:30:36 CEST 2002

Dear Luke,

At 03:06 PM 10/2/2002 +0100, Luke Whitaker wrote:

>I have a 2 factor linear model, in which the only terms I am interested
>in estimating and testing are the interaction terms. I want to control
>for the main effects but have no interest in estimating or testing them.
>However, I would like an estimate of the interaction effects for every
>level of the interactions, whereas what I get is one fewer estimate than
>this, with the first level apparently used as a baseline.
>For example, suppose factor A has 2 levels, and factor B has 4 levels, I
>would like 4 estimates, one for each level of B, showing how much
>different the actual A*B effect is from what it would be if there were
>no interaction. I suspect it is necessary to assume the mean interaction
>is zero in order to be estimable.
>Although the experiment was designed to be balanced, there are some
>missing data, but no empty cells. The experiment is actually a gene
>expression micro array, where A is tissue type and B represents a number
>of genes of interest.
>I have searched the archives, and read the docs relating to contrasts,
>but only succeeded in getting confused. I would be very grateful if
>someone could point me to the solution.

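How the interaction coefficients are estimated and interpreted depends upon 
how the factors are resolved into contrasts. For example, for the default 
"treatment" contrasts (contr.treatment), the coefficients for the omitted 
(by default, first) level of each factor are implicitly set to 0, which is 
why that level appears to serve as a baseline.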
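
To make the baseline behaviour concrete, here is a minimal sketch with simulated data. The data frame d, the response y, and the level names are invented for illustration and are not from the original question; only A (2 levels) and B (4 levels) follow its description.

```r
# Simulated stand-in for the real data (hypothetical names and values).
set.seed(1)
d <- expand.grid(A = factor(c("a1", "a2")),
                 B = factor(c("b1", "b2", "b3", "b4")),
                 rep = 1:3)
d$y <- rnorm(nrow(d))

# Treatment contrasts, requested explicitly so the result does not
# depend on the session's contrasts option.
fit.trt <- lm(y ~ A * B, data = d,
              contrasts = list(A = "contr.treatment",
                               B = "contr.treatment"))

# Only (2 - 1) * (4 - 1) = 3 interaction coefficients are reported;
# every term involving a1 or b1 is implicitly 0.
names(coef(fit.trt))
```

No coefficient involving a1 or b1 appears among the names, which matches the "one fewer estimate" described in the question.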

From your description, it sounds as if you want "sum-to-zero" contrasts 
(contr.sum), which you can obtain in R in several ways: for example, by 
resetting the contrasts option for unordered factors, or by specifying the 
contrasts argument to lm.
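
The two routes just mentioned might look roughly like this. It is a sketch on simulated data, and the names d, y, fit1 and fit2 are invented for illustration.

```r
# Simulated stand-in data: A has 2 levels, B has 4 levels.
set.seed(1)
d <- expand.grid(A = factor(c("a1", "a2")),
                 B = factor(c("b1", "b2", "b3", "b4")),
                 rep = 1:3)
d$y <- rnorm(nrow(d))

# Way 1: reset the contrasts option (first element applies to
# unordered factors, second to ordered factors).
old <- options(contrasts = c("contr.sum", "contr.poly"))
fit1 <- lm(y ~ A * B, data = d)
options(old)  # restore the previous setting

# Way 2: leave the option alone and use the contrasts argument to lm().
fit2 <- lm(y ~ A * B, data = d,
           contrasts = list(A = "contr.sum", B = "contr.sum"))

# Both routes should give the same coefficients.
all.equal(coef(fit1), coef(fit2))
```

The second form is probably the safer habit, since it does not change how other models in the same session are parameterised.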

Using contr.sum, you'll still get one fewer coefficient than there are 
levels for each factor, but you can fill in the missing coefficients for 
the interactions by taking the negatives of the sums of the estimates in 
each row and column (since the interaction estimates are constrained to sum 
to 0 across each row and each column of the A-by-B table).
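
The filling-in step could be sketched as follows, again on simulated data with invented names (d, y, est, full). Two observations are dropped to mimic missing data with no empty cells. The final lines compare the result with interaction effects computed directly from the table of cell means.

```r
set.seed(2)
d <- expand.grid(A = factor(c("a1", "a2")),
                 B = factor(c("b1", "b2", "b3", "b4")),
                 rep = 1:3)
d$y <- rnorm(nrow(d))
d <- d[-c(5, 17), ]  # unbalanced, but every cell still has data

fit <- lm(y ~ A * B, data = d,
          contrasts = list(A = "contr.sum", B = "contr.sum"))

nA <- nlevels(d$A)
nB <- nlevels(d$B)
b <- coef(fit)

# Reported interaction estimates, arranged with rows = levels of A
# and columns = levels of B (the A index varies fastest).
est <- matrix(b[grep(":", names(b))], nrow = nA - 1, ncol = nB - 1)

# Fill in the omitted last row and last column as negative sums.
full <- rbind(est, -colSums(est))
full <- cbind(full, -rowSums(full))
dimnames(full) <- list(levels(d$A), levels(d$B))
full

# Check against the cell means: cell mean - row mean - column mean
# + grand mean, using unweighted means of the cell means.
m <- tapply(d$y, list(d$A, d$B), mean)
chk <- m - outer(rowMeans(m), colMeans(m), "+") + mean(m)
all.equal(full, chk, check.attributes = FALSE)
```

With only 2 levels of A, the second row of full is simply the negative of the first, so the table carries 3 free quantities even though it shows 8 entries.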

Whether this is really of more interest than simply examining the cell 
means is another question.

I hope that this helps,

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
