[R] About the interaction A:B
Jeff Laake
Jeff.Laake at noaa.gov
Fri Mar 5 18:41:55 CET 2010
On 3/5/2010 9:19 AM, Frank E Harrell Jr wrote:
> You neglected to state your name and affiliation, and your question
> demonstrates an allergy to R documentation.
>
> Frank
I agree with Frank but will try to answer some of your questions as I
understand it.
First, model.matrix uses the options$contrasts or what is set
specifically as the contrast for a factor using contrasts(). The
default is
treatment contrasts and for a factor that means the first level of a
factor variable is the intercept and the remainder are "treatment" effects
which are added to intercept. If you specify Y~A+B+A:B, you are
specifying the model with main effects (A, B) and interactions (A:B).
If A has m levels and B has n levels then there will be an intercept
(1), m-1 for A, n-1 for B and (m-1)*(n-1) for the interaction.
If you specify the model as Y~A:B it will specify the model fully with
interactions which will have m*n separate parameters and none will be
NA as long as you have data in each of the m*n cells. It makes
absolutely no sense to me to have an intercept and then all but one of the
remaining interactions included.
Note that you'll get quite different results if A and/or B are not
factor variables.
--jeff
>
> blue sky wrote:
>> The following is the code for the model.matrix. But it still doesn't
>> answer why A:B is interpreted differently in Y~A+B+A:B and Y~A:B. By
>> 'why', I mean how R internally does it and what is the rational behind
>> the way of doing it?
>>
>> And it didn't answer why in the model.matrix of Y~A, there are a-1
>> terms from A plus the intercept, but in the model.matrix of Y~A:B,
>> there are a*b terms (rather than a*b-1 terms) plus the intercept.
>> Since the one coefficient of the lm of Y~A:B is going to be NA, why
>> bother to include the corresponding term in the model matrix?
>>
>> ############code below
>>
>> set.seed(0)
>>
>> a=3
>> b=4
>>
>> AB_effect=data.frame(
>> name=paste(
>> unlist(
>> do.call(
>> rbind
>> , rep(list(paste('A', letters[1:a],sep='')), b)
>> )
>> )
>> , unlist(
>> do.call(
>> cbind
>> , rep(list(paste('B', letters[1:b],sep='')), a)
>> )
>> )
>> , sep=':'
>> )
>> , value=rnorm(a*b)
>> , stringsAsFactors=F
>> )
>>
>> max_n=10
>> n=sample.int(max_n, a*b, replace=T)
>>
>> AB=mapply(function(name, n){rep(name,n)}, AB_effect$name, n)
>>
>> Y=AB_effect$value[match(unlist(AB), AB_effect$name)]
>>
>> Y=Y+a*b*rnorm(length(Y))
>>
>> sub_fr=as.data.frame(do.call(rbind, strsplit(unlist(AB), ':')))
>> rownames(sub_fr)=NULL
>> colnames(sub_fr)=c('A', 'B')
>>
>> fr=data.frame(Y=Y,sub_fr)
>>
>> my_subset=function(amm) {
>> coding=apply(
>> amm
>> ,1
>> , function(x) {
>> paste(x, collapse='')
>> }
>> )
>> amm[match(unique(coding), coding),]
>> }
>>
>> my_subset(model.matrix(Y ~ A*B,fr))
>> my_subset(model.matrix(Y ~ (A+B)^2,fr))
>> my_subset(model.matrix(Y ~ A + B + A:B,fr))
>> my_subset(model.matrix(Y ~ A:B - 1,fr))
>> my_subset(model.matrix(Y ~ A:B,fr))
>>
>> On Fri, Mar 5, 2010 at 8:45 AM, Gabor Grothendieck
>> <ggrothendieck at gmail.com> wrote:
>>> The way to understand this is to look at the output of model.matrix:
>>>
>>> model.matrix(fo, fr)
>>>
>>> for each formula you tried. If your data is large you will have to
>>> use a subset not to be overwhelmed with output.
>>>
>>> On Fri, Mar 5, 2010 at 9:08 AM, blue sky <bluesky315 at gmail.com> wrote:
>>>> Suppose, 'fr' is data.frame with columns 'Y', 'A' and 'B'. 'A' has
>>>> levels 'Aa'
>>>> 'Ab' and 'Ac', and 'B' has levels 'Ba', 'Bb', 'Bc' and 'Bd'. 'Y'
>>>> columns are numbers.
>>>>
>>>> I tried the following three sets of commands. I understand that A*B is
>>>> equivalent to A+B+A:B. However, A:B in A+B+A:B is different from A:B
>>>> just by itself (see the 3rd and 4th set of commands). Would you please
>>>> help me understand why the meanings of A:B are different in different
>>>> contexts?
>>>>
>>>> I also see the coefficient of AAc:BBd is NA (the last set of
>>>> commands). I'm wondering why this coefficient is not removed from the
>>>> 'coefficients' vector. Since lm(Y~A) has coefficients for (intercept),
>>>> Ab, Ac (there are no NA's), I think that it is reasonable to make sure
>>>> that there are no NA's when there are interaction terms, namely, A:B
>>>> in this case.
>>>>
>>>> Thank you for answering my questions!
>>>>
>>>>> alm=lm(Y ~ A*B,fr)
>>>>> alm$coefficients
>>>> (Intercept) AAb AAc BBb BBc
>>>> BBd
>>>> -3.548176 -2.086586 7.003857 4.367800 11.887356 -3.470840
>>>> AAb:BBb AAc:BBb AAb:BBc AAc:BBc AAb:BBd AAc:BBd
>>>> 5.160865 -11.858425 -12.853116 -20.289611 6.727401 -2.327173
>>>>> alm=lm(Y ~ A + B + A:B,fr)
>>>>> alm$coefficients
>>>> (Intercept) AAb AAc BBb BBc
>>>> BBd
>>>> -3.548176 -2.086586 7.003857 4.367800 11.887356 -3.470840
>>>> AAb:BBb AAc:BBb AAb:BBc AAc:BBc AAb:BBd AAc:BBd
>>>> 5.160865 -11.858425 -12.853116 -20.289611 6.727401 -2.327173
>>>>> alm=lm(Y ~ A:B - 1,fr)
>>>>> alm$coefficients
>>>> AAa:BBa AAb:BBa AAc:BBa AAa:BBb AAb:BBb AAc:BBb
>>>> AAa:BBc
>>>> -3.5481765 -5.6347625 3.4556808 0.8196231 3.8939016 -4.0349449
>>>> 8.3391795
>>>> AAb:BBc AAc:BBc AAa:BBd AAb:BBd AAc:BBd
>>>> -6.6005222 -4.9465744 -7.0190168 -2.3782017 -2.3423322
>>>>> alm=lm(Y ~ A:B,fr)
>>>>> alm$coefficients
>>>> (Intercept) AAa:BBa AAb:BBa AAc:BBa AAa:BBb
>>>> AAb:BBb
>>>> -2.34233221 -1.20584424 -3.29243033 5.79801305 3.16195534
>>>> 6.23623377
>>>> AAc:BBb AAa:BBc AAb:BBc AAc:BBc AAa:BBd AAb:BBd
>>>> -1.69261273 10.68151168 -4.25819000 -2.60424217 -4.67668454
>>>> -0.03586951
>>>> AAc:BBd
>>>> NA
>>>>
>
More information about the R-help
mailing list