[R] Trouble about the interpretation of intercept in lm models
    Stefano Leonardi 
    stefano.leonardi at unipr.it
       
    Tue Jan 13 16:32:13 CET 2009
    
    
  
Hallo,
yesterday I was puzzled when I discovered that I
probabliy miss something in the interepretation of intercept
in two-way lm models.
I thought that the intercept, using the default contr.treatment
contrasts,  represents the mean of the group of observations
having zero in all column of the model.matrix.
It turns out not to be case
To be more more clear I am attaching a short example:
R>#this is to generate the dataset
R>set.seed(1) #to have the same results I have
R>B <-rep(c("a","b","c"),c(6,6,6))
R>nB <- 1:3
R>names(nB)<-c("a","b","c")
R>B <- as.factor(B)
R>A<-rep(rep(c("0","1"),c(3,3)),3)
R>A <- as.factor(A)
R>Y <- 5+ 10*as.numeric(as.character(A)) + 20*nB[as.character(B)]
+  rnorm(18,0,5)
R>#this is the generated dataset
R>data.frame(Y,A,B)
           Y A B
1  21.86773 0 a
2  25.91822 0 a
3  20.82186 0 a
4  42.97640 1 a
5  36.64754 1 a
6  30.89766 1 a
7  47.43715 0 b
8  48.69162 0 b
9  47.87891 0 b
10 53.47306 1 b
11 62.55891 1 b
12 56.94922 1 b
13 61.89380 0 c
14 53.92650 0 c
15 70.62465 0 c
16 74.77533 1 c
17 74.91905 1 c
18 79.71918 1 c
R>#table with means
R>M<-tapply(Y,list(A,B),mean)
R>print(M)
          a        b        c
0 22.86927 48.00256 62.14832
1 36.84053 57.66039 76.47119
R>lm(Y ~ A + B)
Call:
lm(formula = Y ~ A + B)
Coefficients:
(Intercept)           A1           Bb           Bc
       23.53        12.65        22.98        39.45
I was expecting that the intercept in the lm output would be equal
to the top left corner (0-a) of my previous M table (22.86927)
which is the mean of the first three observations in the
dataset.
So how do I interpret the intercept 23.53?
I could not relate it to any other mean.
R>mean(Y[A=="0" & B=="a"])
[1] 22.86927
R>apply(M,1,mean)
        0        1
44.34005 56.99070
R>apply(M,2,mean)
        a        b        c
29.85490 52.83148 69.30975
The following of course gave me the same results as the lm call
R>X<- model.matrix(~A+B)
R>b <- solve(t(X) %*% X) %*% t(X) %*% Y
R>b
                 [,1]
(Intercept) 23.52957
A1          12.65066
Bb          22.97658
Bc          39.45485
Thank you in advance for any suggestion.
Stefano
-- 
======================================================================
  Stefano Leonardi
  Dipartimento di Scienze Ambientali
  Universita` di Parma              stefano.leonardi at unipr.it
  Viale Usberti 11a                             Phone : +39-0521-905659
  43100 PARMA  (Italy)                          Fax   : +39-0521-905402
    
    
More information about the R-help
mailing list