[R] inconsistent behaviour of add1 and drop1 with a weighted linear model
Jenny Hodgson
jh69 at york.ac.uk
Tue Mar 13 16:43:26 CET 2007
Dear R Help,
I have noticed some inconsistent behaviour of add1 and drop1 with a
weighted linear model, which affects the interpretation of the results.
I have these data to fit with a linear model, I want to weight them by
the relative size of the geographical areas they represent.
_________________________________________________________________________________________
> example
y x1 x2 weights
1 -4.546477 0.1859556 50.00000 0.9466022
2 1.484246 0.4740497 29.88000 1.3252430
3 2.203681 0.8594264 16.93333 0.9466022
4 1.943163 0.8713360 42.11765 2.7766997
5 1.886473 0.9006082 19.00000 0.9466022
6 1.795393 0.8455183 23.68421 1.1674760
7 1.878077 0.5641396 35.00000 0.8203885
8 -4.215484 0.4356858 58.75000 0.4417477
9 1.993339 0.5440061 19.28571 0.8519420
10 1.560869 0.6285066 19.54545 0.8203885
11 2.761535 0.7017720 15.83333 0.1262136
12 0.995959 0.4638751 0.00000 0.9466022
13 -4.516514 0.2794199 77.85714 0.8834954
> sum(example$weights)
[1] 13.00000
> model<-lm(y~1,data=example,weights=weights)
> add1(model,.~.+x1+x2)
Single term additions
Model:
y ~ 1
Df Sum of Sq RSS AIC
<none> 94.000 27.719
x1 1 55.290 38.710 18.185
x2 1 58.630 35.371 17.012
> addterm(model,.~.+x1+x2)
Single term additions
Model:
y ~ 1
Df Sum of Sq RSS AIC
<none> 94.000 27.719
x1 1 55.290 38.710 18.185
x2 1 58.630 35.371 17.012
#so add1 and addterm (MASS package) give the same answer, nothing
obviously untoward, but
#both SS and RSS change when you do...
> model<-lm(y~x1,data=example,weights=weights)
> drop1(model)
Single term deletions
Model:
y ~ x1
Df Sum of Sq RSS AIC
<none> 30.164 14.942############I would expect RSS to be 38.710
x1 1 44.377 74.541 24.703############I would expect SS to be
55.290 and RSS 94.000 #{3}#
> model<-lm(y~x2,data=example,weights=weights)
> drop1(model)
Single term deletions
Model:
y ~ x2
Df Sum of Sq RSS AIC
<none> 37.922 17.918
x2 1 36.619 74.541 24.703 #{1}#
#not only are SS and RSS different, the relative importance of x1 and x2
has swapped! I have checked that this
#does not happen if weights are not used (everything adds up as
expected, but the null RSS and other SSs are not the same as any quoted
here).
#The inconsistency is still there if you are not adding to a null model:
#NB I realise that x1 and x2 are correlated so whichever is last to be
added appears to have a lower SS - this is not the issue
> add1(model,.~.+x1+x2)#This or...
Single term additions
Model:
y ~ x2
Df Sum of Sq RSS AIC
<none> 35.371 17.012
x1 1 18.576 16.794 9.329
> addterm(model,.~x1+x2)# ...this is inconsistent with:
Single term additions
Model:
y ~ x2
Df Sum of Sq RSS AIC
<none> 35.371 17.012
x1 1 18.576 16.794 9.329
> drop1(update(model,.~.+x1))#this or...
Single term deletions
Model:
y ~ x2 + x1
Df Sum of Sq RSS AIC
<none> 14.456 7.380
x2 1 15.708 30.164 14.942 #{4}#
x1 1 23.466 37.922 17.918 #{2}#
> dropterm(update(model,.~.+x1))#...this
Single term deletions
Model:
y ~ x2 + x1
Df Sum of Sq RSS AIC
<none> 14.456 7.380
x2 1 15.708 30.164 14.942
x1 1 23.466 37.922 17.918
#and the same thing happens with the x2 variable - compare below with above
> add1(update(model,.~x1),.~x1+x2)
Single term additions
Model:
y ~ x1
Df Sum of Sq RSS AIC
<none> 38.710 18.185
x2 1 21.916 16.794 9.329
> addterm(update(model,.~x1),.~x1+x2)
Single term additions
Model:
y ~ x1
Df Sum of Sq RSS AIC
<none> 38.710 18.185
x2 1 21.916 16.794 9.329
#Why do I think add1/addterm are the problem rather than drop1/dropterm?
#Because drop1/dropterm agree with the anova:
> model<-lm(y~x2+x1,data=example,weights=weights)
> anova(model)
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
x2 1 36.619 36.619 25.331 0.0005119 *** ####this agrees with
#{1}# above
x1 1 23.466 23.466 16.233 0.0024035 ** ####this agrees with
#{2}# above
Residuals 10 14.456 1.446
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> model<-lm(y~x1+x2,data=example,weights=weights)
> anova(model)
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
x1 1 44.377 44.377 30.698 0.0002474 *** ####this agrees with
#{3}# above
x2 1 15.708 15.708 10.866 0.0080633 ** ####this agrees with
#{4}# above
Residuals 10 14.456 1.446
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
___________________________________________________________________________________________
My question is, why does this inconsistency happen? And is it safe to
assume that anova and drop1/dropterm are giving me the answers I want,
therefore to explore my model with these functions and avoid using
add1/addterm?
HUGE thanks for reading to the end! I would be extremely grateful if
someone could help me with this problem. I wasn't able to find any clues
in the docs or the r-help archives (but perhaps as it's a complex
problem I wasn't using the right search terms, if so, apologies)
Best wishes
Jenny
_____________
Jenny Hodgson
Department of Biology - area 18
PO box 373
University of York
YO10 5YW
UK
Tel: 01904 328623
More information about the R-help
mailing list