[R] Analysis of Data with Observation Weights Revisited
Bliese, Paul D MAJ WRAIR-Wash DC
Paul.Bliese at NA.AMEDD.ARMY.MIL
Thu Nov 21 20:08:39 CET 2002
In lm and glm, the weights command should only be used when the variances of
the observations are inversely proportional to the weights.
Recently, however, a question came up regarding how one could estimate lm
and glm models with weights where the weights refer to number of
observations (counts).
Because lm and glm do not handle this case, SE values will be wrong if one
uses the weight command with counts. Notice that the SE values in the
second model are larger than those in the first model, though paramater
estimates are identical.
> x <- rep(c(1,2), c(6,4))
> y <- rep(c(1,2,3,4),c(2,3,3,2))
> cbind(x,y)
x y
[1,] 1 1
[2,] 1 1
[3,] 1 2
[4,] 1 2
[5,] 1 2
[6,] 1 3
[7,] 2 3
[8,] 2 3
[9,] 2 4
[10,] 2 4
> summary(m <- lm(y~x))$coef
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1666667 0.6627489 0.2514778 0.807784044
x 1.6666667 0.4468252 3.7300192 0.005787586
>
> x1 <- rep(c(1,2), c(3,2))
> y1 <- rep(c(1,2,3,4), c(1,1,2,1))
> w <- c(2,3,1,2,2)
> cbind(x1,y1,w)
x1 y1 w
[1,] 1 1 2
[2,] 1 2 3
[3,] 1 3 1
[4,] 2 3 2
[5,] 2 4 2
>
> summary(m1 <- lm(y1~x1, weights=w))$coef
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1666667 1.0822644 0.1539981 0.8873876
x1 1.6666667 0.7296625 2.2841610 0.1065266
The thread left off with Thomas Lumley recommending that one use gee to
estimate the SE values while sticking with the parameter estimates from the
weighted lm or glm model. Specifically something like,
mod<-gee(y1~x1,id=1:5, weights=w) #were each observation has its own id.
The problem with this solution is that gee does NOT allow weights (probably
something that would be fairly simple to modify); thus, the gee option
doesn't really work.
For those who are interested, one work around is to use the Thomas Lumley's
infjack.glm function in the glmerrs.R file at
http://faculty.washington.edu/tlumley/glmerrs.R
and the code is all in the WEAVE package at
http://faculty.washington.edu/tlumley/weave.html
In the example, notice that the SE estimates are much closer to the SE
values in the original model without weights.
> model<-glm(y1~x1,weights=w)
> varmat<-infjack.glm(model, groups=1:5)
> se<-sqrt(diag(varmat))
> se
(Intercept) x1
0.7827224 0.4969040
MAJ Paul Bliese, Ph.D.
Walter Reed Army Institute of Research
Phone: (301) 319-9873
Fax: (301) 319-9484
paul.bliese at na.amedd.army.mil
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
More information about the R-help
mailing list