[R] Weights in binomial glm
Jan van der Laan
djvanderlaan at gmail.com
Fri Apr 16 14:11:00 CEST 2010
I have some questions about the use of weights in binomial glm as I am
not getting the results I would expect. In my case the weights I have
can be seen as 'replicate weights'; one respondent i in my dataset
corresponds to w[i] persons in the population. From the documentation
of the glm method, I understand that the weights can indeed be used
for this: "For a binomial GLM prior weights are used to give the
number of trials when the response is the proportion of successes."
>From "Modern applied statistics with S-Plus 3rd ed." I understand the
same.
However, I am getting some strange results. I generated an example:
Generate some data which is simular to my dataset
> Z <- rbinom(1000, 1, 0.1)
> W <- round(rnorm(1000, 100, 40))
> W[W < 1] <- 1
Probability of success can either be estimated using:
> sum(Z*W)/sum(W)
[1] 0.09642109
Or using glm:
> model <- glm(Z ~ 1, weights=W, family=binomial())
Warning message:
In glm.fit(x = X, y = Y, weights = weights, start = start, etastart =
etastart, :
fitted probabilities numerically 0 or 1 occurred
> predict(model, type="response")[1]
1
2.220446e-16
These two results are obviously not the same. The strange thing is
that when I scale the weights, such that the total equals one, the
probability is correctly estimated:
> model <- glm(Z ~ 1, weights=W/sum(W), family=binomial())
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
> predict(model, type="response")[1]
1
0.09642109
However scaling of the weights should, as far as I am aware, not have
an effect on the estimated parameters. I also tried some other
scalings. And, for example scaling the weights by 20 also gives me the
correct result.
> model <- glm(Z ~ 1, weights=W/20, family=binomial())
Warning message:
In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
> predict(model, type="response")[1]
1
0.09642109
Am I misinterpreting the weights? Could this be a numerical problem?
Regards,
Jan
More information about the R-help
mailing list