[R] Std errors in glm models w/ and w/o intercept
David Winsemius
dwinsemius at comcast.net
Mon Mar 17 02:11:43 CET 2008
I am doing a reanalysis of results that have previously been published.
My hope was to demonstrate the value of adoption of more modern
regression methods in preference to the traditional approach of
univariate stratification. I have encountered a puzzle regarding
differences between I thought would be two equivalent analyses. Using a
single factor, I compare poisson models with and without the intercept
term. As expected, the estimated coefficient and std error of the
estimate are the same for the intercept and the base level of the
factor in the two models. The sum of the intercept with each
coefficient is equal to the individual factor coefficients in the no-
intercept model. The overall model fit statistics are the same.
However, the std errors for the other factors are much smaller in the
model without the intercept.
The offset = log(expected) is based on person-years of follow-up
multiplied by the annual mortality experience of persons with known
age, gender, and smoking status in a much larger cohort. My logic in
removing the intercept was that the offset should be considered the
baseline, and that I should estimate each level compared with that
baseline. "18.5-24.9" was used as the reference level in the model with
intercept. Removing the intercept appears to be a "successful"
strategy. but have I committed any statistical sin?
> with(bmi, table(BMI,Actual_Deaths))
Actual_Deaths
BMI 0 1 2 3 4 5 6 7 11 13
18.5-24.9 311 21 1 0 0 0 0 0 0 0
15.0-18.4 353 33 8 2 0 1 0 0 0 0
25.0-29.9 367 19 0 0 0 0 0 0 0 0
30.0-34.9 349 95 39 17 8 9 3 4 0 1
35.0-39.9 351 90 50 21 20 3 3 2 1 0
40.0-55.0 386 60 15 7 4 0 0 1 0 0
> bmi.base <- with(bmi,glm(Actual_Deaths ~
BMI + offset(log( MMI_VBT_Expected)), family="poisson"))
> summary(bmi.base)
Call:
glm(formula = Actual_Deaths ~ BMI + offset(log(MMI_VBT_Expected)),
family = "poisson")
Deviance Residuals:
Min 1Q Median 3Q Max
-2.6385 -0.5245 -0.2722 -0.1041 3.4426
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.42920 0.20851 2.058 0.0395 *
BMI15.0-18.4 0.31608 0.24524 1.289 0.1974
BMI25.0-29.9 -0.22795 0.30999 -0.735 0.4621
BMI30.0-34.9 -0.09669 0.21506 -0.450 0.6530
BMI35.0-39.9 -0.04290 0.21455 -0.200 0.8415
BMI40.0-55.0 0.19348 0.22569 0.857 0.3913
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 1485.0 on 2654 degrees of freedom
Residual deviance: 1470.0 on 2649 degrees of freedom
AIC: 2760.9
Number of Fisher Scoring iterations: 6
-----------------------------------------------------
> bmi.no.int <- with(bmi,glm(Actual_Deaths ~
BMI + offset(log(MMI_VBT_Expected)) -1 ,
family="poisson"))
> summary(bmi.no.int)
Call:
glm(formula = Actual_Deaths ~ BMI + offset(log(MMI_VBT_Expected)) -
1, family = "poisson")
Deviance Residuals:
Min 1Q Median 3Q Max
-2.6385 -0.5245 -0.2722 -0.1041 3.4426
Coefficients:
Estimate Std. Error z value Pr(>|z|)
BMI18.5-24.9 0.42920 0.20851 2.058 0.0395 *
BMI15.0-18.4 0.74529 0.12910 5.773 7.79e-09 ***
BMI25.0-29.9 0.20125 0.22939 0.877 0.3803
BMI30.0-34.9 0.33251 0.05270 6.309 2.81e-10 ***
BMI35.0-39.9 0.38631 0.05057 7.639 2.19e-14 ***
BMI40.0-55.0 0.62268 0.08639 7.208 5.67e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 1630.7 on 2655 degrees of freedom
Residual deviance: 1470.0 on 2649 degrees of freedom
AIC: 2760.9
It does look statistically sensible that an estimate for BMI="40.0-
55.0" with over 100 events should have a much narrower CI than
BMI="18.5-24.9" which only has 23 events. Is the model with an
intercept term somehow "spreading around uncertainty" that really
"belongs" to the reference category with its relatively low number of
events?
--
David Winsemius
More information about the R-help
mailing list