[R] Std errors in glm models w/ and w/o intercept
David Winsemius
dwinsemius at comcast.net
Mon Mar 17 16:09:03 CET 2008
Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote in
news:alpine.LFD.1.00.0803170624220.5706 at gannet.stats.ox.ac.uk:
> On Mon, 17 Mar 2008, David Winsemius wrote:
>>
>> I am doing a reanalysis of results that have previously been
>> published. My hope was to demonstrate the value of adoption of more
>> modern regression methods in preference to the traditional approach
>> of univariate stratification. I have encountered a puzzle regarding
>> differences between what I thought would be two equivalent analyses.
>> Using a single factor, I compare poisson models with and without
>> the intercept term. As expected, the estimated coefficient and std
>> error of the estimate are the same for the intercept and the base
>> level of the factor in the two models. The sum of the intercept
>> with each coefficient is equal to the individual factor
>> coefficients in the no-intercept model. The overall model fit
>> statistics are the same. However, the std errors for the other
>> factors are much smaller in the model without the intercept.
>>
>> The offset = log(expected) is based on person-years of follow-up
>> multiplied by the annual mortality experience of persons with known
>> age, gender, and smoking status in a much larger cohort. My logic
>> in removing the intercept was that the offset should be considered
>> the baseline, and that I should estimate each level compared with
>> that baseline. "18.5-24.9" was used as the reference level in the
>> model with intercept. Removing the intercept appears to be a
>> "successful" strategy, but have I committed any statistical sin?
>
> No, but you have apparently not understood what the 'intercept'
> means here. With a single factor and the default contr.treatment,
> it is the coefficient used to predict the first category of the
> factor, and the remaining coefficients are log ratios of mean rate
> for the named category to the first. When you drop the intercept,
> the coefficients are no longer contrasts.
Thank you for your interest in my question, Prof Ripley. I did
understand that the intercept coeff was the log(ratio) of the base
group to the offset and that exp(coeff$intercept) can be interpreted
as a mortality ratio. Also, that the coefficients in the first model
were log ratios of effect(BMI) to coefficient(BMI-reference), so that
exp(coeff$level+coeff$intercept) would be a level's ratio relative to
the "expected". My concern was with the markedly lower std errors
around the "other" level coefficients when the intercept was removed.
My preference would be to use the non-intercept model.
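
A minimal sketch of the comparison described above, using simulated data rather than the data in this thread. The data frame `sim`, its level names, and the rate ratios are hypothetical. It fits the Poisson model with and without the intercept, then rebuilds the per-level log ratios and their standard errors from the intercept model by applying a linear map `L` to `coef()` and `vcov()`:

```r
# Simulated stand-in data: 'sim', its level names, and the rate ratios are
# hypothetical and are not the data discussed in this thread.
set.seed(42)
n_per_level <- 400
lev <- c("ref", "low", "high")
sim <- data.frame(
  BMI      = factor(rep(lev, each = n_per_level), levels = lev),
  expected = runif(3 * n_per_level, min = 0.05, max = 0.50)
)
ratio <- c(ref = 0.8, low = 1.5, high = 3.0)
sim$deaths <- rpois(nrow(sim),
                    lambda = sim$expected * ratio[as.character(sim$BMI)])

# Same model, two codings of the single factor
fit_int   <- glm(deaths ~ BMI     + offset(log(expected)),
                 family = poisson, data = sim)
fit_noint <- glm(deaths ~ BMI - 1 + offset(log(expected)),
                 family = poisson, data = sim)

# Rows of L map (Intercept, BMIlow, BMIhigh) to each level's log ratio
L <- rbind(ref = c(1, 0, 0), low = c(1, 1, 0), high = c(1, 0, 1))

level_est <- drop(L %*% coef(fit_int))
level_se  <- sqrt(diag(L %*% vcov(fit_int) %*% t(L)))

cbind(est_from_int = level_est, est_noint = coef(fit_noint),
      se_from_int  = level_se,  se_noint  = sqrt(diag(vcov(fit_noint))))
```

The two pairs of columns should agree up to convergence tolerance, which suggests the smaller standard errors in the no-intercept fit belong to different quantities (each level versus the expected rate) rather than to a more efficient estimate of the same contrasts.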
> When you drop the intercept, the coding (and hence the
> interpretation of the coefficients) of the first factor in the model
> changes. See MASS chapter 6. So you are comparing apples with
> oranges.
MASS.2ed.ch6, "Linear Statistical Models", says that the lm() models
with and without intercepts have different contrast matrices and
discusses interpretation of coefficients. If I were to consult a later
edition, will I find a discussion of the impact of those differences on
the std errors of the coefficients?
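
For the specific case here (a single factor, Poisson errors, log link, and an offset), the effect of the coding on the standard errors can be written down directly. The following is a sketch under those assumptions; the symbols d_k, E_k, gamma_k and beta_k are notation introduced for this note, and the variances are the usual large-sample estimates from the fitted information matrix:

```latex
% Levels k = 1..K (k = 1 is the reference), counts y_i, offset log(e_i).
% d_k = \sum_{i \in k} y_i  (observed deaths in level k)
% E_k = \sum_{i \in k} e_i  (expected deaths in level k)
\begin{align*}
\text{No intercept:}\quad
  \hat\gamma_k &= \log\frac{d_k}{E_k}, &
  \widehat{\operatorname{Var}}(\hat\gamma_k) &= \frac{1}{d_k}, &
  \widehat{\operatorname{Cov}}(\hat\gamma_j,\hat\gamma_k) &= 0 \quad (j \ne k) \\
\text{With intercept:}\quad
  \hat\beta_0 &= \hat\gamma_1, &
  \hat\beta_k &= \hat\gamma_k - \hat\gamma_1, &
  \widehat{\operatorname{Var}}(\hat\beta_k) &= \frac{1}{d_1} + \frac{1}{d_k} \quad (k \ge 2)
\end{align*}
```

Under this reading, every contrast in the intercept model carries the 1/d_1 term from the reference level, so its standard error can never fall below that of the reference level itself.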
>>
>>> with(bmi, table(BMI,Actual_Deaths))
>>            Actual_Deaths
>> BMI          0   1   2   3   4   5   6   7  11  13  SE.no-int   SE.int
>> 18.5-24.9  311  21   1   0   0   0   0   0   0   0    0.20851  0.20851
>> 15.0-18.4  353  33   8   2   0   1   0   0   0   0    0.12910  0.24524
>> 25.0-29.9  367  19   0   0   0   0   0   0   0   0    0.22939  0.30999
>> 30.0-34.9  349  95  39  17   8   9   3   4   0   1    0.05270  0.30999
>> 35.0-39.9  351  90  50  21  20   3   3   2   1   0    0.05057  0.21455
>> 40.0-55.0  386  60  15   7   4   0   0   1   0   0    0.08639  0.22569
snipped model output ...appended SE(coeff)'s to factor counts
>> It does look statistically sensible that an estimate for
>> BMI="40.0-55.0" with over 100 events should have a much narrower CI
>> than BMI="18.5-24.9", which has only 23 events. Is the model with an
>> intercept term somehow "spreading around uncertainty" that really
>> "belongs" to the reference category with its relatively low number
>> of events?
To my eye, the SE's in the no-intercept model make much more sense as
far as their relationship to the sum of counts. I also have a related
concern that I may have in the past been using less efficient
inferential methods when analyzing models with external standards by
accepting the default intercept.
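
The relationship between the standard errors and the event counts can be checked from the table alone. The sketch below assumes the usual large-sample result for a single-factor Poisson model with an offset: the no-intercept SE for a level is 1/sqrt(events in that level), and the SE of a treatment contrast is sqrt(1/events_ref + 1/events_level). The object names are illustrative; the counts are copied from the table above.

```r
# Number of deaths per record (table columns) and record counts per BMI level
death_values <- c(0, 1, 2, 3, 4, 5, 6, 7, 11, 13)
counts <- rbind(
  "18.5-24.9" = c(311, 21,  1,  0,  0, 0, 0, 0, 0, 0),
  "15.0-18.4" = c(353, 33,  8,  2,  0, 1, 0, 0, 0, 0),
  "25.0-29.9" = c(367, 19,  0,  0,  0, 0, 0, 0, 0, 0),
  "30.0-34.9" = c(349, 95, 39, 17,  8, 9, 3, 4, 0, 1),
  "35.0-39.9" = c(351, 90, 50, 21, 20, 3, 3, 2, 1, 0),
  "40.0-55.0" = c(386, 60, 15,  7,  4, 0, 0, 1, 0, 0)
)

# Total deaths in each BMI level
events <- drop(counts %*% death_values)

# Assumed large-sample standard errors under each coding
se_noint <- 1 / sqrt(events)
se_int   <- c(se_noint[1], sqrt(1 / events[1] + 1 / events[-1]))

round(cbind(events, se_noint, se_int), 5)
```

These values agree with the appended SE columns to about four decimal places. The one exception is the SE.int entry for 30.0-34.9, where the formula gives about 0.215 rather than the 0.30999 shown. On this view the intercept model is not spreading uncertainty around: each of its contrasts is a comparison with the reference level, so it includes the 23-event reference level's variance.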
--
David Winsemius