[R] Goodness of fit of binary logistic model
Paul Smith
phhs80 at gmail.com
Fri Aug 5 18:53:06 CEST 2011
On Fri, Aug 5, 2011 at 5:35 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>> I have just estimated this model:
>>>> -----------------------------------------------------------
>>>> Logistic Regression Model
>>>>
>>>> lrm(formula = Y ~ X16, x = T, y = T)
>>>>
>>>>                       Model Likelihood     Discrimination    Rank Discrim.
>>>>                          Ratio Test           Indexes           Indexes
>>>>
>>>> Obs            82    LR chi2      5.58    R2       0.088    C       0.607
>>>>  0             46    d.f.            1    g        0.488    Dxy     0.215
>>>>  1             36    Pr(> chi2) 0.0182    gr       1.629    gamma   0.589
>>>> max |deriv| 9e-11                         gp       0.107    tau-a   0.107
>>>>                                           Brier    0.231
>>>>
>>>>           Coef    S.E.   Wald Z Pr(>|Z|)
>>>> Intercept -1.3218 0.5627 -2.35  0.0188
>>>> X16=1      1.3535 0.6166  2.20  0.0282
>>>> -----------------------------------------------------------
>>>>
>>>> Analyzing the goodness of fit:
>>>>
>>>> -----------------------------------------------------------
>>>>>
>>>>> resid(model.lrm,'gof')
>>>>
>>>> Sum of squared errors     Expected value|H0                    SD
>>>>          1.890393e+01          1.890393e+01          6.073415e-16
>>>>                     Z                     P
>>>>         -8.638125e+04          0.000000e+00
>>>> -----------------------------------------------------------
>>>>
>>>> From the above calculated p-value (0.000000e+00), one should discard
>>>> this model. However, there is something that is puzzling me: if the
>>>> 'Expected value|H0' coincides so closely with the 'Sum of squared
>>>> errors', why should one discard the model? I am certainly missing
>>>> something.
>>>
>>> It's hard to tell what you are missing, since you have not described your
>>> reasoning at all. So I guess what is in error is your expectation that we
>>> would have drawn all of the unstated inferences that you draw when offered
>>> the output from lrm. (I certainly did not draw the inference that "one
>>> should discard the model".)
>>>
>>> resid is a function designed for use with glm and lm models. Why aren't
>>> you using residuals.lrm?
>>
>> ----------------------------------------------------------
>>>
>>> residuals.lrm(model.lrm,'gof')
>>
>> Sum of squared errors     Expected value|H0                    SD
>>          1.890393e+01          1.890393e+01          6.073415e-16
>>                     Z                     P
>>         -8.638125e+04          0.000000e+00
>
> Great. Now please answer the more fundamental question. Why do you think
> this means "discard the model"?
Before answering that, let me point out that resid(model.lrm, 'gof')
dispatches to residuals.lrm(), so both approaches produce the same
results (see the examples given in ?residuals.lrm).
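A minimal check of that, assuming the fit object is still named
model.lrm as above:
-----------------------------------------------------------
library(rms)
## resid() is only the generic: for an object of class "lrm", S3
## dispatch sends it to residuals.lrm(), so the two goodness-of-fit
## calls should return identical results.
identical(resid(model.lrm, 'gof'), residuals.lrm(model.lrm, 'gof'))
## should print: [1] TRUE
-----------------------------------------------------------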
To answer your question, I invoke the reasoning given by Frank Harrell at:
http://r.789695.n4.nabble.com/Hosmer-Lemeshow-goodness-of-fit-td3508127.html
He writes:
«The test in the rms package's residuals.lrm function is the le Cessie
- van Houwelingen - Copas - Hosmer unweighted sum of squares test for
global goodness of fit. Like all statistical tests, a large P-value
has no information other than there was not sufficient evidence to
reject the null hypothesis. Here the null hypothesis is that the true
probabilities are those specified by the model. »
From Harrell's argument, does it not follow that if the p-value is zero
one should reject the null hypothesis? Please correct me if what I say
is not correct, and please direct me towards a way of establishing the
goodness of fit of my model.
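For what it is worth, here is a rough arithmetic check of the printed
output, assuming (as the column names suggest) that Z is simply the
standardized difference (Sum of squared errors - Expected value|H0) / SD:
-----------------------------------------------------------
## Rough check using only the printed quantities (which are rounded).
z     <- -8.638125e+04   # printed Z
sd_h0 <-  6.073415e-16   # printed SD under H0
z * sd_h0                # implied difference: about -5.2e-11, negligible
2 * pnorm(-abs(z))       # two-sided p-value: underflows to 0
-----------------------------------------------------------
So the two leading quantities agree to the printed precision, yet the SD
is on the order of 1e-16, which, if I am reading the output correctly,
is why even a ~5e-11 discrepancy yields a huge |Z| and a p-value of zero.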
Paul