[R] Goodness of fit of binary logistic model
Paul Smith
phhs80 at gmail.com
Fri Aug 5 18:53:06 CEST 2011
On Fri, Aug 5, 2011 at 5:35 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>> I have just estimated this model:
>>>> -----------------------------------------------------------
>>>> Logistic Regression Model
>>>>
>>>> lrm(formula = Y ~ X16, x = T, y = T)
>>>>
>>>>                       Model Likelihood     Discrimination    Rank Discrim.
>>>>                          Ratio Test           Indexes           Indexes
>>>>
>>>> Obs            82    LR chi2      5.58    R2       0.088    C       0.607
>>>>  0             46    d.f.            1    g        0.488    Dxy     0.215
>>>>  1             36    Pr(> chi2) 0.0182    gr       1.629    gamma   0.589
>>>> max |deriv| 9e-11                         gp       0.107    tau-a   0.107
>>>>                                           Brier    0.231
>>>>
>>>>           Coef    S.E.   Wald Z Pr(>|Z|)
>>>> Intercept -1.3218 0.5627 -2.35  0.0188
>>>> X16=1      1.3535 0.6166  2.20  0.0282
>>>> -----------------------------------------------------------
>>>>
>>>> Analyzing the goodness of fit:
>>>>
>>>> -----------------------------------------------------------
>>>>>
>>>>> resid(model.lrm,'gof')
>>>>
>>>> Sum of squared errors     Expected value|H0                    SD
>>>>          1.890393e+01          1.890393e+01          6.073415e-16
>>>>                     Z                     P
>>>>         -8.638125e+04          0.000000e+00
>>>> -----------------------------------------------------------
>>>>
>>>> From the above calculated p-value (0.000000e+00), one should discard
>>>> this model. However, there is something that is puzzling me: if the
>>>> 'Expected value|H0' coincides so closely with the 'Sum of squared
>>>> errors', why should one discard the model? I am certainly missing
>>>> something.
>>>
>>> It's hard to tell what you are missing, since you have not described your
>>> reasoning at all. So I guess what is in error is your expectation that we
>>> would have drawn all of the unstated inferences that you draw when offered
>>> the output from lrm. (I certainly did not draw the inference that "one
>>> should discard the model".)
>>>
>>> resid is a function designed for use with glm and lm models. Why aren't
>>> you using residuals.lrm?
>>
>> ----------------------------------------------------------
>>>
>>> residuals.lrm(model.lrm,'gof')
>>
>> Sum of squared errors     Expected value|H0                    SD
>>          1.890393e+01          1.890393e+01          6.073415e-16
>>                     Z                     P
>>         -8.638125e+04          0.000000e+00
>
> Great. Now please answer the more fundamental question. Why do you think
> this means "discard the model"?
Before answering that, let me point out that resid(model.lrm, 'gof')
dispatches to residuals.lrm(), so both approaches produce the same
results (see the examples given in ?residuals.lrm).
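A minimal check of that, assuming the fit object is still named
model.lrm as above:
-----------------------------------------------------------
library(rms)
## resid() is only the generic: for an object of class "lrm", S3
## dispatch sends it to residuals.lrm(), so the two goodness-of-fit
## calls should return identical results.
identical(resid(model.lrm, 'gof'), residuals.lrm(model.lrm, 'gof'))
## should print: [1] TRUE
-----------------------------------------------------------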
To answer your question, I invoke the reasoning given by Frank Harrell at:
http://r.789695.n4.nabble.com/Hosmer-Lemeshow-goodness-of-fit-td3508127.html
He writes:
«The test in the rms package's residuals.lrm function is the le Cessie
- van Houwelingen - Copas - Hosmer unweighted sum of squares test for
global goodness of fit. Like all statistical tests, a large P-value
has no information other than there was not sufficient evidence to
reject the null hypothesis. Here the null hypothesis is that the true
probabilities are those specified by the model. »
From Harrell's argument, does it not follow that if the p-value is zero
one should reject the null hypothesis? Please correct me if what I say
is not correct, and please direct me towards a way of establishing the
goodness of fit of my model.
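For what it is worth, here is a rough arithmetic check of the printed
output, assuming (as the column names suggest) that Z is simply the
standardized difference (Sum of squared errors - Expected value|H0) / SD:
-----------------------------------------------------------
## Rough check using only the printed quantities (which are rounded).
z     <- -8.638125e+04   # printed Z
sd_h0 <-  6.073415e-16   # printed SD under H0
z * sd_h0                # implied difference: about -5.2e-11, negligible
2 * pnorm(-abs(z))       # two-sided p-value: underflows to 0
-----------------------------------------------------------
So the two leading quantities agree to the printed precision, yet the SD
is on the order of 1e-16, which, if I am reading the output correctly,
is why even a ~5e-11 discrepancy yields a huge |Z| and a p-value of zero.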
Paul