[R] Logistic regression goodness of fit tests
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Thu Mar 10 23:19:41 CET 2005
Trevor Wiens wrote:
> I was unsure which goodness-of-fit tests exist in R for logistic regression. After searching the R-help archive I found that lrm and resid from the Design package can be used to calculate one, as follows:
>
> d <- datadist(mydataframe)
> options(datadist = 'd')
> fit <- lrm(response ~ predictor1 + predictor2..., data=mydataframe, x =T, y=T)
> resid(fit, 'gof')
>
> I set up a script that first uses glm to create models and then uses stepAIC to determine the optimal model. I used stepAIC instead of fastbw because I found the AIC values to be completely different and the final models didn't always match. My script then takes the reduced model formula and refits it using lrm as above. The problem is that for some models I run into an error for which I can find no reference whatsoever on the mailing list or on the web. It is as follows:
>
> test.lrm <- lrm(cclo ~ elev + aspect + cti_var + planar + feat_div + loamy + sands + sandy + wet + slr_mean, data=datamatrix, x = T, y = T)
> singular information matrix in lrm.fit (rank= 10 ). Offending variable(s):
> slr_mean
> Error in j:(j + params[i] - 1) : NA/NaN argument
>
>
> Now if I set the singularity criterion (tol) to a value smaller than the default of 1E-7, such as 1E-9 or 1E-12 (the latter being the default in calibrate), it works. Why is that?
>
> Not being a statistician but a biogeographer using regression as a tool, I don't really understand what is happening here.
>
> Does changing tol change how I should interpret goodness-of-fit results or other evaluations of the models created?
>
> I've included a summary of the data below (in case it might be helpful) with all variables in the data frame as it was easier than selecting out the ones used in the model.
>
> Thanks in advance.
>
> T
The goodness of fit test only works on prespecified models. It is not
valid when stepwise variable selection is used (unless perhaps you use
alpha=0.5).
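
A minimal sketch of a prespecified fit followed by the test, under some
assumptions: it uses the rms package (the successor to Design), the data
are simulated, and the names y, x1, x2 and dat are made up for
illustration. The predictors are fixed before looking at the data, with
no stepwise step in between.

```r
# Assumes the rms package (successor to Design) is installed.
library(rms)

set.seed(1)                       # simulated data, purely illustrative
n   <- 200
x1  <- rnorm(n)
x2  <- rnorm(n)
y   <- rbinom(n, 1, plogis(-0.5 + 0.8 * x1 - 0.4 * x2))
dat <- data.frame(y, x1, x2)      # hypothetical data frame

d <- datadist(dat)
options(datadist = "d")

# Prespecified model: the formula is chosen in advance, not by stepAIC
# or fastbw. x = TRUE and y = TRUE keep the design matrix and response
# in the fit object, which resid(fit, "gof") needs.
fit <- lrm(y ~ x1 + x2, data = dat, x = TRUE, y = TRUE)

# Global goodness-of-fit test on the prespecified model.
gof <- resid(fit, "gof")
print(gof)
```

If the formula passed to lrm had instead come out of a stepwise
procedure run on the same data, the P-value printed here would not be
trustworthy.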
--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University