[R] Logistic regression goodness of fit tests

Thu Mar 10 23:29:20 CET 2005

On Thu, 10 Mar 2005 22:36:09 +0100 (CET)
Roger Bivand <Roger.Bivand at nhh.no> wrote:

> From one geographer to another, and being prepared to bow to
> better-founded explanations, you seem to have included a variable - the
> offending variable slr_mean - that is very highly correlated with another.  
> Making the tolerance tighter says that you are prepared to take the risk
> of confounding your results. You've already "been fishing" for right hand
> side variables anyway, so your results are somewhat prejudiced, aren't
> they?
> 
> I think you may also like to review which of the right hand side variables
> should be treated as factors rather than numeric (looking at the summary
> suggests that many are factors), and perhaps the dependent variable too,
> although lrm() seems to take care of this if you haven't.
> 
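To make the factor-versus-numeric point concrete, here is a minimal Python sketch (Python stands in for R here; the variable `habitat` and its codes are hypothetical, not from the data discussed in this thread). It mimics R's default treatment coding of an unordered factor. A class variable coded 1, 2, 3 becomes one 0/1 indicator column per non-reference level, instead of a single numeric column that would force a linear trend across arbitrary codes.

```python
import numpy as np

def dummy_code(codes):
    """Treatment (reference-level) coding, like an unordered R factor in a
    model formula: one 0/1 column per level except the first (reference) level."""
    codes = np.asarray(codes)
    levels = np.unique(codes)  # sorted distinct levels; levels[0] is the reference
    indicators = (codes[:, None] == levels[None, 1:]).astype(float)
    return levels[1:], indicators

# Hypothetical habitat-class variable stored as integer codes 1, 2, 3
habitat = np.array([1, 3, 2, 2, 1, 3])
levels, D = dummy_code(habitat)
print(levels)  # [2 3]
print(D)       # rows coded 1 (the reference) get all zeros
```

Each coefficient on these columns then compares one class with the reference class, rather than assuming class 3 is "one step more" than class 2.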

Thanks for your informative reply.

The research concerns habitat selection by 15 species of grassland birds (my master's project). The response here is the presence of the Chestnut-collared Longspur (cclo). I very carefully reviewed the variables for collinearity, and none of them showed any difficulty except in a few cases; in those cases I split what would otherwise have seemed a reasonable single model into two. Just out of interest, however, I did run the global model, and this problem didn't occur. Based on your comments, that suggests to me that I'm seeing an interaction effect rather than the result of two closely correlated variables.
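A collinearity screen of the kind described above can be sketched with variance inflation factors (VIFs). This is a minimal Python sketch, not the procedure actually used for these models; the predictors `x1`, `x2` and `x3` are simulated and hypothetical. The VIF for predictor j is 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors, so unlike a pairwise correlation matrix it also catches a predictor that is nearly a linear combination of several others.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n x p, no intercept column).

    VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j
    on the remaining columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares fit
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)  # hypothetical: nearly a copy of x1
x3 = rng.normal(size=n)              # hypothetical: unrelated predictor
X = np.column_stack([x1, x2, x3])

print(np.round(np.corrcoef(X, rowvar=False), 3))  # pairwise correlations
print(np.round(vif(X), 1))                        # one VIF per column
```

A common rule of thumb treats VIFs above roughly 5 to 10 as a warning sign; here `x1` and `x2` come out far above that, while `x3` stays close to 1.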

I don't think I've been fishing. I selected variables for inclusion in competing models based on ecologically reasonable criteria. I did examine relationships between species occurrence and static variables, such as DEM-derived variables, to see whether the data supported including all the variables that, on ecological grounds, should explain the birds' distribution. I have included some variables in spite of weak statistical relationships, based on a paper by Anderson, Burnham and Thompson in the Journal of Wildlife Management. It discusses how ecologically significant factors can interact in a model to help explain the response variable, even though none of them is statistically significant on its own. So I've tried to avoid fishing and instead simply to select the most parsimonious models from the set of candidate models for each species.

Based on your advice, once I've selected my top candidate models, I'll re-run them at lower tolerances and keep only the models that pass at that level. Alternatively, I could simply reject models that don't pass at lower tolerances. I do find it curious, however, that they run fine using glm (family = binomial) without complaint.
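The tolerance in question acts as a singularity criterion: a fit is refused when the design matrix is too close to rank-deficient. The Python sketch below illustrates only the general idea. `passes_singularity_check` is a hypothetical helper based on the ratio of the smallest to the largest singular value; it is not the check lrm() or glm() actually performs. With two almost identical columns, a strict (larger) tolerance rejects the design while a looser (smaller) one accepts it. Fitting functions with different internal thresholds can therefore disagree about the same model, which is one plausible reason for the lrm/glm difference.

```python
import numpy as np

def passes_singularity_check(X, tol):
    """Hypothetical check: accept X only if its smallest singular value,
    relative to its largest, exceeds tol."""
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    return s.min() / s.max() > tol

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 1e-5 * rng.normal(size=n)     # hypothetical: almost exactly collinear
X = np.column_stack([np.ones(n), x1, x2])  # intercept + two predictors

s = np.linalg.svd(X, compute_uv=False)
print("smallest/largest singular value:", s.min() / s.max())
for tol in (1e-3, 1e-8):
    print(tol, passes_singularity_check(X, tol))
```

Passing at a smaller tolerance does not make the separate effects of `x1` and `x2` any better determined; it only means the check no longer stops the fit.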

Thanks again.

T
-- 
Trevor Wiens 
twiens at interbaun.com

The significant problems that we face cannot be solved at the same 
level of thinking we were at when we created them. 
(Albert Einstein)