[R] OT: (quasi-?) separation in a logistic GLM
lincoln
miseno77 at hotmail.com
Wed Sep 28 10:14:14 CEST 2011
I know that this is a quite old post but I am dealing with a similar warning
message and, also after reading all the posts here, I am still in doubt with
what I should do with my analysis.
I have a dataset where the binary response variable is sex, and the
predictors are several variables (they are percentage). I know that one of
this variables, "hcp", predicts quite well the sex of the individuals given
that the ca. 85% of females have hcp>0 while ca. 85% of females have hcp=0.
I don't believe that it does matter, but all the values are very low (the
maximum value of a predictor doesn't exceed 0.5) and most of predictors show
a very left-skewed distribution (log-normal).
In any case, I believe that it could be said that quasi-separation is
occurring.
When I run the saturated model I get the warning message:
*********************
>glm.sat1<-glm(sex~hwp+hcp+hnp+twp+d1lp*d2dp+hwp:hcp+hwp:hnp+hwp:twp+hcp:hnp+hcp:twp+hnp:twp,data=dati,family="binomial")
Mensajes de aviso perdidos
glm.fit: fitted probabilities numerically 0 or 1 occurred
> summary(glm.sat1)
Call:
glm(formula = sex ~ hwp + hcp + hnp + twp + d1lp * d2dp + hwp:hcp +
hwp:hnp + hwp:twp + hcp:hnp + hcp:twp + hnp:twp, family = "binomial",
data = dati)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.6838 -0.1439 0.2699 0.5694 3.8421
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 8.940 4.046 2.210 0.027125 *
hwp -4.897 4.550 -1.076 0.281793
hcp -414.796 113.780 -3.646 0.000267 ***
hnp -3.689 6.692 -0.551 0.581487
twp 5.450 2.566 2.124 0.033657 *
d1lp -22.436 13.837 -1.621 0.104926
d2dp -21.076 9.266 -2.274 0.022940 *
d1lp:d2dp 65.102 35.394 1.839 0.065864 .
hwp:hcp 313.514 823.782 0.381 0.703516
hwp:hnp -46.572 79.467 -0.586 0.557834
hwp:twp 8.252 21.503 0.384 0.701156
hcp:hnp -1721.320 1925.766 -0.894 0.371409
hcp:twp -94.153 459.710 -0.205 0.837721
hnp:twp 16.598 38.548 0.431 0.666769
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 581.51 on 425 degrees of freedom
Residual deviance: 281.98 on 412 degrees of freedom
(41 observations deleted due to missingness)
AIC: 309.98
Number of Fisher Scoring iterations: 8
**********************
The estimates of hcp are very high and its Std.Error are also quite high. If
I go further with the analyses and use the "step" procedure I finally get a
candidate minimal adequate model:
> step.1<-step(glm.sat1)
Start: AIC=309.98
sex ~ hwp + hcp + hnp + twp + d1lp * d2dp + hwp:hcp + hwp:hnp +
hwp:twp + hcp:hnp + hcp:twp + hnp:twp
Df Deviance AIC
- hcp:twp 1 282.02 308.02
- hwp:hcp 1 282.11 308.11
- hwp:twp 1 282.13 308.13
- hnp:twp 1 282.17 308.17
- hwp:hnp 1 282.32 308.32
- hcp:hnp 1 282.90 308.90
<none> 281.98 309.98
- d1lp:d2dp 1 285.43 311.43
Step: AIC=308.02
sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hwp:hcp +
hwp:hnp + hwp:twp + hcp:hnp + hnp:twp
Df Deviance AIC
- hwp:hcp 1 282.13 306.13
- hwp:twp 1 282.14 306.14
- hnp:twp 1 282.18 306.18
- hwp:hnp 1 282.43 306.43
- hcp:hnp 1 283.07 307.07
<none> 282.02 308.02
- d1lp:d2dp 1 285.43 309.43
Step: AIC=306.13
sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hwp:hnp +
hwp:twp + hcp:hnp + hnp:twp
Df Deviance AIC
- hwp:twp 1 282.24 304.24
- hnp:twp 1 282.27 304.27
- hwp:hnp 1 282.53 304.53
- hcp:hnp 1 283.25 305.25
<none> 282.13 306.13
- d1lp:d2dp 1 285.46 307.46
Step: AIC=304.24
sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hwp:hnp +
hcp:hnp + hnp:twp
Df Deviance AIC
- hnp:twp 1 282.35 302.35
- hwp:hnp 1 282.83 302.83
- hcp:hnp 1 283.36 303.36
<none> 282.24 304.24
- d1lp:d2dp 1 285.56 305.56
Step: AIC=302.35
sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hwp:hnp +
hcp:hnp
Df Deviance AIC
- hwp:hnp 1 283.06 301.06
- hcp:hnp 1 283.37 301.37
<none> 282.35 302.35
- d1lp:d2dp 1 285.71 303.71
- twp 1 306.17 324.17
Step: AIC=301.06
sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hcp:hnp
Df Deviance AIC
- hcp:hnp 1 284.36 300.36
<none> 283.06 301.06
- d1lp:d2dp 1 286.26 302.26
- hwp 1 286.96 302.96
- twp 1 307.98 323.98
Step: AIC=300.36
sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp
Df Deviance AIC
<none> 284.36 300.36
- hnp 1 287.07 301.07
- d1lp:d2dp 1 287.80 301.80
- hwp 1 288.08 302.08
- twp 1 308.39 322.39
- hcp 1 484.12 498.12
Hubo 40 avisos (use warnings() para verlos)
> summary(step.1)
Call:
glm(formula = sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp,
family = "binomial", data = dati)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.7062 -0.1652 0.2669 0.5909 3.5082
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 8.442 3.925 2.151 0.0315 *
hwp -4.692 2.413 -1.945 0.0518 .
hcp -464.909 59.109 -7.865 3.68e-15 ***
hnp -5.673 3.317 -1.710 0.0872 .
twp 6.193 1.428 4.335 1.46e-05 ***
d1lp -21.332 13.487 -1.582 0.1137
d2dp -19.983 9.042 -2.210 0.0271 *
d1lp:d2dp 63.218 34.530 1.831 0.0671 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 581.51 on 425 degrees of freedom
Residual deviance: 284.36 on 418 degrees of freedom
(41 observations deleted due to missingness)
AIC: 300.36
Number of Fisher Scoring iterations: 8
> library(profileModel)
> pp<-profileModel(step.1,quantile=qchisq(0.95,1),objective="ordinaryDeviance")
Preliminary iteration ........ Done
Profiling for parameter (Intercept) ... Done
Profiling for parameter hwp ... Done
Profiling for parameter hcp ... Done
Profiling for parameter hnp ... Done
Profiling for parameter twp ... Done
Profiling for parameter d1lp ... Done
Profiling for parameter d2dp ... Done
Profiling for parameter d1lp:d2dp ... Done
******************************
Plotting the profileModel I get this:
> dev.new()
> plot(pp)
http://r.789695.n4.nabble.com/file/n3850331/Rplot.jpg
The hcp plot is far by being quadratic as I understood it should be.
I have tried to use the brglm procedure (on saturated model and glm derived
minimal adequate model) and I got almost the very same results.
Any suggestion/commentary on this?
I would appreciate any help,
Simone
--
View this message in context: http://r.789695.n4.nabble.com/OT-quasi-separation-in-a-logistic-GLM-tp875726p3850331.html
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list