[R] Appropriate tests for logistic regression with a continuous predictor variable and Bernoulli response variable
Eik Vettorazzi
E.Vettorazzi at uke.uni-hamburg.de
Fri Jul 9 14:58:21 CEST 2010
Have a look at
?anova
or rather
?anova.glm
hth
Am 09.07.2010 04:46, schrieb Kiyoshi Sasaki:
> I have a data with binary response variable, repcnd (pregnant or not) and one predictor continuous variable, svl (body size) as shown below. I did Hosmer-Lemeshow test as a goodness of fit (as suggested by a kind “R-helper” previously). To test whether the predictor (svl, or body size) has significant effect on predicting whether or not a female snake is pregnant, I used the differences between null deviance and residual deviance using a code as following:
>
>
> 1-pchisq(mod.fit$null.deviance - mod.fit$deviance, mod.fit$df.null - mod.fit$df.residual)
>
> Could anyone tell me whether I did the test properly? I did this test because I thought Wald test/z score listed in the output from "summary(mod.fit)" is not appropriate for a kind of data I have. Does R have automated function to run appropriate tests? I have pasted my R output below.
>
> Thank you in advance for your time and help.
>
> Kiyoshi
>
>
> repcnd svl
> 1 1 51.5
> 2 1 52.5
> <edited>
> 294 0 59.8
> 298 1 60.0
> 300 1 51.7
> 301 1 57.4
> 302 1 60.9
> 303 0 56.8
> 304 0 50.0
> -------------------
>
>> mod.fit <- glm(formula = gb.no.M$repcnd ~ gb.no.M$svl, family = binomial(link = logit), data = gb.no.M, na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = F))
>> summary(mod.fit)
>>
>
> Call:
> glm(formula = gb.no.M$repcnd ~ gb.no.M$svl, family = binomial(link = logit),
> data = gb.no.M, na.action = na.exclude, control = list(epsilon = 1e-04,
> maxit = 50, trace = F))
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -1.757 -1.109 0.734 1.113 1.632
>
> Coefficients:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -7.08565 1.84106 -3.849 0.000119 ***
> gb.no.M$svl 0.13529 0.03474 3.894 9.85e-05 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 301.92 on 217 degrees of freedom
> Residual deviance: 285.04 on 216 degrees of freedom
> (8 observations deleted due to missingness)
> AIC: 289.04
>
> Number of Fisher Scoring iterations: 3
> -------------------------------------------------------------------------------
>
>> Hosmer-Lemeshow test
>>
>> hosmerlem <- function (y, yhat, g = 10)
>>
> + {
> + cutyhat <- cut(yhat, breaks = quantile(yhat, probs = seq(0, 1, 1/g)), include.lowest = T)
> + obs <- xtabs(cbind(1 - y, y) ~ cutyhat)
> + expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
> + chisq <- sum((obs - expect)^2/expect)
> + P <- 1 - pchisq(chisq, g - 2)
> + c("X^2" = chisq, Df = g - 2, "P(>Chi)" = P)
> + }
>
>> mod.fit <- glm(formula = no.NA$repcnd ~ no.NA$svl, family = binomial(link = logit), data = no.NA, na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = F))
>>
>
>
>> hosmerlem(no.NA$repcnd, fitted(mod.fit))
>>
> X^2 Df P(>Chi)
> 6.8742531 8.0000000 0.5502587
> ---------------------------------------------------------------------------------------------------
>
>> list(p.value = round(1-pchisq(mod.fit$null.deviance - mod.fit$deviance,
>>
> + mod.fit$df.null- mod.fit$df.residual),6),
> + df = mod.fit$df.null- mod.fit$df.residual,
> + change = mod.fit$null.deviance - mod.fit$deviance)
>
> $p.value
> [1] 4e-05
>
> $df
> [1] 1
>
> $change
> [1] 16.87895
>
>
>
> [[alternative HTML version deleted]]
>
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf
Martinistr. 52
20246 Hamburg
T ++49/40/7410-58243
F ++49/40/7410-57790
More information about the R-help
mailing list