[R] Zero-inflated Negat. Binom. model
Achim Zeileis
Achim.Zeileis at uibk.ac.at
Thu Feb 11 17:04:45 CET 2010
On Thu, 11 Feb 2010, Luciano La Sala wrote:
> Dear R crew:
>
> I am sorry this question has been posted before, but I can't seem to solve
> this problem yet.
Just for the others reader who might not recall that you asked virtually
the same question last week. This was my reply:
https://stat.ethz.ch/pipermail/r-help/2010-February/227040.html
> I have a simple dataset consisting of two variables: cestode intensity and
> chick size (defined as CAPI).
> Intensity is a count and clearly overdispersed, with way too many zeroes.
> I'm interested in looking at the association between these two variables,
> i.e. how well does chick size predict tape intensity?
>
> Since I have a small sample size, I fit a zero inflated negat. Binomial (not
> Poisson) model using the "pscl" package.
>
> I built tried two models and got the outputs below.
>
>> model <- zeroinfl(Int_Cesto ~ CAPI, dist = "negbin", EM = TRUE)
>
> Call:
> zeroinfl(formula = Int_Cesto ~ CAPI, dist = "negbin", EM = TRUE)
>
> Count model coefficients (negbin with log link):
> (Intercept) CAPI
> -2.99182 0.06817
> Theta = 0.4528
>
> Zero-inflation model coefficients (binomial with logit link):
> (Intercept) CAPI
> 12.1364 -0.1572
>
>> summary(model)
>
> Call:
> zeroinfl(formula = Int_Cesto ~ CAPI, dist = "negbin", EM = TRUE)
>
> Pearson residuals:
> Min 1Q Median 3Q Max
> -0.62751 -0.38842 -0.21303 -0.06899 7.29566
>
> Count model coefficients (negbin with log link):
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -2.99182 3.39555 -0.881 0.3783
> CAPI 0.06817 0.04098 1.664 0.0962 .
> Log(theta) -0.79222 0.45031 -1.759 0.0785 .
>
> Zero-inflation model coefficients (binomial with logit link):
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) 12.13636 3.71918 3.263 0.00110 **
> CAPI -0.15720 0.04989 -3.151 0.00163 **
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Theta = 0.4528
> Number of iterations in BFGS optimization: 1
> Log-likelihood: -140.2 on 5 Df
>
> QUESTIONS
>
> 1. Is my model adequately specified?
See my reply linked above. I guess it's hard to say more than that.
> 2. CAPI is included in blocks 1 of output containing negative binomial
> regression coefficients for CAPI, and is also included also in block 2
> corresponding to the inflation model. Does this make sense?
Dito.
> If I specify my model slightly differently, I get what I believe is more
> reasonable results:
>
>> model12 <- zeroinfl(Int_Cesto ~ 1|CAPI, dist = "negbin", EM = TRUE)
>> model12
>
> Call:
> zeroinfl(formula = Int_Cesto ~ 1 | CAPI, dist = "negbin", EM = TRUE)
>
> Count model coefficients (negbin with log link):
> (Intercept)
> 2.692
> Theta = 0.4346
>
> Zero-inflation model coefficients (binomial with logit link):
> (Intercept) CAPI
> 13.2476 -0.1708
>
>> summary(model12)
>
> Call:
> zeroinfl(formula = Int_Cesto ~ 1 | CAPI, dist = "negbin", EM = TRUE)
>
> Pearson residuals:
> Min 1Q Median 3Q Max
> -0.61616 -0.36902 -0.19466 -0.06666 4.85481
>
> Count model coefficients (negbin with log link):
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) 2.6924 0.3031 8.883 <2e-16 ***
> Log(theta) -0.8334 0.4082 -2.042 0.0412 *
>
> Zero-inflation model coefficients (binomial with logit link):
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) 13.24757 3.64531 3.634 0.000279 ***
> CAPI -0.17078 0.04921 -3.471 0.000519 ***
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Theta = 0.4346
> Number of iterations in BFGS optimization: 1
> Log-likelihood: -141.9 on 4 Df
>
> QUESTION:
>
> 1. Is this model specification and output more reasonable?
> 2. CAPI appears only in the second block that corresponds to the
> inflation model.
You can apply standard model selection techniques. Hands-on examples for
that are in the vignette that I pointed you to last week:
vignette("countreg", package = "pscl")
Different model selection strategies may however yield different models.
AIC will prefer the model with CAPI in the count equation. In contrast,
a Wald test at 5% level would drop it. You could also look at the BIC and
at the LR test. My guess is though that the answer will be: CAPI has some
weak but practically not very relevant influence on the mean in the count
component.
But I strongly recommend that you try to get more familiar with the
zero-inflated model and the general model selection strategies. Or you
could try to get help from a local statistician to obtain an appropriate
model and interpret its results.
hth,
Z
>
> Thanks in advance!
> Luciano
>
>
>
>
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list