[R] Zero-Inflated Negative Binomial Regression

Achim Zeileis Achim.Zeileis at uibk.ac.at
Tue Jun 4 18:40:22 CEST 2013


On Tue, 4 Jun 2013, Carly Bobak wrote:

> Hi!
>
> I'm running a zero-inflated negative binomial regression on a large (n=54822) set of confidential data. I'm using the code:
> ZerNegBinRegress <- zeroinfl(Paper ~ . | ., data = OvsP, dist = "negbin", EM = TRUE)
>
> And keep getting the error:
>
> Warning message:
> glm.fit: fitted probabilities numerically 0 or 1 occurred
>
> I've done enough reading about this error to realize that I have a 
> linear separation issue, for which the solution seems to be eliminating 
> variables. However, eliminating via trial and error isn't getting me 
> anywhere, especially since the regression takes about 10 minutes to run 
> every time.

In many cases, separation issues are apparent from simple exploratory 
analysis. If you have categorical covariates x, say, then you could look 
at xtabs(~ factor(Paper > 0) + x, data = OvsP). For numeric x, you could 
cut() it first.
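
As a minimal sketch of that check, on simulated data (the real data are confidential, so everything except the name Paper is a hypothetical stand-in: the data frame d and the covariates group and score are made up for illustration):

```r
## Simulated stand-in data; 'group' and 'score' are hypothetical covariates.
set.seed(1)
n <- 1000
d <- data.frame(
  group = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  score = runif(n, 0, 10)
)
## Level "c" never yields a positive count, so it separates zero from non-zero.
d$Paper <- ifelse(d$group == "c", 0L, rpois(n, lambda = 1.5))

## Categorical covariate: an empty cell points to (quasi-)separation.
tab_group <- xtabs(~ factor(Paper > 0) + group, data = d)
print(tab_group)

## Numeric covariate: bin it with cut() first so the table stays small.
tab_score <- xtabs(~ factor(Paper > 0) + cut(score, breaks = 5), data = d)
print(tab_score)
```

In the first table, the TRUE row has a zero in column "c", which is the pattern to look for. The choice of 5 bins is arbitrary.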

Or you could look at the visualization (which does the cut() internally 
if necessary): plot(factor(Paper > 0) ~ x, data = OvsP)

Then the variables with separation issues should be easy to find (or at 
least to narrow it down).
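
With many covariates, the same check can be run in a loop instead of refitting the model each time. The following is only a sketch under assumptions: has_empty_cell() is a hypothetical helper (not part of any package), the data are simulated, and an empty cell can also arise from a sparse bin rather than true separation, so flagged variables still need a closer look.

```r
## Hypothetical helper: cross-tabulate one covariate against Paper > 0
## and report whether any cell of the resulting table is empty.
has_empty_cell <- function(x, positive, bins = 5) {
  if (is.numeric(x)) x <- cut(x, breaks = bins)  # bin numeric covariates
  any(table(positive, x) == 0)
}

## Simulated stand-in data; 'group' and 'score' are hypothetical covariates.
set.seed(2)
n <- 1000
d <- data.frame(
  group = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  score = runif(n, 0, 10)
)
d$Paper <- ifelse(d$group == "c", 0L, rpois(n, lambda = 1.5))

## Apply the check to every covariate except the response.
covariates <- setdiff(names(d), "Paper")
flagged <- sapply(d[covariates], has_empty_cell, positive = d$Paper > 0)
print(flagged)
```

Each of these tables is cheap to compute, so this should run in seconds even for n = 54822, compared with about 10 minutes per model fit.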

> I tried to identify the issue using the xtabs function and 
> received this error:
>
> Error: cannot allocate vector of size 1.4 Gb

Hard to say what is going wrong here. Possibly, the variables are numeric or 
you have specified the formula incorrectly.
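
One way to check the first possibility is to count the distinct values per variable. A cross-tabulation has one cell per combination of distinct values, so a continuous variable (or several variables tabulated at once) can make the table very large. This is a sketch on simulated data with hypothetical variable names, not a diagnosis of the actual call:

```r
## Simulated stand-in data; 'group' and 'score' are hypothetical covariates.
set.seed(3)
d <- data.frame(
  group = factor(sample(c("a", "b", "c"), 500, replace = TRUE)),
  score = runif(500)  # continuous: essentially every value is distinct
)

## Number of distinct values per variable.
n_levels <- sapply(d, function(v) length(unique(v)))
print(n_levels)

## Number of cells a full cross-tabulation of all variables would need.
print(prod(n_levels))
```

Variables with a large number of distinct values are the ones to cut() before tabulating.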

> Is there an easier way to identify the covariate that is causing me 
> problems here? Also, am I right in assuming that the count regression 
> should be fine, while it's just the logit regression that is causing me 
> the issue? (I've never done a zero-inflated regression before.)

In the zero-inflation formulation you can't separate the binary part from 
the count part of the model. However, for the hurdle specification, you 
can estimate the models separately.
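
To illustrate the binary part on its own: assuming a binomial zero hurdle with logit link, that part is an ordinary logistic regression for Paper > 0, which glm() fits quickly. The sketch below uses simulated data with hypothetical covariates (group, score) and records any warnings rather than assuming that a particular one is raised.

```r
## Simulated stand-in data; 'group' and 'score' are hypothetical covariates.
set.seed(4)
n <- 1000
d <- data.frame(
  group = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  score = runif(n, 0, 10)
)
## Level "c" never yields a positive count.
d$Paper <- ifelse(d$group == "c", 0L, rpois(n, lambda = 1.5))

## Fit the binary part alone and collect any warnings that glm() emits.
warnings_seen <- character(0)
zero_part <- withCallingHandlers(
  glm(I(Paper > 0) ~ group + score, data = d, family = binomial),
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(warnings_seen)

## A very large estimate with a very large standard error marks the culprit.
print(round(coef(summary(zero_part)), 2))
```

Even when no warning appears, the coefficient for the separating level (here groupc) is far from zero with an enormous standard error, which identifies the problematic covariate.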

If necessary, the truncated count part of the _hurdle_ model can be 
estimated with the zerotrunc() function from the "countreg" package on 
R-Forge, which uses exactly the same code as the truncated count part in 
"pscl". There are also other implementations of zero-truncated models in 
R, e.g., in "VGAM".
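
For intuition about what such a zero-truncated count model estimates, here is a base-R sketch that maximizes a zero-truncated Poisson likelihood with optim(). It is not the zerotrunc() implementation, it uses Poisson rather than negative binomial to stay short, and the data and names (x, y_pos, negloglik) are hypothetical.

```r
## Simulate Poisson counts and keep only the positive ones,
## as in the truncated count part of a hurdle model.
set.seed(5)
n <- 5000
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.3 * x))
keep <- y > 0
y_pos <- y[keep]
X <- cbind(1, x[keep])  # design matrix: intercept and slope

## Negative log-likelihood of the zero-truncated Poisson:
## log f(y) - log(1 - f(0)), where f(0) = exp(-lambda).
negloglik <- function(beta) {
  lambda <- as.vector(exp(X %*% beta))
  -sum(dpois(y_pos, lambda, log = TRUE) - log1p(-exp(-lambda)))
}

fit <- optim(c(0, 0), negloglik, method = "BFGS")
print(fit$par)
```

The estimates should be close to the values used in the simulation (0.5 and 0.3), since only the positive counts enter this part of the model.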

hth,
Z

> Thanks!!
>
> Carly