[R] Discrepancy of Neg. Binomial Estimation in R

Tue Nov 6 03:48:04 CET 2007

Dear all,

I have a puzzle regarding the estimation of Neg. Binomial event count 
model in R. I would greatly appreciate if anyone could shed some light 
on my puzzle.

Using the glm.nb command, or the zelig command developed by Gary King 
et. al., I obtain the same point estimates in R as well as in Stata. 
However, if I write my own likelihood function to estimate a neg. 
binomial event count model, my coefficient estimates are a little bit 
different (more than 5%), and my log-likelihood score is actually 
better. When I run Monte Carlo simulation, I found the coefficient 
estimates of my likelihood function have smaller variance and are closer 
to the true values than the glm.nb or Stata results do.

I don't quite understand why my neg. binomial likelihood function 
produce different result from glm.nb or Stata, because I have written 
log-likelihood function for OLS, Logit, Probit, Poisson event count 
model, and my point estimates are all very close to the results produced 
by glm.nb or Stata, and I think the tiny difference might be due to 
different optimization algorithm used. In the negative binomial case, 
the difference is too big to be credit to optimization algorithm.

Here is my code for data generating process as well as estimation.

******************************************************************************
# Data Generation
obs=500
beta = c(0,1)

# Generate the independent variable
z= runif(obs)

# Generate X matrix
x = cbind(rep(1,obs),z)

# Generate the dependent variable from a neg. binomial distribution
phi = exp(x%*%beta)
sigma_2 = 10
y = rnbinom(obs,size=phi/(sigma_2-1),mu=phi)

# Neg. Binomial log-likelihood function
like.nb=function(par,x,y)
{
  phi_e=exp(x%*%par[1:2])
  sig2_e=exp(par[3])+1
sum(lgamma(phi_e/(sig2_e-1)+y)-lgamma(phi_e/(sig2_e-1))+y*log((sig2_e-1)/sig2_e)-(phi_e/( 
sig2_e-1))*log(sig2_e))
}

# Ng. Binomial regression
b = solve(t(x)%*%x,(t(x)%*%y))	
b = matrix(c(b,var(y)/mean(y)),nrow=3) # set initial value for means and 
variance
nb.res = optim(b,like.nb, y=y, x=x, method="BFGS", 
control=list(fnscale=-1), hessian=T)
nbcoef=nb.res$par
nbvcovm=solve(-nb.res$hessian)

# Alternative glm function
library(MASS)
summary(glm.nb(y~z))

******************************************************************************

Thanks a lot in advance for your help.

Best,
Xiaobo