[R] Discrepancy of Neg. Binomial Estimation in R
Xiaobo Lu
xiaobolu at gmail.com
Tue Nov 6 03:48:04 CET 2007
Dear all,
I have a puzzle regarding the estimation of Neg. Binomial event count
model in R. I would greatly appreciate if anyone could shed some light
on my puzzle.
Using the glm.nb command, or the zelig command developed by Gary King
et. al., I obtain the same point estimates in R as well as in Stata.
However, if I write my own likelihood function to estimate a neg.
binomial event count model, my coefficient estimates are a little bit
different (more than 5%), and my log-likelihood score is actually
better. When I run Monte Carlo simulation, I found the coefficient
estimates of my likelihood function have smaller variance and are closer
to the true values than the glm.nb or Stata results do.
I don't quite understand why my neg. binomial likelihood function
produce different result from glm.nb or Stata, because I have written
log-likelihood function for OLS, Logit, Probit, Poisson event count
model, and my point estimates are all very close to the results produced
by glm.nb or Stata, and I think the tiny difference might be due to
different optimization algorithm used. In the negative binomial case,
the difference is too big to be credit to optimization algorithm.
Here is my code for data generating process as well as estimation.
******************************************************************************
# Data Generation
obs=500
beta = c(0,1)
# Generate the independent variable
z= runif(obs)
# Generate X matrix
x = cbind(rep(1,obs),z)
# Generate the dependent variable from a neg. binomial distribution
phi = exp(x%*%beta)
sigma_2 = 10
y = rnbinom(obs,size=phi/(sigma_2-1),mu=phi)
# Neg. Binomial log-likelihood function
like.nb=function(par,x,y)
{
phi_e=exp(x%*%par[1:2])
sig2_e=exp(par[3])+1
sum(lgamma(phi_e/(sig2_e-1)+y)-lgamma(phi_e/(sig2_e-1))+y*log((sig2_e-1)/sig2_e)-(phi_e/(
sig2_e-1))*log(sig2_e))
}
# Ng. Binomial regression
b = solve(t(x)%*%x,(t(x)%*%y))
b = matrix(c(b,var(y)/mean(y)),nrow=3) # set initial value for means and
variance
nb.res = optim(b,like.nb, y=y, x=x, method="BFGS",
control=list(fnscale=-1), hessian=T)
nbcoef=nb.res$par
nbvcovm=solve(-nb.res$hessian)
# Alternative glm function
library(MASS)
summary(glm.nb(y~z))
******************************************************************************
Thanks a lot in advance for your help.
Best,
Xiaobo
More information about the R-help
mailing list