[R] relative risk regression with survey data

Wed Sep 15 17:57:30 CEST 2010

On Wed, 15 Sep 2010, Ravi Varadhan wrote:

> Dear Thomas,
>
> You said, "the log-binomial model is very non-robust when the fitted values
> get close to 1, and there is some controversy over the best approach."
> Could you please point me to a paper that discusses the issues?
>
> I have written some code to do maximum likelihood estimation for relative,
> additive, and mixed risk regression models with binomial model.  I have been
> able to obtain good convergence.  I have used bootstrap to get standard
> errors.  However, I am not sure if these standard errors are valid when
> fitted values were close to 0 or 1. It seems to me that when the fitted
> probabilities are close to 0 or 1, there is not a good way to estimate
> standard errors.

There's a technical report at 
http://www.bepress.com/uwbiostat/paper293/
with simulations, some theory, and references.  It's under review at the moment, after being forgotten for a few years.

The distribution of the parameter estimates when the true parameter is on the boundary of the parameter space is a separate mess. Theoretically it is the intersection of the multivariate Normal with the parameter space, and if the parameter space has a piecewise linear boundary, the log likelihood ratio has a chi-squared mixture distribution.  In practice, if there isn't a hard edge to the covariate distribution, it's not going to be easy to get a good approximation to the distribution of the parameter estimates. As an example of the complications, the sampling distributions for fixed and random design matrices can be very different, because a random design matrix means that the estimated edge of the parameter space moves from one realization to another.
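
To make the chi-squared mixture concrete, here is a minimal sketch of the simplest textbook case, not the log-binomial model itself: a Normal mean with known unit variance, constrained to mu >= 0, with the true value mu = 0 on the boundary. The function names, sample sizes, and seed are illustrative assumptions. In this case the likelihood ratio statistic should behave like an equal mixture of a point mass at zero and a chi-squared on 1 df, so about half the simulated statistics are exactly zero and about 5% exceed 2.706, the 0.90 quantile of chi-squared(1).

```python
import random


def lr_statistic(n, rng):
    """LR statistic for H0: mu = 0 in a N(mu, 1) model constrained to mu >= 0."""
    xbar = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
    mu_hat = max(xbar, 0.0)  # constrained MLE: project xbar onto [0, inf)
    # 2 * (loglik(mu_hat) - loglik(0)) simplifies to n * mu_hat^2
    return n * mu_hat ** 2


def simulate(reps=10000, n=25, seed=1):
    """Return (fraction of statistics equal to 0, fraction above 2.706)."""
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    stats = [lr_statistic(n, rng) for _ in range(reps)]
    frac_zero = sum(s == 0.0 for s in stats) / reps
    frac_above = sum(s > 2.706 for s in stats) / reps
    return frac_zero, frac_above


if __name__ == "__main__":
    frac_zero, frac_above = simulate()
    print("fraction exactly zero:", frac_zero)
    print("fraction above 2.706: ", frac_above)
```

Using the usual chi-squared(1) cutoff of 3.841 here would give a test with size of roughly 2.5% rather than 5%. This sketch has a fixed, known boundary; it does not capture the harder situation where the edge of the parameter space depends on the observed covariates.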

     -thomas

Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle