[R] glm cannot find valid starting values

Fri Jul 21 14:10:14 CEST 2006

On Fri, 21 Jul 2006, Dan Bebber wrote:

> glm(S ~ -1 + Mdif, family=quasipoisson(link=identity), start=strt, sdat)
> gives error:
> 
> > Error in glm.fit(x = X, y = Y, weights = weights, start = start, etastart 
> > =
> > etastart,  :
> >        cannot find valid starting values: please specify some
> 
> strt is set to be the coefficient for a similar fit
> glm(S ~ -1 + I(Mdif + 1),...
> i.e. (Mdif + 1) is a vector similar to Mdif.
> The error appears to occur when some values of Mdif are negative,
> though I have not had this problem with simulated datasets.

Right: if Mdif contains both positive and negative values there are no
coefficients that are valid for that model (you are bound to predict 
negative means).

You often do better to take etastart from another fit than start, but that 
will not help here, I believe.

BTW, your example cannot be pasted in as 'sdat' self-references.  It could 
be fixed, but I gave up at that point.

> 
> Any solutions greatly appreciated (full details and data below).
> 
> Dan Bebber
> Department of Plant Sciences
> University of Oxford
> 
> OS: WinXP Pro SP2 and Win ME (tried both)
> Processor: Dual Intel Xeon and Pentium 4 (tried both)
> R version: 2.3.0 and 2.3.1 (tried both)
> 
> #Full details (can be pasted directly into R):
> #Data:
> sdat <- data.frame(
> S = c(0, 0, 0, 0, 28, 0, 1, 7, 0, 0, 39, 2, 0, 0, 40, 0, 0, 0, 0,
> 0, 0, 15, 0, 0, 3, 0, 0, 0, 2, 0, 3, 0, 30, 0, 20, 0, 1, 0, 0,
> 1, 21, 0, 0, 4, 14, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 3, 0,
> 2, 5, 0, 0, 0, 0, 0, 0, 0, 25, 0, 5, 0, 0, 0, 1, 0, 1, 0, 0,
> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
> 0, 0, 0, 0, 0, 1, 10, 0, 0, 0, 0, 0, 9, 1, 1, 1, 1, 0, 0, 3,
> 0, 27, 7, 0, 0, 0, 0, 0, 1, 1, 4, 2, 1, 2, 4, 1, 4, 6, 12, 4,
> 6, 3, 4, 0, 4, 0, 6, 1, 3, 1, 4, 4, 1, 1, 2, 1, 6, 1, 0, 0, 1,
> 0, 1, 0, 0, 6, 0, 0, 0, 0, 2, 2, 3, 1, 6, 2, 2, 1, 1, 4, 4, 3,
> 3, 7, 1, 3, 5, 6, 4, 0, 1, 4, 2, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0,
> 1, 0, 4, 0, 0, 1, 0, 2, 0, 0, 1, 2, 0, 4, 0, 2, 0, 3, 0, 2, 2,
> 0, 4, 0, 2, 0, 1, 2, 3, 0, 0, 0, 0, 3, 1, 0, 0, 0, 3, 2, 1, 2,
> 1, 1, 2, 0, 0, 3, 3, 1, 0, 1, 1, 2, 0, 1, 0, 1, 0, 1, 3, 2, 0,
> 0, 2, 1, 0, 1, 0, 0, 0, 0, 0, 0, 4, 2, 4, 0, 2, 0, 0, 0, 0, 0,
> 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 3, 2, 0, 0, 0, 0, 0,
> 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 4, 1, 0, 0, 1),
> M = 620+c(0,cumsum(sdat$S[-328])))
> #S is the (unknown) number of N individuals that irreversibly change state 
> in a time
> #interval t. The data provided are a subset of the full data set.
> #M is the cumulative sum of individuals that have changed state up to t-1.
> #Assume that the rate of state change is constant (S ~ kM), but the
> #distribution of S is clustered.
> #The goal is to estimate N.
> #N can be estimated by fitting:
> qpglm <- glm(S ~ M, family = quasipoisson(link = identity), sdat)
> summary(qpglm)
> N.est <- -coef(qpglm)[1]/coef(qpglm)[2]
> N.est
> #i.e. x-intercept is minus intercept/slope
> #To estimate confidence limits on N.est, fit models without intercept to
> #N.est - M + x, where x is an integer. The model will have the lowest 
> deviance
> #when x = 0.
> x <- 0
> Mdif <- N.est - M + x
> qpglm2 <- glm(S ~ -1 + Mdif, family = quasipoisson(link = identity), sdat)
> summary(qpglm2)
> #Use analysis of deviance to estimate confidence limits on N.est:
> disp <- summary(qpglm)$dispersion
> dfres <- df.residual(qpglm)
> dev.res <- deviance(qpglm)
> #From MASS4, p. 210, assume that changes in deviance scaled by
> #dispersion as |x| increases have an F distribution
> dev.crit <- dev.res+qf(0.95,1,dfres)*disp
> dev.crit
> #values of x for which the deviance = dev.crit give approximate 95% 
> confidence limits
> #on N.est.
> #The error occurs when x <= -91.7:
> x <- -91.7
> sdat$Mdif <- N.est - sdat$M + x
> strt <- coef(glm(S ~ -1 + I(Mdif+1), family = quasipoisson(link = identity), 
> sdat))
> qpglm2 <- glm(S ~ -1 + Mdif, family = quasipoisson(link = identity), 
> start=strt, sdat)
> #The problem is that this interferes with optimization to find values of x 
> for which
> #deviance = dev.crit
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595