[R-sig-ME] help with false convergence warning; sparse 1s in binary data

Ben Bolker bbolker at gmail.com
Wed Aug 28 00:07:09 CEST 2019


  I'm not sure whether the troubleshooting vignette in the CRAN version
is entirely up to date. Here's what it says about this warning
(https://github.com/glmmTMB/glmmTMB/blob/master/glmmTMB/vignettes/troubleshooting.rmd):

----
It's usually hard to diagnose the source of this warning (this [Stack
Overflow
answer](https://stackoverflow.com/questions/40039114/r-nlminb-what-does-false-convergence-actually-mean)
explains a bit more about what it means). Reasonable methods for making
sure your model is OK are:

- restart the model at the estimated fitted values
- try using a different optimizer, e.g.
`control=glmmTMBControl(optimizer=optim, optArgs=list(method="BFGS"))`

and see if the results are sufficiently similar to the original fit.

---
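
  For the first suggestion (restarting at the estimated fitted values),
here is a minimal sketch using the mod_1min object from the quoted
message below; the list-component names ("beta" for fixed-effect
parameters, "theta" for variance parameters) follow the 'start' argument
documented in ?glmmTMB, so check them against your installed version:

  ## pull the estimated parameters out of the first fit
  pars <- mod_1min$fit$par
  start_list <- list(beta  = unname(pars[names(pars) == "beta"]),
                     theta = unname(pars[names(pars) == "theta"]))
  ## refit from those values and see whether anything moves
  mod_restart <- update(mod_1min, start = start_list)
  all.equal(fixef(mod_restart), fixef(mod_1min))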

  There are a few fishy-looking things about this fit:

 - the intercept is tiny (plogis(-19) is roughly 5.6 x 10^(-9), implying
a ridiculously small baseline probability);

 - the std dev associated with the AR1 term (~64) looks ridiculously
large, unless there's some scaling I'm forgetting/not thinking about.

  Are there a lot of all-zero groups?  In principle these *should* be
"shrinkable", but they tend to cause weird-looking conditional mode
distributions.

  What is your actual fraction of 1s?
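
  Both of those are easy to check directly from the data; a quick
sketch, using the data frame and variable names from the quoted message
(y_1min, y, id):

  plogis(-19.45)                                ## implied baseline probability, ~4e-9
  mean(y_1min$y)                                ## observed fraction of 1s
  table(tapply(y_1min$y, y_1min$id, sum) == 0)  ## how many all-zero groups?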

  I'm not sure how issue #482 helps: that's a different warning
(non-positive definite Hessian), isn't it?

   Other than trying different starting values and optimizers and seeing
how the results compare, I don't have a lot of solutions.  Regularizing
priors are probably a good idea (which argues for INLA, if it's fast
enough for you); I've been thinking about implementing that option in
glmmTMB but haven't gotten there yet.
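
  For the optimizer comparison, a hedged sketch along the lines of the
vignette excerpt above (again using mod_1min from the quoted message):

  mod_bfgs <- update(mod_1min,
                     control = glmmTMBControl(optimizer = optim,
                                              optArgs = list(method = "BFGS")))
  ## if the two fits broadly agree, the warning is less worrying
  cbind(nlminb = fixef(mod_1min)$cond, BFGS = fixef(mod_bfgs)$cond)
  VarCorr(mod_1min); VarCorr(mod_bfgs)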
  The gold standard for figuring out whether the results are reliable is
to simulate similar cases with *known* parameters and see whether
glmmTMB tends to get the right answers to the focal questions even when
some non-focal parameters are wonky ...
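
  For concreteness, a sketch of that kind of check: the design is much
smaller than the real data and the "true" values (beta0, phi, sd_ar)
are made up, but the question is only whether the refit recovers the
known AR1 parameters when 1s are this rare:

  library(glmmTMB)
  set.seed(101)
  n_id <- 9; n_t <- 500                    ## sites and time points (made up)
  beta0 <- -6; phi <- 0.6; sd_ar <- 1.5    ## known parameter values (made up)
  sim <- expand.grid(time = seq_len(n_t), id = factor(seq_len(n_id)))
  ## one latent AR1 series per site, scaled to marginal SD = sd_ar
  re <- replicate(n_id, sd_ar * sqrt(1 - phi^2) *
                          as.numeric(arima.sim(list(ar = phi), n_t)))
  sim$y <- rbinom(nrow(sim), 1, plogis(beta0 + as.vector(re)))
  sim_fit <- glmmTMB(y ~ 1 + ar1(factor(time) + 0 | id),
                     data = sim, family = binomial)
  summary(sim_fit)  ## do the intercept, AR1 SD, and correlation come back?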

  Ben Bolker



On 2019-08-27 3:52 p.m., Fabiola Iannarilli wrote:
> Hi all!
> 
> I am using glmmTMB to model a set of time series of binary responses
> collected at ~30 sites.  The probability of success fluctuates diurnally,
> is likely to vary across sites, and I expect the data may also exhibit
> short-term (serial) dependence.  Thus, I am including
> sin(2*pi*time/(24*60)) and cos(2*pi*time/(24*60)) as fixed effects, a
> random intercept for each site, and a within-site random effect that
> follows an AR1 structure. The dataset is quite large (~2,200,000 records),
> so I am initially exploring models fit to only a subset of the data
> (~190,000 records).
> 
>> mod_1min <- glmmTMB(y ~ sin(2*pi*time/(24*60)) + cos(2*pi*time/(24*60)) +
> (1|id) + ar1(as.factor(time) + 0 | id), data=y_1min, family=
> binomial(link="logit"), ziformula = ~0)
> 
> 
> 
> Warning message:
> 
> In fitTMB(TMBStruc) :
> 
>   Model convergence problem; false convergence (8). See
> vignette('troubleshooting')
> 
> 
> 
>> summary(mod_1min)
>  Family: binomial  ( logit )
> Formula:
> y ~ sin(2 * pi * time/(24 * 60)) + cos(2 * pi * time/(24 * 60)) +
>     (1 | id) + ar1(as.factor(time) + 0 | id)
> Data: y_1min
> 
>      AIC      BIC   logLik deviance df.resid
>    224.0    278.5   -106.0    212.0    64803
> 
> Random effects:
> 
> Conditional model:
>  Groups Name                 Variance  Std.Dev. Corr
>  id     (Intercept)          7.908e-02  0.2812
>  id.1   as.factor(time)16999 4.084e+03 63.9073  0.33 (ar1)
> Number of obs: 64809, groups:  id, 9
> 
> Conditional model:
>                              Estimate Std. Error z value Pr(>|z|)
> (Intercept)                  -19.4483     1.9487  -9.980   <2e-16 ***
> sin(2 * pi * time/(24 * 60))  -0.1585     1.3361  -0.119    0.906
> cos(2 * pi * time/(24 * 60))   0.2468     1.4576   0.169    0.866
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> 
> 
> glmmTMB gives a warning about false convergence. My guess is that this is
> due to the low number of 1s in the data, which results in a flat likelihood
> and a very low estimate for the intercept. My questions are:
> 
> 1) Is there a way to verify that the sparseness of 1s (and the
> intercept) is the actual problem? If so, can I trust the inference for the
> fixed effects parameters?
> 
> 2) My research questions also focus on evaluating the presence of
> autocorrelation in the response.  I’m concerned that the variance
> parameters are not well identified. Can I trust the estimate of the
> autocorrelation parameter? Is there an alternative way to specify the model
> that might improve convergence?
> 
> 3) Is it possible that a different optimizer or different Hessian
> approximation might help? I tried the solution described at
> https://github.com/glmmTMB/glmmTMB/issues/482, but it also gives a warning:
> 
> “45: In par[-random] <- par.fixed:   number of items to replace is not a
> multiple of replacement length”
> 
> 4) Following the suggestion on this thread
> https://github.com/glmmTMB/glmmTMB/issues/386, I am also running the same
> model using INLA (given the dataset size, I am afraid MCMC will be too
> computationally demanding), but there the problem is what priors to use.  I
> am also running into memory allocation problems.
> 
> I would appreciate any suggestions you may have.
> 
> Best,
> 
> Fabiola
> 
> 
> 
> 


