[R] Three-component Negative Binomial Mixture: R code
Achim Zeileis
Achim.Zeileis at uibk.ac.at
Thu Nov 10 21:22:23 CET 2016
On Thu, 10 Nov 2016, danilo.carita at uniparthenope.it wrote:
> Thank you for your hints, now the goodness of fit test provides me good
> results, but surprisingly for me the three-component model turns out to be
> worse than the two-component one (indeed, I focused on the three-component
> mixture because the two-component one exhibits a low p-value).
>
> In addition, I have noticed that for some data the function fails to find
> good starting values, as you have mentioned in your previuous answer. The
> problem is that the driver FLXMRnegbin() allows to specify only the theta
> parameter (and only one value, even in the event of mixtures of two or more
> components).
>
> I have read the description of flexmix() function too, but it seems that it
> does not allow to set starting values for the parameters of the model. Am I
> right? Or is there a way to do it?
No, I don't think so. You could look at the FLXMRnegbin() driver code and
tweak this directly to use better starting values etc. But I think that
the main issue is that the "theta" parameter often diverges to infinity
(i.e., a Poisson distribution) if theta is too large in (at least) one of
the components.
Given that the negbin distribution is a continuous mixture of Poisson
distributions, I'm not sure whether approaching such data with a finite
mixture of such continuous mixtures.
What to do with this situation, certainly depends on the data you have and
the questions you have about it. And I concur with Bert's advice of
contacting a local statistics expert for discussion of such issues.
hth,
Z
>
>
>
> Achim Zeileis <Achim.Zeileis at uibk.ac.at> ha scritto:
>
>> On Tue, 8 Nov 2016, danilo.carita at uniparthenope.it wrote:
>>
>>> I tried the function flexmix() with the driver FLXMRnegbin() with two
>>> components first, in order to compare its results with those provided by
>>> my function mixnbinom(). In particular, I ran the following code:
>>>
>>>
>>>> fm0 <- flexmix(y ~ 1, data = data.frame(y), k = 2, model = FLXMRnegbin())
>>>
>>>
>>> where "y" is my vector of counts. The previous function provided me the
>>> following parameters:
>>>
>>>
>>>> Comp.1 Comp.2
>>>> coef.(Intercept) 1.2746536 1.788578
>>>> theta 0.1418201 5.028766
>>>
>>>
>>> with priors 0.342874 and 0.657126, respectively. I assume that the
>>> coefficients "Intercept" represent the two means of the model (mu1 and
>>> mu2),
>>
>> No, a log link is employed, i.e., exp(1.2746536) and exp(1.788578) are the
>> means.
>>
>>> while the "theta" coefficients are the size parameters (size1 and size2).
>>
>> Yes.
>>
>>> Unfortunately, unlike my function mixnbinom(), the model computed with
>>> flexmix() did not provide a good fit to my data (p-value ~0).
>>>
>>> Is there something wrong in the process above?
>>
>> Hard to say without a reproducible example. Using parameter values similar
>> to the ones you cite above, the following seems to do a reasonable job:
>>
>> ## packages
>> library("countreg")
>> library("flexmix")
>>
>> ## artificial data from two NB distributions:
>> ## 1/3 is NB(mu = 3.5, theta = 0.2) and
>> ## 2/3 is NB(mu = 6.0, theta = 5.0)
>> set.seed(1)
>> y <- c(rnbinom(200, mu = 3.5, size = 0.2), rnbinom(400, mu = 6, size = 5))
>>
>> ## fit 2-component mixture model
>> set.seed(1)
>> fm <- flexmix(y ~ 1, k = 2, model = FLXMRnegbin())
>>
>> ## inspect estimated parameters -> look acceptable
>> parameters(fm)
>> exp(parameters(fm)[1,])
>>
>> My experience was that finding good starting values may be a problem for
>> flexmix(). So maybe setting these in some better way would be beneficial.
>
>
>
> -------------------------------------------------------------
> Danilo Carità
>
> PhD Candidate
> University of Naples "Parthenope"
> Dipartimento di Studi Aziendali e Quantitativi
> via G. Parisi, 13, 80132 Napoli - Italy
> -------------------------------------------------------------
>
More information about the R-help
mailing list