[R-sig-ME] Poisson or Gaussian when modelling count data + heteroscedasticity in predictor variables

Mon Nov 7 17:49:42 CET 2016

Luciana,

Although it is always hard to say, 5 to 20 is not necessarily "small". Really, it is all about the diagnostics, which you say point to using a linear model. In your original email you drew a distinction between heteroscedasticity and "count data". This really isn't a distinction, because the main statistical thing GLMs do is account for the heteroscedasticity that arises from the sampling distribution (i.e., the variance-mean relationship). If you have enough data points to identify heteroscedasticity, then I think the heteroscedasticity should be your focus. Note, however, that estimating heteroscedasticity and incorporating it into your analysis can be problematic for small counts. You are quite right to worry about heteroscedasticity in general, which can play havoc with type I error rates.
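
To make the variance-mean point concrete, here is a small base-R sketch on simulated data. The predictor x, the coefficients and the group sizes are invented for illustration (chosen so the means fall near 5, 10 and 20); nothing here comes from the richness data in this thread. It shows that Poisson counts are heteroscedastic by construction, and that the Pearson residuals of a Poisson GLM rescale that away.

```r
set.seed(1)

# Hypothetical predictor and mean structure, chosen only for illustration
x  <- rep(c(0, 1, 2), each = 2000)
mu <- exp(1.6 + 0.7 * x)                 # means of roughly 5, 10 and 20
y  <- rpois(length(x), lambda = mu)

# Raw counts: within each group the variance tracks the mean
group_mean <- tapply(y, x, mean)
group_var  <- tapply(y, x, var)
print(round(cbind(mean = group_mean, var = group_var), 2))

# A Poisson GLM assumes var(y) = mean(y), so its Pearson residuals,
# (y - fitted) / sqrt(fitted), should have roughly constant variance
fit <- glm(y ~ factor(x), family = poisson)
pearson_var <- tapply(residuals(fit, type = "pearson"), x, var)
print(round(pearson_var, 2))
```

If the real data depart from this pattern (for example, if the variance changes with a predictor in a way the mean does not explain), that is the situation where modelling the variance directly may be the better route.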

Linear models (with GLS to account for heteroscedasticity) can have fine performance in terms of type I errors. There can be a loss of power, but not always. You might find a recent paper useful: Warton et al. (2016), http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12552/abstract
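
As a rough sketch of the GLS approach, the following fits a linear model with and without an exponential variance function, using gls() and varExp() from the nlme package. Note that varExp belongs to nlme, so it works with gls() and lme() rather than with lme4's lmer(). The data are simulated; the names x, y and dat and the value of true_delta are assumptions made up for the example, not estimates from any real data set.

```r
library(nlme)  # ships with R as a recommended package

set.seed(42)
n <- 300
x <- runif(n, 0, 2)                       # hypothetical predictor
true_delta <- 0.5                         # assumed value, for illustration only
# Residual sd grows with x: sd = exp(true_delta * x)
y <- 5 + 6 * x + rnorm(n, sd = exp(true_delta * x))
dat <- data.frame(x = x, y = y)

# Same fixed effects, with and without a variance function (both REML fits)
fit_hom <- gls(y ~ x, data = dat)
fit_het <- gls(y ~ x, data = dat, weights = varExp(form = ~ x))

# Compare the two variance structures
print(anova(fit_hom, fit_het))

# Estimated varExp parameter; variance is modelled as sigma^2 * exp(2 * delta * x)
delta_hat <- coef(fit_het$modelStruct$varStruct, unconstrained = FALSE)
print(delta_hat)

# Normalized residuals should have similar spread at low and high x
res_norm <- residuals(fit_het, type = "normalized")
print(tapply(res_norm, dat$x > 1, sd))
```

With grouped data such as habitats within lakes, the same weights argument can be passed to lme() together with a random-effects term; that is not shown here because it depends on the actual design.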

Cheers, Tony

-----Original Message-----
From: R-sig-mixed-models <r-sig-mixed-models-bounces at r-project.org> on behalf of Luciana Motta <tasmacetus at gmail.com>
Date: Monday, November 7, 2016 at 10:28 AM
To: Tom Wilding <Tom.Wilding at sams.ac.uk>
Cc: "r-sig-mixed-models at r-project.org" <r-sig-mixed-models at r-project.org>
Subject: Re: [R-sig-ME] Poisson or Gaussian when modelling count data + heteroscedasticity in predictor variables

    Thank you Tom.

    My richness data go from 5 to 20, so I don't think that counts as "large". But
    the model diagnostic checks do look good.

    The model does not check out with the negative binomial, just as with Poisson
    (there is no overdispersion anyway).

    Will check about the individual observation level term.

    Any other reading suggestions about heteroscedasticity vs normality?

    Thank you again

    On Mon, Nov 7, 2016 at 4:56 PM, Tom Wilding <Tom.Wilding at sams.ac.uk> wrote:

    > Hi Luciana - if your count data is large (not 'near' zero) then the normal
    > model might be fine and you could then account for heteroscedasticity using
    > GLS (as you seem to have done) - if the model diagnostics check-out then it
    > should be OK.  You could log or log+1 transform your response and see how
    > that looks too, just for interest (if you have zeros then this approach is
    > not likely to be successful).  Also, you could stick with the Poisson GLMM
    > and include an individual observation level term (Elston, D. A., et al.
    > (2001). "Analysis of aggregation, a worked example: numbers of ticks on red
    > grouse chicks." Parasitology 122(05): 563-569), - this should also be
    > reasonable (and is very easy to implement) and might be of interest (though
    > may have its detractors).  Your negative binomial solution should also
    > address the over-dispersion issue though I'm less sure what the residual
    > patterns should look like (you could fake-up some data to check these).
    >
    > Best
    >
    > Tom.
    >
    >
    >
    > -----Original Message-----
    > From: R-sig-mixed-models [mailto:r-sig-mixed-models-bounces at r-project.org]
    > On Behalf Of Luciana Motta
    > Sent: 07 November 2016 14:17
    > To: r-sig-mixed-models at r-project.org
    > Subject: [R-sig-ME] Poisson or Gaussian when modelling count data +
    > heteroscedasticity in predictor variables
    >
    > Hello,
    >
    > My name is Lucy, and I'm studying the richness of aquatic insects in lakes. I
    > took samples from different habitats in each lake, for which I thought of a
    > mixed model with my predictors as fixed effects, and lake/habitat as random
    > effects. I did a model using "glmer", to be able to use Poisson
    > distribution for residuals, due to my type of response variable (count data
    > -richness).
    >
    > But studying the data graphically, I suspected variance heterogeneity in 2
    > predictors.
    >
    > I continued doing model selection with glmer with Poisson distribution,
    > but also made a model using "lmer" (therefore, Gaussian distribution of
    > residuals), to be able to model variance heterogeneity of those predictors
    > and see if models fit better with it.
    >
    > Finally, yes... the "lmer" model, with Gaussian distribution and varExp
    > modelling for the variance of those predictors, seems much more adequate than
    > the "glmer" with Poisson (conclusion I arrived to by studying residuals,
    > fitted values, qqplot and normality tests).
    >
    > Can heteroscedasticity be a larger problem to be accounted for, than the
    > distribution of the errors for count data? I read that sometimes
    > heteroscedasticity can be masking what we think is a normality problem.
    > Also that the Poisson distribution accounts for heteroscedasticity... but in my
    > case, the model seems much worse. It's just that, since Poisson, Neg.Binom. etc.
    > are so often recommended for count data, I don't really know if I'm plain
    > wrong in even considering staying with Gaussian. Any suggestions/further
    > readings about this?
    >
    > Thank you very much,
    >
    > --
    > Luciana M. Motta
    > Licenciada en Cs. Biológicas FCEyN, U.B.A.
    >
    > _______________________________________________
    > R-sig-mixed-models at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
    >

    -- 
    Luciana M. Motta
    Licenciada en Cs. Biológicas FCEyN, U.B.A.
    CENAC (Parque Nacional Nahuel Huapi) - CONICET
    Argentina
    www.cenacbariloche.com.ar
