[R] Independent variable dependent on offset in GLMM

Tue Nov 26 16:16:02 CET 2013

Jonas Josefsson <jonas.josefsson <at> slu.se> writes:

> 
> Hi!

  (I was initially going to say that this question would probably be
better on r-sig-mixed-models at r-project.org, but now that I've been
through it I've changed my mind -- there aren't really any issues here
that are specific to mixed models ... it's really mostly a
*statistical* question rather than an R question, and as such might
belong on a statistics forum such as http://stats.stackexchange.com ...)

> I'm running glmer (lme4) models with biodiversity data and I'm
> having trouble with understanding/finding information on how the
> offset() option is implemented.

> Explicitly, I'm wondering if the offset is only implemented on the
> dependent variable (as I think it is), or does it also affect
> independent variables in the model (was told this by a stat guy at
> our department)?

  I'm not perfectly sure I understand your question, but as I understand
it you are right and the stat guy in your department is wrong (but
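perhaps you misunderstood them?). The offset term is added to the linear
predictor of the model with its coefficient fixed at 1: with a log link,
log(E[y]) = (fixed effects + random effects) + offset. It does not
change the values of any of the other predictor variables.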
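Here is a small illustration of "added to the linear predictor". It is a minimal sketch in plain Python, not lme4. It fits an ordinary Poisson GLM with a log link and a single covariate, with no random effects, using iteratively reweighted least squares. The function name fit_poisson_offset and the toy data are hypothetical. The offset enters only eta. The covariate x is used exactly as supplied.

```python
import math

def fit_poisson_offset(x, y, offset, n_iter=50):
    """Poisson GLM, log link: log(mu) = b0 + b1*x + offset, fitted by IRLS.

    Hypothetical helper for illustration: one covariate, no random effects.
    """
    # Start from the intercept-only fit (slope 0) with the offset included
    b0 = math.log(sum(y) / sum(math.exp(o) for o in offset))
    b1 = 0.0
    for _ in range(n_iter):
        sw = swx = swxx = swz = swxz = 0.0
        for xi, yi, oi in zip(x, y, offset):
            eta = b0 + b1 * xi + oi          # offset is added to the linear predictor
            mu = math.exp(eta)
            z = (eta - oi) + (yi - mu) / mu  # working response with the offset removed
            w = mu                           # IRLS weight for Poisson with log link
            sw += w
            swx += w * xi
            swxx += w * xi * xi
            swz += w * z
            swxz += w * xi * z
        # Weighted least squares for (b0, b1); x itself is never rescaled
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    return b0, b1

# Toy data (hypothetical): noise-free expected counts proportional to area
x = [0, 1, 2, 3, 4]
area = [1.0, 2.0, 5.0, 10.0, 20.0]
y = [a * math.exp(0.5 + 0.3 * xi) for a, xi in zip(area, x)]
b0, b1 = fit_poisson_offset(x, y, [math.log(a) for a in area])
```

The counts here are noise-free, so the fit should recover the generating values b0 = 0.5 and b1 = 0.3. In glmer the same term can be supplied as offset(log(INVAREA)) in the formula or through the offset argument. The random effects are simply further terms in that same linear predictor.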

> My data is inventories of birds (species richness and abundance) at
> the scale of whole farms. Thus, each observation has a different
> inventory area which I am accounting for in the model as: offset =
> log(INVAREA).

  It makes quite a bit of sense to model abundance as directly
proportional to area (i.e., you are in effect modeling density rather
than total counts, but accounting for changes in Poisson sampling
variance appropriately).  I'm not so sure it makes sense to
model species richness as directly proportional to area.  You might
want to consider adding log(area) as a covariate rather than as
an offset, which is then essentially assuming a power-law relationship
between area and species richness (log(richness) = beta_0 +
beta_a*log(area) -> richness = exp(beta_0)*area^beta_a).
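The difference between the two choices is easiest to see when written out for a log-link model. In this sketch, mu is expected richness, A is inventory area, and x^T beta stands for the remaining fixed effects. This notation is introduced here and does not appear in the original thread.

```latex
\begin{aligned}
\text{offset:}\quad & \log\mu = \beta_0 + \mathbf{x}^\top\boldsymbol{\beta} + \log A
  &&\Longrightarrow\quad \mu = A\,e^{\beta_0 + \mathbf{x}^\top\boldsymbol{\beta}} \\
\text{covariate:}\quad & \log\mu = \beta_0 + \mathbf{x}^\top\boldsymbol{\beta} + \beta_a \log A
  &&\Longrightarrow\quad \mu = A^{\beta_a}\,e^{\beta_0 + \mathbf{x}^\top\boldsymbol{\beta}}
\end{aligned}
```

The offset is therefore the special case beta_a = 1. Under the offset, doubling A doubles expected richness. Under the covariate version, doubling A multiplies expected richness by 2^beta_a instead. With a purely illustrative beta_a = 0.25, that factor is about 1.19.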

> However, as a fixed effect in the model I've got the number of
> different crop types in the inventoried area.  As this variable is
> also affected by inventoried area, I would like to account for this
> in some way, but I find it difficult to know the best way to do so.

> Right now, I have made a linear quadratic function (using lm) of
> crop number ~ inventoried area + inventoried area^2 to describe how
> crop number increases with increasing sample size (area). Then, I
> have subtracted fitted values from observed number of crops and used
> this measure in the models. Is this a reasonable work around?
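The workaround described above regresses crop number on area and area^2, then uses observed minus fitted values. It can be sketched in plain Python as shown below. In R this is simply residuals(lm(crops ~ area + I(area^2))). The function quadratic_residuals and the toy numbers are hypothetical.

```python
def quadratic_residuals(area, crops):
    """Residuals (observed - fitted) of crops ~ 1 + area + area^2 by least squares."""
    X = [[1.0, a, a * a] for a in area]
    # Normal equations (X'X) b = X'y for the 3-column design
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * c for r, c in zip(X, crops)) for i in range(3)]
    # Solve by Gaussian elimination with partial pivoting on the augmented matrix
    M = [row[:] + [v] for row, v in zip(XtX, Xty)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= f * M[col][k]
    # Back substitution for the coefficients (intercept, linear, quadratic)
    b = [0.0] * 3
    for i in range(2, -1, -1):
        b[i] = (M[i][3] - sum(M[i][k] * b[k] for k in range(i + 1, 3))) / M[i][i]
    fitted = [b[0] + b[1] * a + b[2] * a * a for a in area]
    return [c - f for c, f in zip(crops, fitted)]

# Hypothetical inventory areas (arbitrary units) and crop-type counts
area = [1.0, 2.5, 4.0, 6.0, 8.0, 12.0, 15.0, 20.0]
crops = [2, 3, 5, 4, 6, 7, 7, 8]
res = quadratic_residuals(area, crops)
```

Because the regression includes an intercept, area and area^2, these residuals sum to zero and are uncorrelated with area and area^2. They therefore keep only the part of crop number that the quadratic in area does not predict.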

  This doesn't make very much sense to me, but it will depend
on your general model of what's going on. I would have guessed that
abundance (for example) would depend on the number of crop types
available, not on whether the number of crop types was higher than
expected for a sample of a given area.  I suppose it's possible, though.