[R] lmer - error message

Sat Feb 18 03:38:45 CET 2012

Hi Sean,

Thanks for the data.  Your first problem is that Age is a factor with
14 levels.  Second, Year is a factor with 16 levels.  Now consider
what interactions of these will be---factors with a huge number of
levels, exceeding what is reasonable to fit to your size of data.  I
included inline code below.  First I convert these variables to
numeric.  I also do some plots.

While I get a model to run, the estimated variance of the random
intercept is 0, which is a bit troublesome.  I looked at the data, but
I do not have a particularly good reason, other than presumably there
is very little variability by fish after with age and lake in the
model.  Anyway, I go on to try a log transformation of Age because of
its relationship with Increment.  That seems to work pretty well, but
the residuals are clearly heteroscedastic, so I also show how you can
use the sandwich package to (at least partially) address this.

Cheers,

Josh

require(lme4)

## shows that there are 224 levels in the Age by Year interaction!!
with(odata1, interaction(Age, Year))
## create numeric variables instead
odata1 <- within(odata1, {
  numAge <- as.numeric(levels(Age))[Age]
  numYear <- as.numeric(levels(Year))[Year]
})

## look at a density plot of Increment
plot(density(odata1$Increment))
rug(odata1$Increment)

## look at Increment versus Age (numeric)
## looks like a straight line is likely a poor fit
xyplot(Increment ~ numAge, data = odata1)

## compare by lake (looks pretty similar)
xyplot(Increment ~ numAge | lake, data = odata1)
## also note that it looks like there is more variability when age is
log than hight
## we'll keep that in mind for later

## simple model
m1 <- lmer(Increment ~  0 + numAge*lake + (1 | FishID),
  data = odata1)

## check residuals against normal
qqnorm(resid(m1))
## residuals against age
## this looks pretty bad
plot(odata1$numAge, resid(m1))
## what about the random effects?
plot(ranef(m1))

## checking the model summary
## we see that the variance for the random intercepts is 0
## this is also rather concerning
summary(m1)

## lets look at some descriptives
descriptives <- with(odata1, do.call("rbind", tapply(Increment,
FishID, function(x) {
  c(M = mean(x, na.rm = TRUE), V = var(x, na.rm = TRUE), N = length(x))
})))
descriptives <- descriptives[order(descriptives[, "M"]), ]
descriptives # print

## looks like there are quite a few fish with only one observations
## also, those with only one tend to have a higher mean increment

## given the shape of the relationship, we might try a log transformation on Age
## also given that the random intercepts have 0 variance, we can drop it
malt <- lm(Increment ~ 1 + log(numAge)*lake, data = odata1)
qqnorm(resid(malt)) ## looks decent enough

## not too bad, but clearly not homoscedastic residuals
plot(odata1$numAge, resid(malt))

## the residuals appear heteroscedastic, so we could
## use a sandwhich estimator of the covariance matrix
## the general form is: inv(X'X) X' W X inv(X'X)
## the correction comes in terms of the weight matrix, W which
## can be defined in various ways

## load two required packages
require(lmtest)
require(sandwich)

## the default
coef(summary(malt))
## using sandwhich estimator
## (okay, the results come out pretty similar, not parameter estimates
will be unchaged
## only SEs change)
coeftest(malt, vcov. = vcovHC(malt, type = "HC", sandwich = TRUE))

On Thu, Feb 16, 2012 at 11:44 PM, Sean Godwin <sean.godwin at gmail.com> wrote:
> Hi Josh,
>
> Thank you so much for your response.
>
> The point of the process was actually to find out whether there are
> different age effects for each lake, so an interaction term between age and
> lake is a necessary one.  Taking out the other random effects to make the
> model simpler would work perfectly to make this easier to tackle though, and
> I can add the other terms on later as you say!
>
> So:  m1 <- lmer(Increment ~ 0 + Age + Age*lake + (1|FishID),lakedata)
>
> But apparently I don't understand the Age*lake syntax... all I'm trying to
> do with this is to have an interaction term between age and lake, but since
> neither are random effects, I can't just write (1|Age:lake) as I could with
> (1|Age:Year), right?  It is the Age*lake term that is causing the error...
> when I take it out, the model runs fine.
>
> There are 117 fish observations, so 117 unique values of fishID.  The number
> of observations per fish changes depending on how old the fish is ( a2 year
> old fish = 2 observations), with a maximum fish age of 13 years (so max 13
> observations per fish and 13 years total in the dataset).
>
> I've attached my data (Josh_fulldata.csv) and the script used to rearrange
> it into a usable format (Josh_model.R).  For some reason, I wasn't able to
> export the resulting dataframe without causing errors when re-inputting it,
> so hopefully this works for you.  Hopefully the inclusion of the full
> dataset answers your other questions.  Interestingly, when I used a 50 fish
> subset, the error message didn't occur (I've attached that just in case:
> Josh_data.csv).
>
> I can't thank you enough for your help.  Already you've pointed me in a
> better direction.  All I'm trying to do is to include a non-random
> interaction in here (age:lake), and it sure has me stumped!
>
> -Sean
>
> On 16 February 2012 23:06, Joshua Wiley <jwiley.psych at gmail.com> wrote:
>>
>> Hi Sean,
>>
>> Can you fit a simpler model?  Particularly I would start with just one
>> random effect (say, (1|fishid)).  I would not expect the age*lake
>> interaction to be a problem per se.
>>
>> Also you may not be fitting the model you think you are.  Writing age*lake
>> in a formula expands to:
>>
>> Age + lake + age:lake
>>
>> That is the two main effects of age and lake plus their interaction.  That
>> makes your specification of the main effect of age redundant and means you
>> will have a main effect of lake, which since you suppressed the intercept by
>> adding the 0, will be the conditional expected values of the two lakes when
>> age is 0 for the same random effects.
>>
>> In terms of getting around the error, some other information about your
>> data would be helpful.  How many observations do you have?  How many unique
>> values of fishid are there?  How many observations per fish?  How many
>> years?  Is it a numeric variable or factor?  What about age?  For factor
>> variables what do the joint distributions look like?  Do you really mean to
>> have a random constant by year AND age:year?  Within a fish, do age and fish
>> have a perfect correlation?
>>
>> In terms of practical advice, I would start with simple models and make
>> them more complex.  Try to find the term that when you add it leads to the
>> error.  Start with an unconditional model only if you have to.  As you
>> estimate additional parameters in your model, what happens to the others?
>>  Are they stable?  Do they change a lot? What about the standard errors?
>>  Does adding one term the model still fit but the SEs become huge?
>>
>> If you need more help, posting a sample of your data that reproduces the
>> error so we can run it on our machines would help (a plaintext file or the
>> output of dput() would work well...we do not need all of it, just enough
>> that a model could work (say 50-100 observations) and that reproduces the
>> same error.
>>
>> Cheers,
>>
>> Josh
>>
>> On Feb 16, 2012, at 21:39, Sean Godwin <sean.godwin at gmail.com> wrote:
>>
>> > Hi all,
>> >
>> > I am fairly new to mixed effects models and lmer, so bear with me.
>> >
>> > Here is a subset of my data, which includes a binary variable (lake (TOM
>> > or
>> > JAN)), one other fixed factor (Age) and a random factor (Year).
>> >  lake FishID Age Increment Year
>> > 1  TOM      1   1     0.304     2007
>> > 2  TOM      1   2     0.148     2008
>> > 3  TOM      1   3     0.119     2009
>> > 4  TOM      1   4     0.053     2010
>> > 5  JAN       2   1     0.352     2009
>> > 6  JAN       2   2     0.118     2010
>> >
>> > The model I'm trying to fit is:
>> > m1 <- lmer(Increment ~ 0 + Age + Age*lake + (1|Year) + (1|Year:Age) +
>> > (1|FishID),lakedata)
>> >
>> > The error message I get is: *"Error in mer_finalize(ans) : Downdated X'X
>> > is
>> > not positive definite, 27."*
>> > *
>> > *
>> >> From reading up on the subject, I think my problem is that I can't
>> > incorporate the 'lake' variable in a fixed-effect interaction because it
>> > is
>> > only has one binary observation.  But I don't know what to do to be able
>> > to
>> > fit this model.  Any help would be greatly appreciated!
>> > -Sean
>> >
>> >    [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>
>

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/