[R] lmer - error message
Joshua Wiley
jwiley.psych at gmail.com
Sat Feb 18 03:38:45 CET 2012
Hi Sean,
Thanks for the data. Your first problem is that Age is a factor with
14 levels. Second, Year is a factor with 16 levels. Now consider
what interactions of these will be---factors with a huge number of
levels, exceeding what is reasonable to fit to your size of data. I
included inline code below. First I convert these variables to
numeric. I also do some plots.
While I get a model to run, the estimated variance of the random
intercept is 0, which is a bit troublesome. I looked at the data, but
I do not have a particularly good reason, other than presumably there
is very little variability by fish after with age and lake in the
model. Anyway, I go on to try a log transformation of Age because of
its relationship with Increment. That seems to work pretty well, but
the residuals are clearly heteroscedastic, so I also show how you can
use the sandwich package to (at least partially) address this.
Cheers,
Josh
require(lme4)
## shows that there are 224 levels in the Age by Year interaction!!
with(odata1, interaction(Age, Year))
## create numeric variables instead
odata1 <- within(odata1, {
numAge <- as.numeric(levels(Age))[Age]
numYear <- as.numeric(levels(Year))[Year]
})
## look at a density plot of Increment
plot(density(odata1$Increment))
rug(odata1$Increment)
## look at Increment versus Age (numeric)
## looks like a straight line is likely a poor fit
xyplot(Increment ~ numAge, data = odata1)
## compare by lake (looks pretty similar)
xyplot(Increment ~ numAge | lake, data = odata1)
## also note that it looks like there is more variability when age is
log than hight
## we'll keep that in mind for later
## simple model
m1 <- lmer(Increment ~ 0 + numAge*lake + (1 | FishID),
data = odata1)
## check residuals against normal
qqnorm(resid(m1))
## residuals against age
## this looks pretty bad
plot(odata1$numAge, resid(m1))
## what about the random effects?
plot(ranef(m1))
## checking the model summary
## we see that the variance for the random intercepts is 0
## this is also rather concerning
summary(m1)
## lets look at some descriptives
descriptives <- with(odata1, do.call("rbind", tapply(Increment,
FishID, function(x) {
c(M = mean(x, na.rm = TRUE), V = var(x, na.rm = TRUE), N = length(x))
})))
descriptives <- descriptives[order(descriptives[, "M"]), ]
descriptives # print
## looks like there are quite a few fish with only one observations
## also, those with only one tend to have a higher mean increment
## given the shape of the relationship, we might try a log transformation on Age
## also given that the random intercepts have 0 variance, we can drop it
malt <- lm(Increment ~ 1 + log(numAge)*lake, data = odata1)
qqnorm(resid(malt)) ## looks decent enough
## not too bad, but clearly not homoscedastic residuals
plot(odata1$numAge, resid(malt))
## the residuals appear heteroscedastic, so we could
## use a sandwhich estimator of the covariance matrix
## the general form is: inv(X'X) X' W X inv(X'X)
## the correction comes in terms of the weight matrix, W which
## can be defined in various ways
## load two required packages
require(lmtest)
require(sandwich)
## the default
coef(summary(malt))
## using sandwhich estimator
## (okay, the results come out pretty similar, not parameter estimates
will be unchaged
## only SEs change)
coeftest(malt, vcov. = vcovHC(malt, type = "HC", sandwich = TRUE))
On Thu, Feb 16, 2012 at 11:44 PM, Sean Godwin <sean.godwin at gmail.com> wrote:
> Hi Josh,
>
> Thank you so much for your response.
>
> The point of the process was actually to find out whether there are
> different age effects for each lake, so an interaction term between age and
> lake is a necessary one. Taking out the other random effects to make the
> model simpler would work perfectly to make this easier to tackle though, and
> I can add the other terms on later as you say!
>
> So: m1 <- lmer(Increment ~ 0 + Age + Age*lake + (1|FishID),lakedata)
>
> But apparently I don't understand the Age*lake syntax... all I'm trying to
> do with this is to have an interaction term between age and lake, but since
> neither are random effects, I can't just write (1|Age:lake) as I could with
> (1|Age:Year), right? It is the Age*lake term that is causing the error...
> when I take it out, the model runs fine.
>
> There are 117 fish observations, so 117 unique values of fishID. The number
> of observations per fish changes depending on how old the fish is ( a2 year
> old fish = 2 observations), with a maximum fish age of 13 years (so max 13
> observations per fish and 13 years total in the dataset).
>
> I've attached my data (Josh_fulldata.csv) and the script used to rearrange
> it into a usable format (Josh_model.R). For some reason, I wasn't able to
> export the resulting dataframe without causing errors when re-inputting it,
> so hopefully this works for you. Hopefully the inclusion of the full
> dataset answers your other questions. Interestingly, when I used a 50 fish
> subset, the error message didn't occur (I've attached that just in case:
> Josh_data.csv).
>
> I can't thank you enough for your help. Already you've pointed me in a
> better direction. All I'm trying to do is to include a non-random
> interaction in here (age:lake), and it sure has me stumped!
>
> -Sean
>
> On 16 February 2012 23:06, Joshua Wiley <jwiley.psych at gmail.com> wrote:
>>
>> Hi Sean,
>>
>> Can you fit a simpler model? Particularly I would start with just one
>> random effect (say, (1|fishid)). I would not expect the age*lake
>> interaction to be a problem per se.
>>
>> Also you may not be fitting the model you think you are. Writing age*lake
>> in a formula expands to:
>>
>> Age + lake + age:lake
>>
>> That is the two main effects of age and lake plus their interaction. That
>> makes your specification of the main effect of age redundant and means you
>> will have a main effect of lake, which since you suppressed the intercept by
>> adding the 0, will be the conditional expected values of the two lakes when
>> age is 0 for the same random effects.
>>
>> In terms of getting around the error, some other information about your
>> data would be helpful. How many observations do you have? How many unique
>> values of fishid are there? How many observations per fish? How many
>> years? Is it a numeric variable or factor? What about age? For factor
>> variables what do the joint distributions look like? Do you really mean to
>> have a random constant by year AND age:year? Within a fish, do age and fish
>> have a perfect correlation?
>>
>> In terms of practical advice, I would start with simple models and make
>> them more complex. Try to find the term that when you add it leads to the
>> error. Start with an unconditional model only if you have to. As you
>> estimate additional parameters in your model, what happens to the others?
>> Are they stable? Do they change a lot? What about the standard errors?
>> Does adding one term the model still fit but the SEs become huge?
>>
>> If you need more help, posting a sample of your data that reproduces the
>> error so we can run it on our machines would help (a plaintext file or the
>> output of dput() would work well...we do not need all of it, just enough
>> that a model could work (say 50-100 observations) and that reproduces the
>> same error.
>>
>> Cheers,
>>
>> Josh
>>
>> On Feb 16, 2012, at 21:39, Sean Godwin <sean.godwin at gmail.com> wrote:
>>
>> > Hi all,
>> >
>> > I am fairly new to mixed effects models and lmer, so bear with me.
>> >
>> > Here is a subset of my data, which includes a binary variable (lake (TOM
>> > or
>> > JAN)), one other fixed factor (Age) and a random factor (Year).
>> > lake FishID Age Increment Year
>> > 1 TOM 1 1 0.304 2007
>> > 2 TOM 1 2 0.148 2008
>> > 3 TOM 1 3 0.119 2009
>> > 4 TOM 1 4 0.053 2010
>> > 5 JAN 2 1 0.352 2009
>> > 6 JAN 2 2 0.118 2010
>> >
>> > The model I'm trying to fit is:
>> > m1 <- lmer(Increment ~ 0 + Age + Age*lake + (1|Year) + (1|Year:Age) +
>> > (1|FishID),lakedata)
>> >
>> > The error message I get is: *"Error in mer_finalize(ans) : Downdated X'X
>> > is
>> > not positive definite, 27."*
>> > *
>> > *
>> >> From reading up on the subject, I think my problem is that I can't
>> > incorporate the 'lake' variable in a fixed-effect interaction because it
>> > is
>> > only has one binary observation. But I don't know what to do to be able
>> > to
>> > fit this model. Any help would be greatly appreciated!
>> > -Sean
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>
>
--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
More information about the R-help
mailing list