[R-sig-ME] Understanding and analysing split split design using lmer()

Tue Sep 17 17:55:56 CEST 2019

Dear Peter,

Thank you for your reply.

I was very concerned with the residual plots as you have advised. The
residual plots from the model [model1<- lmer(biomass ~
crop*tillage*cover+(1|rep/crop/tillage), data=data) look extremely
heteroskedastic. There is an outward coning of the residuals. I did a
square root transformation on the response and I feel it helped with the
residuals (no visual pattern of the residuals). As to combine years, just
as you mentioned, crops are rotated over the years. I did find a fine paper
in Soils and tillage research journal who have taken into account the
spatial variability from the rotation. That is what I am working on now,
and thanks for your help and guidance, learning about this is going to be
more fun and easier!

Thank you again,
Rabin

On Tue, Sep 17, 2019 at 10:24 AM Peter Claussen <dakotajudo using mac.com> wrote:

> Rabin,
>
> OK, finally getting back to this.
>
> A couple cases to consider that might be leading to a singular fit. The
> first should be, I think, the most obvious - you simply do not have enough
> samples from a random variable to obtain an estimate. If we can assume that
> your experimental replicates and whole plots sample over comparable spatial
> heterogeneity, then combining trials will give you more samples from those
> populations, thus perhaps a better estimate for whole pilot variance terms.
>
> The other is that your analysis model is incorrect for these experiments.
>
> Remember that one of the assumptions of AOV is that residual errors are
> independent and identically distributed. This assumption allows us to
> compute variances from expected mean squares. I would suggest
> https://dl.sciencesocieties.org/publications/aj/abstracts/81/4/AJ0810040665 for
> an example of how to write up the expected mean squares.
>
> We can compute variance components from mean squares, based on their
> expectations, but these usually dependent on an estimate of residual
> variance, and when residuals are not i.i.d., I think we get poor estimates
> of the other variance terms.
>
> To explore this, I considered the data you posted and fit the model
>
> model1 <-lm(biomass ~ crop*tillage*cover + rep/crop/tillage,data=data)
> par(mfrow=c(2,2))
> plot(model1)
>
> We might need to accept that the residuals are not i.i.d, i particular,
> that the errors variances are not comparable between corn and soybeans. In
> lmer, we might attempt to address this with
>
> model1.lmer <- lmer(biomass ~ crop*tillage*cover+(1|rep/crop/tillage),
> data=data)
> model2.lmer <- lmer(biomass ~ crop*tillage*cover+(crop |
> rep/crop/tillage), data=data)
> plot(model1.lmer)
> plot(model2.lmer)
>
> There appears to be some improvement, at least with soybeans.
>
> You might consider what this suggests about your experimental design and
> randomization model. The primary plots may have been exchangeable units
> when either corn or soybean were seeded, but over the course of the season,
> may no have been exchangeable with regard to cover crop - that is, errors
> in application or germination may be different when seeding into corn vs
> soybeans. Perhaps there is an interaction with tillage?
>
> Another issue you might be facing is the assumption of independence.
> Consider these:
>
> data$wholeplot <- data$crop:data$rep
> data$subplot <- data$wholeplot:data$tillage
> data$residuals <- residuals(model1)
> plot(residuals ~ rep,data=data)
> plot(residuals ~ wholeplot,data=data)
>
> There may be a spatial component that introduces correlations among the
> residuals, making the estimate of residual unreliable. You would need to
> consider your trial map. It might be possible to model a fixed effect
> spatial trend and continue with lmer; capturing residual correlations in
> two dimensions (i.e. AR1 x AR1), as far as I know, is only possible in R
> using AS-REML.
>
>
> You most certainly will want to combine experiments, but for this post I’m
> limiting myself to a simpler discussion of computational options for mixed
> models in R.
>
> FWIW,  I presented some options for mixed model analysis of agricultural
> trials at one of the ASA/CSSA/SSSA meetings - see
> https://gdmdata.com/media/documents/handouts/2016ASA_RMixedModelsPresentation.pdf .
> I also gave a presentation on our current approach to spatial modeling -
> the presentation (
> https://gdmdata.com/media/documents/handouts/2017ASA_BeyondRCBD_Spatial.pdf)
> doesn’t include R code but there is supplementary material with literate
> documentation at
> https://gdmdata.com/media/documents/handouts/2017ASA_BeyondRCBD.zip . I
> left autocorrelated errors out of this presentation because I was unable to
> find an acceptable solution in lme.
>
> Cheers
>
>
>
>
> On Sep 9, 2019, at 6:05 PM, Rabin KC <kc000001 using umn.edu> wrote:
>
> Dear Peter,
>
> Your reply has made my day and very clearly answered the questions I was
> asking myself for a few weeks. Thank you for correcting my mistakes as
> well.
>
> The lmer() does produce a variance of 0 for random effect (crop: rep). I
> now know the term is called boundary (singular) fit. What would be the
> correct thing to do in this case? I do have this experiment conducted for 2
> years in 2 different locations, which might take care of the precious
> degrees of freedom, but again, will add new variables making the model more
> complicated.
>
> I ran an Anova II (wald test) and was thinking of doing post hoc analysis
> on significant terms. (If this would take care of singularity). Any advice
> on that would be greatly appreciated.
>
> And yes, I am in Agronomy in U of M. I will be writing to you in your work
> email about the friendly statistician you mentioned!!!!
>
>
> Best regards,
> Rabin
>
> On Mon, Sep 9, 2019 at 4:43 PM Peter Claussen <dakotajudo using mac.com> wrote:
>
>>
>>
>> On Sep 9, 2019, at 1:35 PM, Rabin KC <kc000001 using umn.edu> wrote:
>>
>> Hello R mixed models community,
>>
>> …
>>
>>
>>
>> model1<aov(cover_coverage~rep+crop*tillage*cover+Error(rep/crop/tillage),data=data)
>>
>> For my  analysis using lmer, and lme,  I have the following code:
>>
>> model2<-lmer(biomass~crop*tillage*cover+ (1|rep),data=data)
>>
>> model3<-lme(fixed=biomass~crop*tillage*cover, data=data, random=~1|rep)
>>
>> Now my confusion and questions are:
>> 1. I have used crop, tillage, and cover as fixed effects. And only the rep
>> (block) as random. Is the random effect properly assigned for this design?
>>
>>
>> Simply, no.
>>
>> When you have three levels of independent randomization, then your design
>> model has three random effects
>>
>> Consider the expected output from agricolae. There should be an error
>> term for crop, as randomized within whole blocks, designated Ea. This is
>> the crop x replicate interaction term, and would also be the residual error
>> if you were to analyze these data as a simple RCB of 2 treatments and 4
>> replicates (averaging biomass over whole plots). The F-test you would use
>> to test the significance of crop would be MS(crop) / Ea, while the F-test
>> for replicate effects would be MS(rep)/Ea.
>>
>> There will be a second error strata, representing the independent
>> randomization of tillage within whole plots (replicate:crop). That should
>> be labelled Eb. This would also be the residual error of a simple
>> split-plot experiment with crop as whole plot and tillage as subplot
>> treatments (averaging biomass over subplots) and would be represented as
>> rep:crop:tillage. This is the error term to test tillage and crop-tillage
>> interaction.
>>
>> The final error strata is the independent randomization of cover within
>> crop:tillage subplots, and this will work out to be equivalent to residual
>> error. So, to duplicate the decomposition of the aov from agricolae, we
>> might write the linear model as
>>
>> lm(biomass ~ crop*tillage*cover + rep +  rep:crop + rep:crop:tillage,
>> data=data)
>> or, more briefly, as
>> lm(biomass ~ crop*tillage*cover + rep/crop/tillage,data=data)
>>
>> That won’t give the correct F-tests, though, so to get aov to perform
>> appropriate tests, we should write
>>
>> aov(biomass ~ crop*tillage*cover + rep + Error(rep:crop/tillage),
>> data=data)
>>
>> (not what you’ve written, since we test rep against rep:crop)
>>
>> This leads to an lmer model of
>> lmer(biomass ~ crop*tillage*cover+(1|rep/crop/tillage), data=data)
>>
>>
>> 2. I have read a lot about crossed and nested design. Crawley and Oehlert
>> give particular examples about it, but I have trouble understanding if
>> this
>> design has nested factors or crossed factors.
>>
>>
>> crop, tillage and cover are all crossed factors, since each possible
>> combination of crop, tillage and cover are included, and you might be
>> interested in the interactions among these factors. Since this is a
>> split-split-plot experiment, the experimental units (rep:crop = whole plot,
>> rep:crop:tillage=subplot, rep:crop:tillage:cover = sub subplot) are nested,
>> and not all combinations of treatments are subject to the same experimental
>> errors.
>>
>> That’s where the analysis starts to become tricky. If you want to compare
>> treatments that are all applied within same random effects, that is,
>>
>> Crop A : Tillage A : Cover A vs Crop A : Tillage A : Cover B,
>>
>> then the error terms for mean comparisons are simple. Comparisons across
>> strata, like
>>
>> Crop A : Tillage A : Cover A vs Crop B : Tillage A : Cover A
>>
>> require estimates of the variances for each level of error, and with only
>> a few observations (Ea (crop:rep) should have only 3 d.f.) you might not
>> get estimates from lmer, and error terms derived from aov might include
>> negative variances. AOV of the linear model (lm) is useful here, in that if
>> any of the F-ratios of the random effects (rep, rep:crop or
>> rep:crop:tillage) are less than 1, then you should expect lmer to produce a
>> variance estimate of effectively 0. I would need to write out the expected
>> means squares for a split-split-plot, though, to justify this statement.
>>
>> By your email address and the subject matter, I’m guessing you’re in
>> agronomy at the U of M. If so, I can suggest a friendly neighborhood
>> statistician to consult.
>>
>> Cheers,
>>
>> Peter Claussen
>> (work email Peter using gdmdata.com)
>>
>>
>

	[[alternative HTML version deleted]]