[R-sig-ME] glmer: order of variables drastically influences SE
Guillaume Adeux
gu|||@ume@|mon@@2 @end|ng |rom gm@||@com
Tue Aug 6 09:33:18 CEST 2019
Hi Ben,
Thanks for your answer and the link.
I posted the data on figshare and linked it to my comment on the GitHub
issue.
Don't hesitate if you need more information.
Sincerely,
Guillaume ADEUX
Le lun. 5 août 2019 à 21:26, Ben Bolker <bbolker using gmail.com> a écrit :
>
> Similar problems have come up before:
>
> https://github.com/lme4/lme4/issues/449
>
> It would be great to have data to look at (best of all would be
> posting it somewhere and linking to it as a comment in the issue/URL
> listed above ...
>
> On 2019-08-05 4:57 a.m., Guillaume Adeux wrote:
> > Hello everyone,
> >
> > Could anyone explain to me how the order of variables in the glmer
> function
> > could affect the standard error of the estimates? The difference is
> drastic
> > in my case (see the two model outputs at the bottom of the email)...
> > The data I used comes from a 3 way factorial experiment (tillage,
> > N=nitrogen, CC=cover crop species) on which an additional covariate was
> > measured (dry_bio_cover_m2).
> > My main objective is to identify the drivers of weed biomass in different
> > cover crop.
> > I came across the situation doing the following:
> >
> > *1. fitting full model*
> >
> mod_full_CC=glmer(dry_bio_weeds_m2+0.001~block+year+scale(dry_bio_cover_m2)*tillage*N*CC+(1|block:tillage)+(1|block:tillage:N)+(1|block:tillage:N:CC)+(1|block:year)+(1|block:year:tillage)+(1|block:year:tillage:N)+(1|block:year:tillage:N:CC),family=gaussian(link="sqrt"),control=glmerControl(optimizer="nloptwrap",optCtrl=list(algorithm="NLOPT_LN_NELDERMEAD")),data=biomassCC_wo_C)
> >
> > *2. identifying the most parcimonious model*
> > dred_CC=dredge(mod_full_CC,rank="AICc",fixed=c("block","year"))
> >
> > The most parcimonious model resulting from dredge (i.e.
> get.models(dred_CC,
> > 1)[[1]] ) was:
> > dry_bio_weeds_m2 + 0.001 ~ *CC + N + scale(dry_bio_cover_m2) + tillage +
> > block + year *+
> > (1 | block:tillage) + (1 | block:tillage:N) + (1 |
> block:tillage:N:CC)
> > + (1 | block:year) + (1 | block:year:tillage) + (1 |
> block:year:tillage:N)
> > + (1 | block:year:tillage:N:CC) +
> > * CC:N + CC:scale(dry_bio_cover_m2) + CC:tillage +
> > scale(dry_bio_cover_m2):tillage*
> >
> > If I refit this model in a different (more logical) order with the same
> > variables, as followed,
> > dry_bio_weeds_m2 + 0.001 ~* block + year + scale(dry_bio_cover_m2) +
> > tillage + N + CC + CC:N + CC:tillage + CC:scale(dry_bio_cover_m2) +
> > tillage:scale(dry_bio_cover_m2)* +
> > (1 | block:tillage) + (1 | block:tillage:N) + (1 | block:tillage:N:CC) +
> (1
> > | block:year) + (1 | block:year:tillage) + (1 | block:year:tillage:N) +
> (1
> > | block:year:tillage:N:CC)
> > I obtain similar estimates but drastically different SE, which of course
> > impacts post-hoc analysis.
> >
> >
> > The condition number of both models is also slightly different (12.1 for
> > the model with the variables in the order returned by dredge and 14.3 for
> > the model with the "logical" order), although no severe collinearity was
> > detected.
> >
> >
> >
> > Model with variables fitted in the order given by dredge:
> >
> > Generalized linear mixed model fit by maximum likelihood (Laplace
> > Approximation) ['glmerMod']
> > Family: gaussian ( sqrt )
> > Formula: dry_bio_weeds_m2 + 0.001 ~ CC + N + scale(dry_bio_cover_m2) +
> > tillage + block + year + (1 | block:tillage) + (1 |
> > block:tillage:N) +
> > (1 | block:tillage:N:CC) + (1 | block:year) + (1 |
> > block:year:tillage) + (1 | block:year:tillage:N) + (1 |
> > block:year:tillage:N:CC) +
> > CC:N + CC:scale(dry_bio_cover_m2) + CC:tillage +
> > scale(dry_bio_cover_m2):tillage
> > Data: biomassCC_wo_C
> > Control: glmerControl(optimizer = "nloptwrap", optCtrl =
> > list(algorithm = "NLOPT_LN_NELDERMEAD"))
> >
> > AIC BIC logLik deviance df.resid
> > 6294.5 6416.4 -3116.2 6232.5 347
> >
> > Scaled residuals:
> > Min 1Q Median 3Q Max
> > -4.7837 -0.5006 -0.0011 0.5075 4.7593
> >
> > Random effects:
> > Groups Name Variance Std.Dev.
> > block:year:tillage:N:CC (Intercept) 3.061e+04 1.750e+02
> > block:tillage:N:CC (Intercept) 2.162e-05 4.650e-03
> > block:year:tillage:N (Intercept) 2.645e-05 5.143e-03
> > block:tillage:N (Intercept) 2.125e+01 4.610e+00
> > block:year:tillage (Intercept) 4.656e-03 6.823e-02
> > block:year (Intercept) 3.848e+02 1.962e+01
> > block:tillage (Intercept) 4.204e-04 2.050e-02
> > Residual 4.505e+03 6.712e+01
> > Number of obs: 378, groups:
> > block:year:tillage:N:CC, 192; block:tillage:N:CC, 96;
> > block:year:tillage:N, 64; block:tillage:N, 32; block:year:tillage, 16;
> > block:year, 8; block:tillage, 8
> >
> > Fixed effects:
> > Estimate Std. Error t value Pr(>|z|)
> > (Intercept) 13.829033 0.001353 10219.154 <
> 2e-16 ***
> > CCVv -0.064097 0.001353 -47.367 <
> 2e-16 ***
> > CCTs 0.059189 0.001353 43.739 <
> 2e-16 ***
> > N.L 5.506568 0.001310 4202.646 <
> 2e-16 ***
> > N.Q -0.387064 0.001310 -295.407 <
> 2e-16 ***
> > N.C -0.123713 0.001310 -94.417 <
> 2e-16 ***
> > scale(dry_bio_cover_m2) -3.096732 0.001309 -2366.243 <
> 2e-16 ***
> > tillageRT -0.034498 0.001310 -26.329 <
> 2e-16 ***
> > blockR2 -1.369223 0.464429 -2.948
> 0.0032 **
> > blockR3 -0.897212 0.001353 -663.006 <
> 2e-16 ***
> > blockR4 0.536253 0.464366 1.155 0.2482
> > year2014 0.484093 0.350696 1.380 0.1675
> > CCVv:N.L -3.795768 0.652249 -5.820
> 5.90e-09 ***
> > CCTs:N.L -3.760854 0.652139 -5.767
> 8.07e-09 ***
> > CCVv:N.Q -0.630917 0.652215 -0.967 0.3334
> > CCTs:N.Q -0.258428 0.652139 -0.396 0.6919
> > CCVv:N.C 0.857098 0.652113 1.314 0.1887
> > CCTs:N.C 0.273846 0.652088 0.420 0.6745
> > CCVv:scale(dry_bio_cover_m2) -2.834524 0.001310 -2163.350 <
> 2e-16 ***
> > CCTs:scale(dry_bio_cover_m2) -2.564852 0.001401 -1831.105 <
> 2e-16 ***
> > CCVv:tillageRT 4.446872 0.001401 3174.709 <
> 2e-16 ***
> > CCTs:tillageRT 2.758362 0.001401 1969.252 <
> 2e-16 ***
> > scale(dry_bio_cover_m2):tillageRT 2.022118 0.001310 1543.318 <
> 2e-16 ***
> >
> >
> >
> >
> >
> >
> >
> >
> > Model with variables fitted in "logical" order:
> >
> > Generalized linear mixed model fit by maximum likelihood (Laplace
> > Approximation) ['glmerMod']
> > Family: gaussian ( sqrt )
> > Formula: dry_bio_weeds_m2 + 0.001 ~ block + year +
> scale(dry_bio_cover_m2) +
> > tillage + N + CC + CC:N + CC:tillage + CC:scale(dry_bio_cover_m2)
> > + tillage:scale(dry_bio_cover_m2) + (1 | block:tillage) + (1 |
> > block:tillage:N) + (1 | block:tillage:N:CC) + (1 | block:year) +
> > (1 | block:year:tillage) + (1 | block:year:tillage:N) + (1 |
> > block:year:tillage:N:CC)
> > Data: biomassCC_wo_C
> > Control: glmerControl(optimizer = "nloptwrap", optCtrl =
> > list(algorithm = "NLOPT_LN_NELDERMEAD"))
> >
> > AIC BIC logLik deviance df.resid
> > 6294.5 6416.4 -3116.2 6232.5 347
> >
> > Scaled residuals:
> > Min 1Q Median 3Q Max
> > -4.7838 -0.5009 -0.0011 0.5075 4.7594
> >
> > Random effects:
> > Groups Name Variance Std.Dev.
> > block:year:tillage:N:CC (Intercept) 3.061e+04 1.749e+02
> > block:tillage:N:CC (Intercept) 2.141e-05 4.627e-03
> > block:year:tillage:N (Intercept) 2.605e-05 5.104e-03
> > block:tillage:N (Intercept) 2.106e+01 4.589e+00
> > block:year:tillage (Intercept) 4.487e-03 6.699e-02
> > block:year (Intercept) 3.822e+02 1.955e+01
> > block:tillage (Intercept) 4.228e-04 2.056e-02
> > Residual 4.504e+03 6.711e+01
> > Number of obs: 378, groups:
> > block:year:tillage:N:CC, 192; block:tillage:N:CC, 96;
> > block:year:tillage:N, 64; block:tillage:N, 32; block:year:tillage, 16;
> > block:year, 8; block:tillage, 8
> >
> > Fixed effects:
> > Estimate Std. Error t value Pr(>|z|)
> > (Intercept) 13.82995 42.86352 0.323 0.74696
> > blockR2 -1.36672 40.78008 -0.034 0.97326
> > blockR3 -0.89843 40.78048 -0.022 0.98242
> > blockR4 0.53355 40.77879 0.013 0.98956
> > year2014 0.48290 28.79334 0.017 0.98662
> > scale(dry_bio_cover_m2) -3.09673 0.81313 -3.808 0.00014
> ***
> > tillageRT -0.03537 43.77607 -0.001 0.99936
> > N.L 5.50829 43.77552 0.126 0.89987
> > N.Q -0.38899 43.77300 -0.009 0.99291
> > N.C -0.12513 43.77337 -0.003 0.99772
> > CCVv -0.06447 43.74925 -0.001 0.99882
> > CCTs 0.05444 43.75387 0.001 0.99901
> > N.L:CCVv -3.79754 61.86354 -0.061 0.95105
> > N.Q:CCVv -0.62820 61.86055 -0.010 0.99190
> > N.C:CCVv 0.86704 61.85967 0.014 0.98882
> > N.L:CCTs -3.76680 61.86056 -0.061 0.95145
> > N.Q:CCTs -0.25653 61.85857 -0.004 0.99669
> > N.C:CCTs 0.27781 61.85861 0.004 0.99642
> > tillageRT:CCVv 4.44671 61.86316 0.072 0.94270
> > tillageRT:CCTs 2.75968 61.86792 0.045 0.96442
> > scale(dry_bio_cover_m2):CCVv -2.83780 0.93223 -3.044 0.00233 **
> > scale(dry_bio_cover_m2):CCTs -2.56887 1.32152 -1.944 0.05191 .
> > scale(dry_bio_cover_m2):tillageRT 2.02139 0.85983 2.351 0.01873 *
> >
> >
> > I can share the data if need be. Thank you very much for your help.
> >
> > Sincerely,
> >
> > Guillaume ADEUX
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-mixed-models using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list