[R-sig-ME] glmer: order of variables drastically influences SE
Guillaume Adeux
gu|||@ume@|mon@@2 @end|ng |rom gm@||@com
Mon Aug 5 10:57:06 CEST 2019
Hello everyone,
Could anyone explain to me how the order of variables in the glmer function
could affect the standard error of the estimates? The difference is drastic
in my case (see the two model outputs at the bottom of the email)...
The data I used comes from a 3 way factorial experiment (tillage,
N=nitrogen, CC=cover crop species) on which an additional covariate was
measured (dry_bio_cover_m2).
My main objective is to identify the drivers of weed biomass in different
cover crop.
I came across the situation doing the following:
*1. fitting full model*
mod_full_CC=glmer(dry_bio_weeds_m2+0.001~block+year+scale(dry_bio_cover_m2)*tillage*N*CC+(1|block:tillage)+(1|block:tillage:N)+(1|block:tillage:N:CC)+(1|block:year)+(1|block:year:tillage)+(1|block:year:tillage:N)+(1|block:year:tillage:N:CC),family=gaussian(link="sqrt"),control=glmerControl(optimizer="nloptwrap",optCtrl=list(algorithm="NLOPT_LN_NELDERMEAD")),data=biomassCC_wo_C)
*2. identifying the most parcimonious model*
dred_CC=dredge(mod_full_CC,rank="AICc",fixed=c("block","year"))
The most parcimonious model resulting from dredge (i.e. get.models(dred_CC,
1)[[1]] ) was:
dry_bio_weeds_m2 + 0.001 ~ *CC + N + scale(dry_bio_cover_m2) + tillage +
block + year *+
(1 | block:tillage) + (1 | block:tillage:N) + (1 | block:tillage:N:CC)
+ (1 | block:year) + (1 | block:year:tillage) + (1 | block:year:tillage:N)
+ (1 | block:year:tillage:N:CC) +
* CC:N + CC:scale(dry_bio_cover_m2) + CC:tillage +
scale(dry_bio_cover_m2):tillage*
If I refit this model in a different (more logical) order with the same
variables, as followed,
dry_bio_weeds_m2 + 0.001 ~* block + year + scale(dry_bio_cover_m2) +
tillage + N + CC + CC:N + CC:tillage + CC:scale(dry_bio_cover_m2) +
tillage:scale(dry_bio_cover_m2)* +
(1 | block:tillage) + (1 | block:tillage:N) + (1 | block:tillage:N:CC) + (1
| block:year) + (1 | block:year:tillage) + (1 | block:year:tillage:N) + (1
| block:year:tillage:N:CC)
I obtain similar estimates but drastically different SE, which of course
impacts post-hoc analysis.
The condition number of both models is also slightly different (12.1 for
the model with the variables in the order returned by dredge and 14.3 for
the model with the "logical" order), although no severe collinearity was
detected.
Model with variables fitted in the order given by dredge:
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: gaussian ( sqrt )
Formula: dry_bio_weeds_m2 + 0.001 ~ CC + N + scale(dry_bio_cover_m2) +
tillage + block + year + (1 | block:tillage) + (1 |
block:tillage:N) +
(1 | block:tillage:N:CC) + (1 | block:year) + (1 |
block:year:tillage) + (1 | block:year:tillage:N) + (1 |
block:year:tillage:N:CC) +
CC:N + CC:scale(dry_bio_cover_m2) + CC:tillage +
scale(dry_bio_cover_m2):tillage
Data: biomassCC_wo_C
Control: glmerControl(optimizer = "nloptwrap", optCtrl =
list(algorithm = "NLOPT_LN_NELDERMEAD"))
AIC BIC logLik deviance df.resid
6294.5 6416.4 -3116.2 6232.5 347
Scaled residuals:
Min 1Q Median 3Q Max
-4.7837 -0.5006 -0.0011 0.5075 4.7593
Random effects:
Groups Name Variance Std.Dev.
block:year:tillage:N:CC (Intercept) 3.061e+04 1.750e+02
block:tillage:N:CC (Intercept) 2.162e-05 4.650e-03
block:year:tillage:N (Intercept) 2.645e-05 5.143e-03
block:tillage:N (Intercept) 2.125e+01 4.610e+00
block:year:tillage (Intercept) 4.656e-03 6.823e-02
block:year (Intercept) 3.848e+02 1.962e+01
block:tillage (Intercept) 4.204e-04 2.050e-02
Residual 4.505e+03 6.712e+01
Number of obs: 378, groups:
block:year:tillage:N:CC, 192; block:tillage:N:CC, 96;
block:year:tillage:N, 64; block:tillage:N, 32; block:year:tillage, 16;
block:year, 8; block:tillage, 8
Fixed effects:
Estimate Std. Error t value Pr(>|z|)
(Intercept) 13.829033 0.001353 10219.154 < 2e-16 ***
CCVv -0.064097 0.001353 -47.367 < 2e-16 ***
CCTs 0.059189 0.001353 43.739 < 2e-16 ***
N.L 5.506568 0.001310 4202.646 < 2e-16 ***
N.Q -0.387064 0.001310 -295.407 < 2e-16 ***
N.C -0.123713 0.001310 -94.417 < 2e-16 ***
scale(dry_bio_cover_m2) -3.096732 0.001309 -2366.243 < 2e-16 ***
tillageRT -0.034498 0.001310 -26.329 < 2e-16 ***
blockR2 -1.369223 0.464429 -2.948 0.0032 **
blockR3 -0.897212 0.001353 -663.006 < 2e-16 ***
blockR4 0.536253 0.464366 1.155 0.2482
year2014 0.484093 0.350696 1.380 0.1675
CCVv:N.L -3.795768 0.652249 -5.820 5.90e-09 ***
CCTs:N.L -3.760854 0.652139 -5.767 8.07e-09 ***
CCVv:N.Q -0.630917 0.652215 -0.967 0.3334
CCTs:N.Q -0.258428 0.652139 -0.396 0.6919
CCVv:N.C 0.857098 0.652113 1.314 0.1887
CCTs:N.C 0.273846 0.652088 0.420 0.6745
CCVv:scale(dry_bio_cover_m2) -2.834524 0.001310 -2163.350 < 2e-16 ***
CCTs:scale(dry_bio_cover_m2) -2.564852 0.001401 -1831.105 < 2e-16 ***
CCVv:tillageRT 4.446872 0.001401 3174.709 < 2e-16 ***
CCTs:tillageRT 2.758362 0.001401 1969.252 < 2e-16 ***
scale(dry_bio_cover_m2):tillageRT 2.022118 0.001310 1543.318 < 2e-16 ***
Model with variables fitted in "logical" order:
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod']
Family: gaussian ( sqrt )
Formula: dry_bio_weeds_m2 + 0.001 ~ block + year + scale(dry_bio_cover_m2) +
tillage + N + CC + CC:N + CC:tillage + CC:scale(dry_bio_cover_m2)
+ tillage:scale(dry_bio_cover_m2) + (1 | block:tillage) + (1 |
block:tillage:N) + (1 | block:tillage:N:CC) + (1 | block:year) +
(1 | block:year:tillage) + (1 | block:year:tillage:N) + (1 |
block:year:tillage:N:CC)
Data: biomassCC_wo_C
Control: glmerControl(optimizer = "nloptwrap", optCtrl =
list(algorithm = "NLOPT_LN_NELDERMEAD"))
AIC BIC logLik deviance df.resid
6294.5 6416.4 -3116.2 6232.5 347
Scaled residuals:
Min 1Q Median 3Q Max
-4.7838 -0.5009 -0.0011 0.5075 4.7594
Random effects:
Groups Name Variance Std.Dev.
block:year:tillage:N:CC (Intercept) 3.061e+04 1.749e+02
block:tillage:N:CC (Intercept) 2.141e-05 4.627e-03
block:year:tillage:N (Intercept) 2.605e-05 5.104e-03
block:tillage:N (Intercept) 2.106e+01 4.589e+00
block:year:tillage (Intercept) 4.487e-03 6.699e-02
block:year (Intercept) 3.822e+02 1.955e+01
block:tillage (Intercept) 4.228e-04 2.056e-02
Residual 4.504e+03 6.711e+01
Number of obs: 378, groups:
block:year:tillage:N:CC, 192; block:tillage:N:CC, 96;
block:year:tillage:N, 64; block:tillage:N, 32; block:year:tillage, 16;
block:year, 8; block:tillage, 8
Fixed effects:
Estimate Std. Error t value Pr(>|z|)
(Intercept) 13.82995 42.86352 0.323 0.74696
blockR2 -1.36672 40.78008 -0.034 0.97326
blockR3 -0.89843 40.78048 -0.022 0.98242
blockR4 0.53355 40.77879 0.013 0.98956
year2014 0.48290 28.79334 0.017 0.98662
scale(dry_bio_cover_m2) -3.09673 0.81313 -3.808 0.00014 ***
tillageRT -0.03537 43.77607 -0.001 0.99936
N.L 5.50829 43.77552 0.126 0.89987
N.Q -0.38899 43.77300 -0.009 0.99291
N.C -0.12513 43.77337 -0.003 0.99772
CCVv -0.06447 43.74925 -0.001 0.99882
CCTs 0.05444 43.75387 0.001 0.99901
N.L:CCVv -3.79754 61.86354 -0.061 0.95105
N.Q:CCVv -0.62820 61.86055 -0.010 0.99190
N.C:CCVv 0.86704 61.85967 0.014 0.98882
N.L:CCTs -3.76680 61.86056 -0.061 0.95145
N.Q:CCTs -0.25653 61.85857 -0.004 0.99669
N.C:CCTs 0.27781 61.85861 0.004 0.99642
tillageRT:CCVv 4.44671 61.86316 0.072 0.94270
tillageRT:CCTs 2.75968 61.86792 0.045 0.96442
scale(dry_bio_cover_m2):CCVv -2.83780 0.93223 -3.044 0.00233 **
scale(dry_bio_cover_m2):CCTs -2.56887 1.32152 -1.944 0.05191 .
scale(dry_bio_cover_m2):tillageRT 2.02139 0.85983 2.351 0.01873 *
I can share the data if need be. Thank you very much for your help.
Sincerely,
Guillaume ADEUX
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list