[R] Comparing and Interpreting GAMMs

Simon Wood s.wood at bath.ac.uk
Fri Jul 2 16:32:34 CEST 2010


On Friday 28 May 2010 08:23, Andrea Meyer wrote:
> Dear R users
>
> I have a question related to the interpretation of results based on GAMMs
> using Simon Woods package gamm4.
>
> I have repeated measurements (hours24) of subjects (vpnr) and one factor
> with three levels (pred). The outcome (dv) is binary.
>
> In the first model I'd like to test for differences among factor levels
> (main effects only): gamm.11<-gamm4(dv ~ pred +s(hours24), random = ~
> (1|vpnr),data=sdata,family=binomial)
>
> In the second model I'd like to test whether the smooths vary across factor
> levels (interaction): gamm.12<-gamm4(dv ~ pred +s(hours24)
> +s(hours24,by=pred), random = ~ (1|vpnr),data=sdata,family=binomial)
>
> Part of the output for both models is shown below.
>
> My questions are:
>
> 1) Which is the best way to test whether smooths actually vary across
> factor levels? Is thought that one way would be to compare the two models,
> e.g. by calculating differences between the two deviances and degrees of
> freedom (as long as this has a chi-square distribution). If so this would
> serve as a kind of overall test for interaction which would be fine. Would
> this be suitable or is there a better way to accomplish this?
--- The main problem here is the usual one that the null sets some variance 
parameters (inverse smoothing parameters) to the edge of the parameter space 
(0), so the generalized likelihood ratio test only gives a rather rough guide 
to significance of the alternative. That said in the example you have given 
the alternative model is so much better that this theoretical worry is not 
practically improtant. The difference in AIC reported by comparing the `mer' 
parts of the fit should be enough to convince anyone that you need the 
separate smooths. (And the `gam' part summaries say the same thing).

> 2) It would be interesting to compare smooths of specific factor levels
> with each other. I thought that I would get the answer to this question by
> correctly interpreting the coefficients of the second model. I've assumed
> that in the output below (at the very end) the (approximate) significance
> of the first smooth term s(hours24) tests for the smooth of the averaged
> values (across all three factor levels) against a horizontal line, whereas
> the three additional smooths (s(hours24):pred1 etc.) test for the deviation
> from this average to the respective factor level. If so how would I have to
> set up a model in which I could directly compare specific factors with each
> other? I've also tried to omit the smooth term s(hours24):
> gamm.12<-gamm4(dv ~ pred +s(hours24,by=pred), random = ~
> (1|vpnr),data=sdata,family=binomial) However this would obviously test each
> of the smooths aginst a horizontal line which is not what I want.
I would do this by coding up dummy variable for `pred' myself, which gave the 
smooths the interpretation that I'd like. For example...

B <- model.matrix(~pred) ## create model matrix corresponding to `pred' factor
sdata$pred.2 <- B[,2]  ## dummy variable for level 2 of pred
sdata$pred.3 <- B[,3] ## dummy variable for level 3 of pred 
gamm.13<-gamm4(dv ~ pred +s(hours24) + s(hours24,by=pred.2) + 
s(hours24,by=pred.3), random = ~(1|vpnr),data=sdata,family=binomial) 

... so the first smooth is the reference level, corresponding to the first 
level of pred, while the second 2 smooths are differences, between the other 
2 levels of pred and the reference level. Hope that's answerring the 
question, and sorry for the long delay.

best,
Simon

> Any help is greatly appreciated, thanks!
>
> Andrea
>
>
> Output of 1st model gamm.11:
> ----------------------------------------
> logLik(gamm.11$mer);deviance(gamm.11$mer);attributes(summary(gamm.11$mer))$
>AICtab[1];gamm.11$mer at deviance["disc"] 'log Lik.' -49054.65 (df=5)
>      ML
> 98109.3
>      AIC
>  98119.3
>  disc
> 97600
>
> > summary(gamm.11$mer);summary(gamm.11$gam)
>
> Generalized linear mixed model fit by the Laplace approximation
>    AIC   BIC logLik deviance
>  98119 98167 -49055    98109
> Random effects:
>  Groups Name        Variance   Std.Dev.
>  vpnr   (Intercept)    0.12965  0.36007
>  Xr.1   s(hours24)  1291.42444 35.93639
> Number of obs: 97920, groups: vpnr, 114; Xr.1, 8
>
> Fixed effects:
>                Estimate Std. Error z value Pr(>|z|)
> X(Intercept)  0.3713345  0.0644469   5.762 8.32e-09 ***
> Xpred2       -0.0575848  0.0865231  -0.666    0.506
> Xpred3        0.0003748  0.0869543   0.004    0.997
>
>>
> Parametric coefficients:
>               Estimate Std. Error z value Pr(>|z|)
> (Intercept)  0.3713345  0.0149858  24.779  < 2e-16 ***
> pred2       -0.0575848  0.0197488  -2.916  0.00355 **
> pred3        0.0003748  0.0198803   0.019  0.98496
>
>
> Output of 2nd model gamm.12:
> -----------------------------------------
> logLik(gamm.12$mer);deviance(gamm.12$mer);attributes(summary(gamm.12$mer))$
>AICtab[1];gamm.12$mer at deviance["disc"] 'log Lik.' -48714.29 (df=8)
>       ML
> 97428.59
>       AIC
>  97444.59
>     disc
> 96840.12
>
> > summary(gamm.12$mer);summary(gamm.12$gam)
>
> Generalized linear mixed model fit by the Laplace approximation
>    AIC   BIC logLik deviance
>  97445 97521 -48714    97429
> Random effects:
>  Groups Name             Variance   Std.Dev.
>  vpnr   (Intercept)         0.12777  0.35744
>  Xr.4   s(hours24):pred3  262.64433 16.20631
>  Xr.3   s(hours24):pred2    0.00000  0.00000
>  Xr.2   s(hours24):pred1  422.37197 20.55169
>  Xr.1   s(hours24)       1377.63864 37.11655
> Number of obs: 97920, groups: vpnr, 114; Xr.4, 8; Xr.3, 8; Xr.2, 8; Xr.1, 8
>
> Fixed effects:
>              Estimate Std. Error z value Pr(>|z|)
> X(Intercept)  0.33991    0.06402   5.310 1.10e-07 ***
> Xpred2       -0.04912    0.08607  -0.571    0.568
> Xpred3        0.11460    0.08691   1.319    0.187
>
>>
> Parametric coefficients:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)  0.33991    0.01504  22.598  < 2e-16 ***
> pred2       -0.04912    0.02036  -2.412   0.0159 *
> pred3        0.11460    0.02217   5.170 2.34e-07 ***
>
> Approximate significance of smooth terms:
>                        edf    Ref.df Chi.sq p-value
> s(hours24)       7.943e+00 7.943e+00 9030.4  <2e-16 ***
> s(hours24):pred1 7.598e+00 7.598e+00  139.1  <2e-16 ***
> s(hours24):pred2 5.000e-12 5.000e-12    0.0      NA
> s(hours24):pred3 7.367e+00 7.367e+00  350.7  <2e-16 ***
>
>
>
>
> 	[[alternative HTML version deleted]]

-- 
> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> +44 1225 386603  www.maths.bath.ac.uk/~sw283 



More information about the R-help mailing list