[R] mgcv gam bs=re questions
William Shadish
wshadish at ucmerced.edu
Sat May 3 21:51:33 CEST 2014
Dear R-helpers,
I am working on a project assessing the prevalence and variance (random
effects) of linear and nonlinear trends in a data set that has short
time series (each time series identified as PID 1 through 5). I am using
mgcv (gam) with the bs=”re” option (more on why not gamm or gamm4 at the
end). I want to know whether trend is linear or nonlinear in each time
series, and whether trend varies significantly over time series (e.g.,
PID 1 through 5 below)--e.g., whether they have significantly different
edfs. The analysis predicts a count outcome (DVY) from time (SessIDX)
and treatment (TX), using a quasipoisson approach to correct standard
errors for overdispersion. The dataset in the examples below (SID5DVID1)
has 5 time series (PID 1 through 5) with 93, 97, 70, 86, and 103 time
points each, respectively. E.g.,
JID SID PID DVID SessIDX DVY TX
1 1 5 1 1 1 23 0
2 1 5 1 1 2 20 0
3 1 5 1 1 3 30 0
4 1 5 1 1 4 40 0
…
434 1 5 5 1 100 5 1
435 1 5 5 1 101 3 1
436 1 5 5 1 102 12 1
437 1 5 5 1 103 7 1
>
My question concerns how to interpret the random effects that result
from variations in the model. A model with smoothed trend and no random
effect yields
M1 <- gam(DVY ~ s(SessIDX) + factor(TX),
+ data = SID5DVID1,
+ family = quasipoisson(link="log"), method="REML")
Approximate significance of smooth terms:
edf Ref.df F p-value
s(SessIDX) 1.876 2.403 11.52 4.34e-06 ***
So trend over all five time series is linear and significant. If I allow
trend to interact with PID (be different for each time series), without
a random effect, I get
M1a <- gam(DVY ~ s(SessIDX, by = factor(PID)) + factor(TX),
data = SID5DVID1,
family = quasipoisson(link="log"), method="REML")
summary(M1a,all.p=TRUE)
Approximate significance of smooth terms:
edf Ref.df F p-value
s(SessIDX):factor(PID)1 1.001 1.001 31.827 2.98e-08 ***
s(SessIDX):factor(PID)2 1.000 1.000 2.658 0.103724
s(SessIDX):factor(PID)3 5.100 5.886 8.674 9.42e-09 ***
s(SessIDX):factor(PID)4 1.337 1.602 9.822 0.000366 ***
s(SessIDX):factor(PID)5 3.998 4.924 7.967 4.40e-07 ***
It appears trend may be quite different for each time series. So now I
want to test whether trend varies significantly over time series using
bs=”re”. One model is
M2 <- gam(DVY ~ s(SessIDX, bs = "re") + factor(TX),
data = SID5DVID1,
family = quasipoisson(link="log"), method="REML")
summary(M2,all.p=TRUE)
gam.vcomp(M2)
Approximate significance of smooth terms:
edf Ref.df F p-value
s(SessIDX) 0.9609 1 24.48 6.53e-07 ***
…
Standard deviations and 0.95 confidence intervals:
std.dev lower upper
s(SessIDX) 0.01470809 0.00347556 0.06224258
scale 3.20933040 3.00277692 3.43009218
Rank: 2/2
>
So the random effect std.dev is 0.01470809, and it is significant with
p-value = 6.53e-07. I had originally thought that this meant that trend
varied significantly over the five time series. However, after having
read a recent article (Gary J. McKeown and Ian Sneddon, DOI:
10.1037/a0034282), I am not sure I am correctly interpreting this. In
their article, this approach (a) changes the shape of the nonlinear
trend only slightly compared to not using random effects [i.e., just
using s(SessIDX)] but still produces only one trend line for all time
series, and (b) increases the width of the confidence bands around the
trend line (which of course makes perfect sense). In other words, it
does not obviously allow each time series (PID) to have a quite
different edf or trend line. So, parallel with M1a above, I can run:
M5a <- gam(DVY ~ s(SessIDX, by = factor(PID), bs="re") + factor(TX),
data = SID5DVID1,
family = quasipoisson(link="log"), method="REML")
summary(M5a,all.p=TRUE)
gam.vcomp(M5a)
Approximate significance of smooth terms:
edf Ref.df F p-value
s(SessIDX):factor(PID)1 0.9239 1 38.76 0.000368 ***
s(SessIDX):factor(PID)2 0.9892 1 107.00 < 2e-16 ***
s(SessIDX):factor(PID)3 0.9867 1 94.85 < 2e-16 ***
s(SessIDX):factor(PID)4 0.9830 1 108.33 1.64e-13 ***
s(SessIDX):factor(PID)5 0.9658 1 73.41 1.32e-07 ***
Standard deviations and 0.95 confidence intervals:
std.dev lower upper
s(SessIDX):factor(PID)1 0.008853821 0.001960054 0.03999387
s(SessIDX):factor(PID)2 0.065276527 0.016064261 0.26524874
s(SessIDX):factor(PID)3 0.048967099 0.012003711 0.19975296
s(SessIDX):factor(PID)4 0.027830597 0.006777380 0.11428341
s(SessIDX):factor(PID)5 0.012218325 0.002893741 0.05158977
scale 2.559988984 2.394424050 2.73700208
This seems to produce five random effects, all apparently significant.
But it is not clear to me how to interpret these random effects. For
example, consider s(SessIDX):factor(PID)1 0.008853821 above. Clearly
this is not what I want—it is not a measure of whether the five cases
differ significantly from each other in trend. I am guessing this random
effect concerns whether time points within PID1 differ significantly
from each other?
One might, of course, use gamm or gamm4 to address the random effects
question. I had hoped, however, to run all analyses within the same
program (gam) to avoid confounding differences in results with
differences in the estimation routines that different programs use.
Running it all in gam, however, encounters another problem, that it does
not appear that gam can use bs=”re” while fixing the trend to be linear.
For example:
> M4 <- gam(DVY ~ s(SessIDX, bs = "re", k = 2, fx=TRUE) +
+ factor(TX),
+ data = SID5DVID1,
+ family = quasipoisson(link="log"), method="REML")
> summary(M4,all.p=TRUE)
Error in b$smooth[[m]]$S[[1]] : subscript out of bounds
> gam.vcomp(M4)
Error in if (x$smooth[[i]]$sp[j] < 0) { : argument is of length zero
>
I have read extensively to try to understand all this (e.g., Wood 2013
“A simple test for random effects in regression models”) but have not
succeeded. Any help or advice would be appreciated very very much.
Will Shadish
--
William R. Shadish
Distinguished Professor
Founding Faculty
Mailing Address:
William R. Shadish
University of California
School of Social Sciences, Humanities and Arts
5200 North Lake Rd
Merced CA 95343
Physical/Delivery Address:
University of California Merced
ATTN: William Shadish
School of Social Sciences, Humanities and Arts
Facilities Services Building A
5200 North Lake Rd.
Merced, CA 95343
209-228-4372 voice
209-228-4007 fax (communal fax: be sure to include cover sheet)
wshadish at ucmerced.edu
http://faculty.ucmerced.edu/wshadish/index.htm
http://psychology.ucmerced.edu
More information about the R-help
mailing list