[R] gam()

Thu Jun 5 17:12:17 CEST 2003

Dear Henric,

At 05:01 PM 6/4/2003 +0200, Henric Nilsson wrote:

>I've now spent a couple of days trying to learn R and, in particular, the 
>gam() function, and I now have a few questions and reflections regarding 
>the latter. Maybe these things are implemented in some way that I'm not 
>yet aware of or have perhaps been decided by the R community to not be 
>what's wanted. Of course, my lack of complete theoretical understanding of 
>what mgcv really does may also show...
>
>1. When fitting models where a factor interacts with a smooth term, say 
>y~a+s(x,by=a.1)+s(x,by=a.2), I noticed that the rug in the plot of each of 
>the smooth terms is identical. I expected the rug in the plot of e.g. 
>s(x,by=a.1) to only include those x for which a.1=1 to be able to judge if 
>observations of x where a.1=1 are sparse in any region. Also, it would be 
>really nice if the "by=..." was included in the output of the plot.gam() 
>and the "Approximate significance of smooth terms:" part of the summary.gam().
>
>2. John Fox has modified anova.glm() into anova.gam() 
>(http://www.socsci.mcmaster.ca/jfox/Books/Companion/nonparametric-regression.txt) 
>for comparison of two or more fitted models based on the difference 
>between residual deviances. Indiscriminate use of such a procedure 
>shouldn't perhaps be encouraged, but I think that many users expect it to 
>be part of the mgcv package since this model selection idea is covered in 
>several texts and also implemented in S-plus (and may be OK for truly 
>nested models). And even if it's been decided that this functionality is 
>not wanted in mgcv, perhaps another function comparing several models by 
>the GCV/UBRE score and other useful statistics can be implemented?

The problem with comparing two gams in R fit with mgcv is that, by default, 
the degree of smoothing for terms is selected independently for each model. 
Simon Wood previously posted a message to the R-help list discussing this 
issue and making some suggestions. The issue doesn't arise in the same way 
with models fit by the gam function in S-PLUS because the degree of 
smoothing there is instead selected by the user. I should update my 
appendix on nonparametric regression to discuss this question -- the 
current presentation isn't really adequate.
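
One way to sidestep the problem, offered here only as a rough sketch and not necessarily what Simon suggested, is to fix the degrees of freedom of every smooth with fx=TRUE, so that no smoothing parameters are estimated and the two fits are ordinary nested regression-spline models. The data below are simulated, the names dat, m0 and m1 are made up for illustration, and the F test assumes Gaussian errors:

```r
library(mgcv)

# Simulated data, purely illustrative (not from any real study)
set.seed(1)
n <- 200
dat <- data.frame(x = runif(n), z = runif(n))
dat$y <- sin(2 * pi * dat$x) + 0.5 * dat$z + rnorm(n, sd = 0.3)

# fx = TRUE switches off smoothing-parameter selection: each smooth is an
# unpenalized regression spline with k - 1 degrees of freedom, so the
# smooth of x is the same in both models and m0 is nested in m1.
m0 <- gam(y ~ s(x, k = 5, fx = TRUE), data = dat)
m1 <- gam(y ~ s(x, k = 5, fx = TRUE) + s(z, k = 5, fx = TRUE), data = dat)

# Difference in residual deviance and in residual degrees of freedom
dev_diff <- deviance(m0) - deviance(m1)
df_diff  <- df.residual(m0) - df.residual(m1)

# F test for the extra smooth term (Gaussian case, scale estimated from m1)
f_stat  <- (dev_diff / df_diff) / (deviance(m1) / df.residual(m1))
p_value <- pf(f_stat, df_diff, df.residual(m1), lower.tail = FALSE)

print(c(dev_diff = dev_diff, df_diff = df_diff, F = f_stat, p = p_value))
```

With the degrees of freedom fixed in this way the comparison is the usual one for nested linear models; the price is that the amount of smoothing is chosen by the user, as in S-PLUS, rather than by GCV.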

>3. Some authors [1, 2] suggest pointwise estimation of odds ratios and 
>corresponding confidence intervals based on the smooth terms in a GAM. 
>Maybe something for mgcv?
>[1] Figueiras, A. & Cadarso-Suárez, C. (2001) "Application of Nonparametric 
>Models for Calculating Odds Ratios and Their Confidence Intervals for 
>Continuous Exposures", American Journal of Epidemiology, 154(3), 264-275.
>[2] Saez, M., Cadarso-Suárez, C. & Figueiras, A. (2003) "np.OR: an S-Plus 
>function for pointwise nonparametric estimation of odds-ratios of 
>continuous predictors", Computer Methods and Programs in Biomedicine, 71, 
>175-179.
>
>4. For each purely parametric covariate a t-test is produced; I'd like to 
>have something like S-plus' anova.gam() to get an overall test. (Perhaps 
>with the addition of a choice between Type I and Type III tests, but I 
>guess that may be controversial). Is it possible?

John

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox