[R] (gam) formula: Why different results for terms being factor vs. numeric?
Bert Gunter
gunter.berton at gene.com
Tue Oct 29 21:31:00 CET 2013
Think about it. How can one define a smooth term with a factor???
Further discussion is probably offtopic. Post on
stats.stackexchange.com if it still isn't obvious.
Cheers,
Bert
On Tue, Oct 29, 2013 at 1:16 PM, Marius Hofert
<marius.hofert at math.ethz.ch> wrote:
> Dear expeRts,
>
> If I specify group = as.factor(rep(1:2, each=n)) in the below
> definition of dat, I get the expected behavior I am looking for. I
> wonder why I
> don't get it if group is *not* a factor... My guess was that,
> internally, factors are treated as natural numbers (and this indeed
> seems to be true if you convert the latter to factors [essentially
> meaning changing the levels]), but replacing factors by numeric values
> (as below) does not provide the same answer.
>
> Cheers,
> Marius
>
>
> require(mgcv)
>
> n <- 10
> yrs <- 2000+seq_len(n)
> set.seed(271)
> dat <- data.frame(year = rep(yrs, 2),
> group = rep(1:2, each=n), # *not* a factor
> (as.factor() provides the expected behavior)
> resp = c(seq_len(n)+runif(n), 5+seq_len(n)+runif(n)))
> fit3 <- gam(resp ~ year + group - 1, data=dat)
> plot(yrs, fit3$fitted.values[seq_len(n)], type="l", ylim=range(dat$resp),
> xlab="Year", ylab="Response") # fit group A; mean over all
> responses in this group
> lines (yrs, fit3$fitted.values[n+seq_len(n)], col="blue") # fit group
> B; mean over all responses in this group
> points(yrs, dat$resp[seq_len(n)]) # actual response group A
> points(yrs, dat$resp[n+seq_len(n)], col="blue") # actual response group B
> ## => hmmm... because it is not a factor (?), this does not give an
> expected answer,
> ## but gam() still correctly figures out that there are two groups
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374
More information about the R-help
mailing list