[R] Column names of model.matrix's output with contrast.arg
John Fox
j|ox @end|ng |rom mcm@@ter@c@
Mon Jun 17 23:34:07 CEST 2024
Dear Christophe and Ben,
Also see the car package for replacements for contr.treatment(),
contr.sum(), and contr.helmert() -- e.g., help("contr.Sum", package="car").
These functions have been in the car package for more than two decades,
and AFAIK, no one uses them (including myself). I didn't write a
replacement for contr.poly() because the current coefficient labeling
seemed reasonably transparent.
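For anyone who wants labeled columns without an extra package, here is a minimal base-R sketch. It is not the car implementation: `contr_sum_named` is a hypothetical helper, and the `.Sum` prefix simply follows Christophe's suggestion quoted below. It also prints the degree-based labels that contr.poly() already provides.

```r
# contr.poly() labels its columns by polynomial degree
colnames(contr.poly(5))   # ".L" ".Q" ".C" "^4"

# Data from the ?glm example quoted later in this thread
counts  <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- factor(paste0("O", gl(3, 1, 9)))

# Hypothetical helper (assumption, not in base R or car): sum-to-zero
# contrasts whose columns are labeled with the level names
contr_sum_named <- function(lev) {
  m <- contr.sum(lev)                          # rows named by level, columns unnamed
  colnames(m) <- paste0(".Sum", lev[-length(lev)])
  m
}

# Unnamed contrast columns: model.matrix() falls back to column indices
X_plain <- model.matrix(counts ~ outcome,
                        contrasts.arg = list(outcome = "contr.sum"))

# Named contrast columns: the names are appended to the variable name
X_named <- model.matrix(counts ~ outcome,
                        contrasts.arg = list(outcome = contr_sum_named(levels(outcome))))

colnames(X_plain)   # "(Intercept)" "outcome1" "outcome2"
colnames(X_named)   # "(Intercept)" "outcome.SumO1" "outcome.SumO2"
```

Only the labels differ between the two matrices; the entries are identical, so fitted coefficients are the same and just carry clearer names.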
Best,
John
--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/
--
On 2024-06-17 4:29 p.m., Ben Bolker wrote:
> It's sorta-kinda-obliquely-partially documented in the examples:
>
> zapsmall(cP <- contr.poly(3)) # Linear and Quadratic
>
> output:
>
> .L .Q
> [1,] -0.7071068 0.4082483
> [2,] 0.0000000 -0.8164966
> [3,] 0.7071068 0.4082483
>
> FWIW the faux package provides better-named alternatives.
>
>
> On 2024-06-17 4:25 p.m., Christophe Dutang wrote:
>> Thanks for your reply.
>>
>> It might be good to document the naming convention in ?contrasts. It is
>> hard to guess that .L means linear, .Q quadratic, .C cubic, and
>> ^n other degrees.
>>
>> For contr.sum, we could have used .Sum<level1>, .Sum<level2>…
>>
>> Maybe the examples in ?model.matrix should use named levels in the dd
>> object, so that we can see when names are dropped.
>>
>> Kind regards, Christophe
>>
>>
>>> On 14 June 2024 at 11:45, peter dalgaard <pdalgd using gmail.com> wrote:
>>>
>>> You're at the mercy of the various contr.XXX functions. They may or
>>> may not set the colnames on the matrices that they generate.
>>>
>>> The rationale for (not) setting them is not perfectly transparent,
>>> but you obviously cannot use level names on contr.poly, so it uses
>>> .L, .Q, etc.
>>>
>>> In MASS, contr.sdif is careful about labeling the columns with the
>>> levels that are being diff'ed.
>>>
>>> For contr.treatment, there is a straightforward connection to 0/1
>>> dummy variables, so level names there are natural.
>>>
>>> One could use levels in contr.sum and contr.helmert, but it might
>>> confuse users that comparisons are with the average of all levels or
>>> preceding levels. (It can be quite confusing when coding is +1 for
>>> male and -1 for female, so that the gender difference is twice the
>>> coefficient.)
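The factor of two can be checked with a small sketch. The data below are made up purely for illustration, and the variable names are assumptions; contr.sum codes the first level as +1 and the last as -1.

```r
# Hypothetical two-group data (values invented for illustration)
y   <- c(10, 12, 11, 15, 17, 16)
sex <- factor(rep(c("male", "female"), each = 3),
              levels = c("male", "female"))

# Sum-to-zero coding: male = +1, female = -1
fit <- lm(y ~ sex, contrasts = list(sex = "contr.sum"))

# Difference in group means, male minus female
mean_diff <- mean(y[sex == "male"]) - mean(y[sex == "female"])

# The coefficient is half the group difference
c(coefficient = unname(coef(fit)["sex1"]), difference = mean_diff)
```

The coefficient is labeled `sex1`, which gives no hint that it belongs to the male level or that it measures the deviation from the average of the two group means.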
>>>
>>> -pd
>>>
>>>> On 14 Jun 2024, at 08:12 , Christophe Dutang <dutangc using gmail.com> wrote:
>>>>
>>>> Dear list,
>>>>
>>>> Changing the default contrasts used in glm() made me aware of how
>>>> model.matrix() sets column names.
>>>>
>>>> With the default contrasts, model.matrix() uses the level values to name
>>>> the columns. However, with other contrasts, model.matrix() uses the
>>>> level indices. I don't see anything in the documentation about
>>>> this. It does not seem natural to have such behavior.
>>>>
>>>> Any comment is welcome.
>>>>
>>>> An example is below.
>>>>
>>>> Kind regards, Christophe
>>>>
>>>>
>>>> #example from ?glm
>>>> counts <- c(18,17,15,20,10,20,25,13,12)
>>>> outcome <- paste0("O", gl(3,1,9))
>>>> treatment <- paste0("T", gl(3,3))
>>>>
>>>> X3 <- model.matrix(counts ~ outcome + treatment)
>>>> X4 <- model.matrix(counts ~ outcome + treatment, contrasts =
>>>> list("outcome"="contr.sum"))
>>>> X5 <- model.matrix(counts ~ outcome + treatment, contrasts =
>>>> list("outcome"="contr.helmert"))
>>>>
>>>> #check with original factor
>>>> cbind.data.frame(X3, outcome)
>>>> cbind.data.frame(X4, outcome)
>>>> cbind.data.frame(X5, outcome)
>>>>
>>>> #same issue with glm
>>>> glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
>>>> glm.D94 <- glm(counts ~ outcome + treatment, family = poisson(),
>>>> contrasts = list("outcome"="contr.sum"))
>>>> glm.D95 <- glm(counts ~ outcome + treatment, family = poisson(),
>>>> contrasts = list("outcome"="contr.helmert"))
>>>>
>>>> coef(glm.D93)
>>>> coef(glm.D94)
>>>> coef(glm.D95)
>>>>
>>>> #check linear predictor
>>>> cbind(X3 %*% coef(glm.D93), predict(glm.D93))
>>>> cbind(X4 %*% coef(glm.D94), predict(glm.D94))
>>>>
>>>> -------------------------------------------------
>>>> Christophe DUTANG
>>>> LJK, Ensimag, Grenoble INP, UGA, France
>>>> ILB research fellow
>>>> Web: http://dutangc.free.fr
>>>>
>>>> ______________________________________________
>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> --
>>> Peter Dalgaard, Professor,
>>> Center for Statistics, Copenhagen Business School
>>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>>> Phone: (+45)38153501
>>> Office: A 4.23
>>> Email: pd.mes using cbs.dk Priv: PDalgd using gmail.com
>>>
>>
>
> --
> Dr. Benjamin Bolker
> Professor, Mathematics & Statistics and Biology, McMaster University
> Director, School of Computational Science and Engineering
> (Acting) Graduate chair, Mathematics & Statistics
> E-mail is sent at my convenience; I don't expect replies outside of
> working hours.
>