[R] Column names of model.matrix's output with contrast.arg

Mon Jun 17 23:34:07 CEST 2024

Dear Christophe and Ben,

Also see the car package for replacements for contr.treatment(), 
contr.sum(), and contr.helmert() -- e.g., help("contr.Sum", package="car").

These functions have been in the car package for more than two decades, 
and AFAIK, no one uses them (including myself). I didn't write a 
replacement for contr.poly() because the current coefficient labeling 
seemed reasonably transparent.
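
For the record, a minimal sketch of how the car versions could be tried on the ?glm data quoted below. It assumes car is installed and passes the contrast matrix directly, so the package does not need to be attached. The exact labels depend on car's "decorate" options, so no output is shown here.

```r
# Data from the ?glm example quoted later in this thread
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- factor(paste0("O", gl(3, 1, 9)))
treatment <- factor(paste0("T", gl(3, 3)))

if (requireNamespace("car", quietly = TRUE)) {
  # car::contr.Sum() is contr.sum() with labelled columns
  cm <- car::contr.Sum(levels(outcome))
  X  <- model.matrix(counts ~ outcome + treatment,
                     contrasts.arg = list(outcome = cm))
  print(colnames(X))
} else {
  message("car is not installed; skipping the example")
}
```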

Best,
  John

-- 
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

--
On 2024-06-17 4:29 p.m., Ben Bolker wrote:
> 
>    It's sorta-kinda-obliquely-partially documented in the examples:
> 
> zapsmall(cP <- contr.poly(3)) # Linear and Quadratic
> 
> output:
> 
>              .L         .Q
> [1,] -0.7071068  0.4082483
> [2,]  0.0000000 -0.8164966
> [3,]  0.7071068  0.4082483
> 
> FWIW the faux package provides better-named alternatives.
> 
> 
> On 2024-06-17 4:25 p.m., Christophe Dutang wrote:
>> Thanks for your reply.
>>
>> It might be good to document the naming convention in ?contrasts. It is 
>> hard to guess that .L means linear, .Q quadratic, .C cubic, and ^n 
>> any higher degree.
>>
>> For contr.sum, we could have used .Sum<level1>, .Sum<level2>…
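
A sketch of what such labelling could look like with base R only. The helper contr_sum_named() and the ".Sum" prefix are hypothetical, made up here for illustration; they are not part of R or of any package mentioned in this thread.

```r
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- factor(paste0("O", gl(3, 1, 9)))
treatment <- factor(paste0("T", gl(3, 3)))

# Hypothetical helper: contr.sum() with columns labelled by level name
contr_sum_named <- function(lev, prefix = ".Sum") {
  cm <- contr.sum(lev)
  # contr.sum() has one column per level except the last
  colnames(cm) <- paste0(prefix, lev[-length(lev)])
  cm
}

X <- model.matrix(counts ~ outcome + treatment,
                  contrasts.arg = list(outcome = contr_sum_named(levels(outcome))))
colnames(X)
```

model.matrix() pastes the factor name onto the column names of the contrast matrix, so the outcome columns come out as outcome.SumO1 and outcome.SumO2 instead of outcome1 and outcome2.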
>>
>> Maybe the examples in ?model.matrix should use named levels in the dd 
>> objects, so that we can see when names are dropped.
>>
>> Kind regards, Christophe
>>
>>
>>> On 14 June 2024 at 11:45, peter dalgaard <pdalgd using gmail.com> wrote:
>>>
>>> You're at the mercy of the various contr.XXX functions. They may or 
>>> may not set the colnames on the matrices that they generate.
>>>
>>> The rationale for (not) setting them is not perfectly transparent, 
>>> but you obviously cannot use level names with contr.poly, so it uses 
>>> .L, .Q, etc.
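
A quick way to see which of the base contr.XXX functions set column names; the three-level vector lev is just an illustration.

```r
lev <- c("a", "b", "c")

cn <- list(
  treatment = colnames(contr.treatment(lev)),  # names of the non-baseline levels
  SAS       = colnames(contr.SAS(lev)),        # baseline is the last level
  sum       = colnames(contr.sum(lev)),        # NULL: no column names set
  helmert   = colnames(contr.helmert(lev)),    # NULL: no column names set
  poly      = colnames(contr.poly(5))          # .L, .Q, .C, then ^4
)
str(cn)
```

When the contrast matrix has no column names, model.matrix() falls back to the column indices, which is where names like outcome1 and outcome2 come from.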
>>>
>>> In MASS, contr.sdif is careful about labeling the columns with the 
>>> levels that are being diff'ed.
>>>
>>> For contr.treatment, there is a straightforward connection to 0/1 
>>> dummy variables, so level names there are natural.
>>>
>>> One could use levels in contr.sum and contr.helmert, but it might 
>>> confuse users that comparisons are with the average of all levels or 
>>> preceding levels. (It can be quite confusing when coding is +1 for 
>>> male and -1 for female, so that the gender difference is twice the 
>>> coefficient.)
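
The factor of two can be checked on made-up data. The numbers below are invented for illustration only. Note that with alphabetical levels contr.sum() codes female as +1 and male as -1, the reverse of the coding described above, but the factor of two is the same.

```r
# Invented data: two groups, deliberately unbalanced
y   <- c(10, 12, 11, 20, 22, 21, 23)
sex <- factor(c("female", "female", "female", "male", "male", "male", "male"))

fit <- lm(y ~ sex, contrasts = list(sex = "contr.sum"))
coef(fit)  # the slope is named "sex1", not after a level

# The difference in group means is twice the sum-contrast coefficient
diff_means <- mean(y[sex == "female"]) - mean(y[sex == "male"])
all.equal(unname(2 * coef(fit)["sex1"]), diff_means)
```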
>>>
>>> -pd
>>>
>>>> On 14 Jun 2024, at 08:12 , Christophe Dutang <dutangc using gmail.com> wrote:
>>>>
>>>> Dear list,
>>>>
>>>> Changing the default contrasts used in glm() made me aware of how 
>>>> model.matrix() sets column names.
>>>>
>>>> With the default contrasts, model.matrix() uses the level values to 
>>>> name the columns. With other contrasts, however, model.matrix() uses 
>>>> the level indexes. I don't see anything in the documentation related 
>>>> to this. Such behavior does not seem natural.
>>>>
>>>> Any comment is welcome.
>>>>
>>>> An example is below.
>>>>
>>>> Kind regards, Christophe
>>>>
>>>>
>>>> #example from ?glm
>>>> counts <- c(18,17,15,20,10,20,25,13,12)
>>>> outcome <- paste0("O", gl(3,1,9))
>>>> treatment <- paste0("T", gl(3,3))
>>>>
>>>> X3 <- model.matrix(counts ~ outcome + treatment)
>>>> X4 <- model.matrix(counts ~ outcome + treatment, contrasts.arg = 
>>>> list("outcome"="contr.sum"))
>>>> X5 <- model.matrix(counts ~ outcome + treatment, contrasts.arg = 
>>>> list("outcome"="contr.helmert"))
>>>>
>>>> #check with original factor
>>>> cbind.data.frame(X3, outcome)
>>>> cbind.data.frame(X4, outcome)
>>>> cbind.data.frame(X5, outcome)
>>>>
>>>> #same issue with glm
>>>> glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
>>>> glm.D94 <- glm(counts ~ outcome + treatment, family = poisson(), 
>>>> contrasts = list("outcome"="contr.sum"))
>>>> glm.D95 <- glm(counts ~ outcome + treatment, family = poisson(), 
>>>> contrasts = list("outcome"="contr.helmert"))
>>>>
>>>> coef(glm.D93)
>>>> coef(glm.D94)
>>>> coef(glm.D95)
>>>>
>>>> #check linear predictor
>>>> cbind(X3 %*% coef(glm.D93), predict(glm.D93))
>>>> cbind(X4 %*% coef(glm.D94), predict(glm.D94))
>>>>
>>>> -------------------------------------------------
>>>> Christophe DUTANG
>>>> LJK, Ensimag, Grenoble INP, UGA, France
>>>> ILB research fellow
>>>> Web: http://dutangc.free.fr
>>>>
>>>> ______________________________________________
>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide 
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> -- 
>>> Peter Dalgaard, Professor,
>>> Center for Statistics, Copenhagen Business School
>>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>>> Phone: (+45)38153501
>>> Office: A 4.23
>>> Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com
>>>
>>
> 
> -- 
> Dr. Benjamin Bolker
> Professor, Mathematics & Statistics and Biology, McMaster University
> Director, School of Computational Science and Engineering
> (Acting) Graduate chair, Mathematics & Statistics
> E-mail is sent at my convenience; I don't expect replies outside of
> working hours.
> 