[R] how to create model matrix of second order terms
Rui Barradas
ru|pb@rr@d@@ @end|ng |rom @@po@pt
Tue Mar 25 14:58:19 CET 2025
Hello,
This seems to work and is independent of the number of columns. 'p' is
the output in my previous post.
f <- function(x, data = X) with(data, eval(parse(text = x)))
p2 <- poly(sapply(names(X), f), degree = 2L)
identical(p, p2)
# [1] TRUE
Hope this helps,
Rui Barradas
Às 13:42 de 25/03/2025, Stephen Bond escreveu:
> @ Rui
>
> that is the idea, but how do I apply this to a matrix with 200 columns?
> I cannot write out the expression.
> The colnames seem very messy, but they would be messy even under my
> scheme with as little as 100 vars.
>
> On Tue, 2025-03-25 at 06:15 +0000, Rui Barradas wrote:
>> Às 15:28 de 24/03/2025, Stephen Bond via R-help escreveu:
>>> Folks,
>>>
>>> I appreciate your effort, but maybe I was not explicit enough, so
>>> let
>>> me try again.
>>>
>>> The current setup for formulas does not allow for I(x^2) terms as
>>> explained in the MASS book at the end of Section 6.2 the x:x
>>> interaction is treated as x.
>>>
>>> So I need to write my own code, which is clumsy unless you can
>>> refer me
>>> to a package that already exists or give me an idea how to improve
>>> my
>>> code. Also, writing out terms is not feasible when there are 1000
>>> variables, so the code needs to deal with taking a wide data frame
>>> or
>>> matrix with column names for convenience.
>>>
>>> Let me know your ideas :-)
>>>
>>> On Mon, 2025-03-24 at 02:43 -0700, Bert Gunter wrote:
>>>> Full disclosure: I did not attempt to decipher your code.
>>>>
>>>> But ~(A+B +C)^2 - (A + B + C)
>>>> gives all 2nd order interactions whether the terms are factors or
>>>> numeric.
>>>>
>>>> ~I(A^2) + I(B^2) gives quadratics in A and B, which must be
>>>> numeric,
>>>> not factors, of course
>>>>
>>>> You can combine these as necessary to get a formula expression
>>>> for
>>>> just 2nd order terms. Wrapping this in model.matrix() should then
>>>> give you the model matrix using "treatment" contrasts for the
>>>> contrasts involving factors (you can change the contrast types
>>>> using
>>>> the 'contrasts.arg' argument of model.matrix())
>>>>
>>>> 1. Does this help?
>>>> 2. Do check this to make sure I'm correct
>>>>
>>>> Cheers,
>>>> Bert
>>>>
>>>> "An educated person is one who can entertain new ideas, entertain
>>>> others, and entertain herself."
>>>>
>>>>
>>>>
>>>> On Mon, Mar 24, 2025 at 12:22 AM Stephen Bond via R-help
>>>> <r-help using r-project.org> wrote:
>>>>> I am sending to this forum as stackoverflow has devolved into
>>>>> sth
>>>>> pretty bad.
>>>>> Below code shows how to get what I want in a clumsy way.
>>>>>
>>>>> cols <- letters[1:4]
>>>>> a1 <- outer(cols,cols,paste0)
>>>>> b1 <- a1[!lower.tri(a1)]
>>>>>
>>>>> X <- matrix(rnorm(80),ncol=4)
>>>>> colnames(X) <- cols
>>>>> X <- as.data.frame(X)
>>>>> XX <- matrix(0,nrow=nrow(X),ncol=length(b1))
>>>>> colnames(XX) <- b1
>>>>>
>>>>> for (k in 1:length(b1)){
>>>>> XX[,k] <- X[,substr(b1[k],1,1)]*X[,substr(b1[k],2,2)]
>>>>> }
>>>>>
>>>>>
>>>>>
>>>>> Is there a way to get that using a formula or some neat trick?
>>>>> The
>>>>> above will not work for factors, so I will need to create the
>>>>> factor
>>>>> crossings using formula a*b*c and then cross with the numerics,
>>>>> which
>>>>> is even more clumsy.
>>>>> Thanks everybody
>>>>>
>>>>> ______________________________________________
>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more,
>>>>> see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> https://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible
>>>>> code.
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> https://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> Hello,
>>
>> Are you looking for ?poly to generate all 1st and 2nd order
>> combinations?
>> In the outputs below, dimnames[[2]] tell which term corresponds to
>> which
>> column.
>> So 1.0.0.0 is 'a' only and 2.0.0.0 is 'a^2'.
>>
>>
>>
>> cols <- letters[1:4]
>> a1 <- outer(cols,cols,paste0)
>> b1 <- a1[!lower.tri(a1)]
>>
>> X <- matrix(rnorm(80),ncol=4)
>> colnames(X) <- cols
>> X <- as.data.frame(X)
>> XX <- matrix(0,nrow=nrow(X),ncol=length(b1))
>> colnames(XX) <- b1
>>
>> for (k in 1:length(b1)){
>> XX[,k] <- X[,substr(b1[k],1,1)]*X[,substr(b1[k],2,2)]
>> }
>> # XX
>> # model.matrix( ~ (a + b + c + d)^2 , data=X)
>> # model.matrix( ~ (a + b + c + d)^2 - (a + b + c +d), data=X)
>>
>> # Is this what you want?
>> # Ugly colnames
>> m <- model.matrix( ~ poly(a, b, c, d, degree = 2L) , data = X)
>> dimnames(m)
>> #> [[1]]
>> #> [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12"
>> "13"
>> "14" "15"
>> #> [16] "16" "17" "18" "19" "20"
>> #>
>> #> [[2]]
>> #> [1] "(Intercept)" "poly(a, b, c, d,
>> degree =
>> 2)1.0.0.0"
>> #> [3] "poly(a, b, c, d, degree = 2)2.0.0.0" "poly(a, b, c, d,
>> degree =
>> 2)0.1.0.0"
>> #> [5] "poly(a, b, c, d, degree = 2)1.1.0.0" "poly(a, b, c, d,
>> degree =
>> 2)0.2.0.0"
>> #> [7] "poly(a, b, c, d, degree = 2)0.0.1.0" "poly(a, b, c, d,
>> degree =
>> 2)1.0.1.0"
>> #> [9] "poly(a, b, c, d, degree = 2)0.1.1.0" "poly(a, b, c, d,
>> degree =
>> 2)0.0.2.0"
>> #> [11] "poly(a, b, c, d, degree = 2)0.0.0.1" "poly(a, b, c, d,
>> degree =
>> 2)1.0.0.1"
>> #> [13] "poly(a, b, c, d, degree = 2)0.1.0.1" "poly(a, b, c, d,
>> degree =
>> 2)0.0.1.1"
>> #> [15] "poly(a, b, c, d, degree = 2)0.0.0.2"
>>
>> # These colnames are nicer
>> p <- with(X, poly(a, b, c, d, degree = 2L))
>> attributes(p)
>> #> $dim
>> #> [1] 20 14
>> #>
>> #> $dimnames
>> #> $dimnames[[1]]
>> #> NULL
>> #>
>> #> $dimnames[[2]]
>> #> 2 3 4 5 7 10
>> 11
>> 13
>> #> "1.0.0.0" "2.0.0.0" "0.1.0.0" "1.1.0.0" "0.2.0.0" "0.0.1.0"
>> "1.0.1.0"
>> "0.1.1.0"
>> #> 19 28 29 31 37 55
>> #> "0.0.2.0" "0.0.0.1" "1.0.0.1" "0.1.0.1" "0.0.1.1" "0.0.0.2"
>> #>
>> #>
>> #> $degree
>> #> [1] 1 2 1 2 2 1 2 2 2 1 2 2 2 2
>> #>
>> #> $coefs
>> #> $coefs[[1]]
>> #> $coefs[[1]]$alpha
>> #> [1] 0.1134201 0.4901752
>> #>
>> #> $coefs[[1]]$norm2
>> #> [1] 1.00000 20.00000 34.11147 140.62467
>> #>
>> #>
>> #> $coefs[[2]]
>> #> $coefs[[2]]$alpha
>> #> [1] 0.1356218 1.0041956
>> #>
>> #> $coefs[[2]]$norm2
>> #> [1] 1.00000 20.00000 21.53021 39.73742
>> #>
>> #>
>> #> $coefs[[3]]
>> #> $coefs[[3]]$alpha
>> #> [1] 0.2533081 0.1689801
>> #>
>> #> $coefs[[3]]$norm2
>> #> [1] 1.00000 20.00000 19.08499 19.83333
>> #>
>> #>
>> #> $coefs[[4]]
>> #> $coefs[[4]]$alpha
>> #> [1] -0.03597259 -0.87409789
>> #>
>> #> $coefs[[4]]$norm2
>> #> [1] 1.00000 20.00000 24.68273 73.41893
>> #>
>> #>
>> #>
>> #> $class
>> #> [1] "poly" "matrix"
>>
>> dimnames(p)
>> #> [[1]]
>> #> NULL
>> #>
>> #> [[2]]
>> #> 2 3 4 5 7 10
>> 11
>> 13
>> #> "1.0.0.0" "2.0.0.0" "0.1.0.0" "1.1.0.0" "0.2.0.0" "0.0.1.0"
>> "1.0.1.0"
>> "0.1.1.0"
>> #> 19 28 29 31 37 55
>> #> "0.0.2.0" "0.0.0.1" "1.0.0.1" "0.1.0.1" "0.0.1.1" "0.0.0.2"
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>>
>
--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença de vírus.
www.avg.com
More information about the R-help
mailing list