[R] how to create model matrix of second order terms
Rui Barradas
ru|pb@rr@d@@ @end|ng |rom @@po@pt
Tue Mar 25 15:01:47 CET 2025
Hello,
Sorry, much simpler:
poly(as.matrix(X), degree = 2L)
Hope this helps,
Rui Barradas
Às 13:58 de 25/03/2025, Rui Barradas escreveu:
> Hello,
>
> This seems to work and is independent of the number of columns. 'p' is
> the output in my previous post.
>
>
> f <- function(x, data = X) with(data, eval(parse(text = x)))
> p2 <- poly(sapply(names(X), f), degree = 2L)
> identical(p, p2)
> # [1] TRUE
>
>
> Hope this helps,
>
> Rui Barradas
>
>
>
> Às 13:42 de 25/03/2025, Stephen Bond escreveu:
>> @ Rui
>>
>> that is the idea, but how do I apply this to a matrix with 200 columns?
>> I cannot write out the expression.
>> The colnames seem very messy, but they would be messy even under my
>> scheme with as little as 100 vars.
>>
>> On Tue, 2025-03-25 at 06:15 +0000, Rui Barradas wrote:
>>> Às 15:28 de 24/03/2025, Stephen Bond via R-help escreveu:
>>>> Folks,
>>>>
>>>> I appreciate your effort, but maybe I was not explicit enough, so
>>>> let
>>>> me try again.
>>>>
>>>> The current setup for formulas does not allow for I(x^2) terms as
>>>> explained in the MASS book at the end of Section 6.2 the x:x
>>>> interaction is treated as x.
>>>>
>>>> So I need to write my own code, which is clumsy unless you can
>>>> refer me
>>>> to a package that already exists or give me an idea how to improve
>>>> my
>>>> code. Also, writing out terms is not feasible when there are 1000
>>>> variables, so the code needs to deal with taking a wide data frame
>>>> or
>>>> matrix with column names for convenience.
>>>>
>>>> Let me know your ideas :-)
>>>>
>>>> On Mon, 2025-03-24 at 02:43 -0700, Bert Gunter wrote:
>>>>> Full disclosure: I did not attempt to decipher your code.
>>>>>
>>>>> But ~(A+B +C)^2 - (A + B + C)
>>>>> gives all 2nd order interactions whether the terms are factors or
>>>>> numeric.
>>>>>
>>>>> ~I(A^2) + I(B^2) gives quadratics in A and B, which must be
>>>>> numeric,
>>>>> not factors, of course
>>>>>
>>>>> You can combine these as necessary to get a formula expression
>>>>> for
>>>>> just 2nd order terms. Wrapping this in model.matrix() should then
>>>>> give you the model matrix using "treatment" contrasts for the
>>>>> contrasts involving factors (you can change the contrast types
>>>>> using
>>>>> the 'contrasts.arg' argument of model.matrix())
>>>>>
>>>>> 1. Does this help?
>>>>> 2. Do check this to make sure I'm correct
>>>>>
>>>>> Cheers,
>>>>> Bert
>>>>>
>>>>> "An educated person is one who can entertain new ideas, entertain
>>>>> others, and entertain herself."
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Mar 24, 2025 at 12:22 AM Stephen Bond via R-help
>>>>> <r-help using r-project.org> wrote:
>>>>>> I am sending to this forum as stackoverflow has devolved into
>>>>>> sth
>>>>>> pretty bad.
>>>>>> Below code shows how to get what I want in a clumsy way.
>>>>>>
>>>>>> cols <- letters[1:4]
>>>>>> a1 <- outer(cols,cols,paste0)
>>>>>> b1 <- a1[!lower.tri(a1)]
>>>>>>
>>>>>> X <- matrix(rnorm(80),ncol=4)
>>>>>> colnames(X) <- cols
>>>>>> X <- as.data.frame(X)
>>>>>> XX <- matrix(0,nrow=nrow(X),ncol=length(b1))
>>>>>> colnames(XX) <- b1
>>>>>>
>>>>>> for (k in 1:length(b1)){
>>>>>> XX[,k] <- X[,substr(b1[k],1,1)]*X[,substr(b1[k],2,2)]
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>> Is there a way to get that using a formula or some neat trick?
>>>>>> The
>>>>>> above will not work for factors, so I will need to create the
>>>>>> factor
>>>>>> crossings using formula a*b*c and then cross with the numerics,
>>>>>> which
>>>>>> is even more clumsy.
>>>>>> Thanks everybody
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more,
>>>>>> see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> https://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible
>>>>>> code.
>>>>
>>>> ______________________________________________
>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> https://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>> Hello,
>>>
>>> Are you looking for ?poly to generate all 1st and 2nd order
>>> combinations?
>>> In the outputs below, dimnames[[2]] tell which term corresponds to
>>> which
>>> column.
>>> So 1.0.0.0 is 'a' only and 2.0.0.0 is 'a^2'.
>>>
>>>
>>>
>>> cols <- letters[1:4]
>>> a1 <- outer(cols,cols,paste0)
>>> b1 <- a1[!lower.tri(a1)]
>>>
>>> X <- matrix(rnorm(80),ncol=4)
>>> colnames(X) <- cols
>>> X <- as.data.frame(X)
>>> XX <- matrix(0,nrow=nrow(X),ncol=length(b1))
>>> colnames(XX) <- b1
>>>
>>> for (k in 1:length(b1)){
>>> XX[,k] <- X[,substr(b1[k],1,1)]*X[,substr(b1[k],2,2)]
>>> }
>>> # XX
>>> # model.matrix( ~ (a + b + c + d)^2 , data=X)
>>> # model.matrix( ~ (a + b + c + d)^2 - (a + b + c +d), data=X)
>>>
>>> # Is this what you want?
>>> # Ugly colnames
>>> m <- model.matrix( ~ poly(a, b, c, d, degree = 2L) , data = X)
>>> dimnames(m)
>>> #> [[1]]
>>> #> [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12"
>>> "13"
>>> "14" "15"
>>> #> [16] "16" "17" "18" "19" "20"
>>> #>
>>> #> [[2]]
>>> #> [1] "(Intercept)" "poly(a, b, c, d,
>>> degree =
>>> 2)1.0.0.0"
>>> #> [3] "poly(a, b, c, d, degree = 2)2.0.0.0" "poly(a, b, c, d,
>>> degree =
>>> 2)0.1.0.0"
>>> #> [5] "poly(a, b, c, d, degree = 2)1.1.0.0" "poly(a, b, c, d,
>>> degree =
>>> 2)0.2.0.0"
>>> #> [7] "poly(a, b, c, d, degree = 2)0.0.1.0" "poly(a, b, c, d,
>>> degree =
>>> 2)1.0.1.0"
>>> #> [9] "poly(a, b, c, d, degree = 2)0.1.1.0" "poly(a, b, c, d,
>>> degree =
>>> 2)0.0.2.0"
>>> #> [11] "poly(a, b, c, d, degree = 2)0.0.0.1" "poly(a, b, c, d,
>>> degree =
>>> 2)1.0.0.1"
>>> #> [13] "poly(a, b, c, d, degree = 2)0.1.0.1" "poly(a, b, c, d,
>>> degree =
>>> 2)0.0.1.1"
>>> #> [15] "poly(a, b, c, d, degree = 2)0.0.0.2"
>>>
>>> # These colnames are nicer
>>> p <- with(X, poly(a, b, c, d, degree = 2L))
>>> attributes(p)
>>> #> $dim
>>> #> [1] 20 14
>>> #>
>>> #> $dimnames
>>> #> $dimnames[[1]]
>>> #> NULL
>>> #>
>>> #> $dimnames[[2]]
>>> #> 2 3 4 5 7 10
>>> 11
>>> 13
>>> #> "1.0.0.0" "2.0.0.0" "0.1.0.0" "1.1.0.0" "0.2.0.0" "0.0.1.0"
>>> "1.0.1.0"
>>> "0.1.1.0"
>>> #> 19 28 29 31 37 55
>>> #> "0.0.2.0" "0.0.0.1" "1.0.0.1" "0.1.0.1" "0.0.1.1" "0.0.0.2"
>>> #>
>>> #>
>>> #> $degree
>>> #> [1] 1 2 1 2 2 1 2 2 2 1 2 2 2 2
>>> #>
>>> #> $coefs
>>> #> $coefs[[1]]
>>> #> $coefs[[1]]$alpha
>>> #> [1] 0.1134201 0.4901752
>>> #>
>>> #> $coefs[[1]]$norm2
>>> #> [1] 1.00000 20.00000 34.11147 140.62467
>>> #>
>>> #>
>>> #> $coefs[[2]]
>>> #> $coefs[[2]]$alpha
>>> #> [1] 0.1356218 1.0041956
>>> #>
>>> #> $coefs[[2]]$norm2
>>> #> [1] 1.00000 20.00000 21.53021 39.73742
>>> #>
>>> #>
>>> #> $coefs[[3]]
>>> #> $coefs[[3]]$alpha
>>> #> [1] 0.2533081 0.1689801
>>> #>
>>> #> $coefs[[3]]$norm2
>>> #> [1] 1.00000 20.00000 19.08499 19.83333
>>> #>
>>> #>
>>> #> $coefs[[4]]
>>> #> $coefs[[4]]$alpha
>>> #> [1] -0.03597259 -0.87409789
>>> #>
>>> #> $coefs[[4]]$norm2
>>> #> [1] 1.00000 20.00000 24.68273 73.41893
>>> #>
>>> #>
>>> #>
>>> #> $class
>>> #> [1] "poly" "matrix"
>>>
>>> dimnames(p)
>>> #> [[1]]
>>> #> NULL
>>> #>
>>> #> [[2]]
>>> #> 2 3 4 5 7 10
>>> 11
>>> 13
>>> #> "1.0.0.0" "2.0.0.0" "0.1.0.0" "1.1.0.0" "0.2.0.0" "0.0.1.0"
>>> "1.0.1.0"
>>> "0.1.1.0"
>>> #> 19 28 29 31 37 55
>>> #> "0.0.2.0" "0.0.0.1" "1.0.0.1" "0.1.0.1" "0.0.1.1" "0.0.0.2"
>>>
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>>
>>
>
>
--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença de vírus.
www.avg.com
More information about the R-help
mailing list