[R] how to create model matrix of second order terms
Rui Barradas
ru|pb@rr@d@@ @end|ng |rom @@po@pt
Tue Mar 25 07:15:38 CET 2025
Às 15:28 de 24/03/2025, Stephen Bond via R-help escreveu:
> Folks,
>
> I appreciate your effort, but maybe I was not explicit enough, so let
> me try again.
>
> The current setup for formulas does not allow for I(x^2) terms as
> explained in the MASS book at the end of Section 6.2 the x:x
> interaction is treated as x.
>
> So I need to write my own code, which is clumsy unless you can refer me
> to a package that already exists or give me an idea how to improve my
> code. Also, writing out terms is not feasible when there are 1000
> variables, so the code needs to deal with taking a wide data frame or
> matrix with column names for convenience.
>
> Let me know your ideas :-)
>
> On Mon, 2025-03-24 at 02:43 -0700, Bert Gunter wrote:
>> Full disclosure: I did not attempt to decipher your code.
>>
>> But ~(A+B +C)^2 - (A + B + C)
>> gives all 2nd order interactions whether the terms are factors or
>> numeric.
>>
>> ~I(A^2) + I(B^2) gives quadratics in A and B, which must be numeric,
>> not factors, of course
>>
>> You can combine these as necessary to get a formula expression for
>> just 2nd order terms. Wrapping this in model.matrix() should then
>> give you the model matrix using "treatment" contrasts for the
>> contrasts involving factors (you can change the contrast types using
>> the 'contrasts.arg' argument of model.matrix())
>>
>> 1. Does this help?
>> 2. Do check this to make sure I'm correct
>>
>> Cheers,
>> Bert
>>
>> "An educated person is one who can entertain new ideas, entertain
>> others, and entertain herself."
>>
>>
>>
>> On Mon, Mar 24, 2025 at 12:22 AM Stephen Bond via R-help
>> <r-help using r-project.org> wrote:
>>> I am sending to this forum as stackoverflow has devolved into sth
>>> pretty bad.
>>> Below code shows how to get what I want in a clumsy way.
>>>
>>> cols <- letters[1:4]
>>> a1 <- outer(cols,cols,paste0)
>>> b1 <- a1[!lower.tri(a1)]
>>>
>>> X <- matrix(rnorm(80),ncol=4)
>>> colnames(X) <- cols
>>> X <- as.data.frame(X)
>>> XX <- matrix(0,nrow=nrow(X),ncol=length(b1))
>>> colnames(XX) <- b1
>>>
>>> for (k in 1:length(b1)){
>>> XX[,k] <- X[,substr(b1[k],1,1)]*X[,substr(b1[k],2,2)]
>>> }
>>>
>>>
>>>
>>> Is there a way to get that using a formula or some neat trick? The
>>> above will not work for factors, so I will need to create the
>>> factor
>>> crossings using formula a*b*c and then cross with the numerics,
>>> which
>>> is even more clumsy.
>>> Thanks everybody
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> https://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Hello,
Are you looking for ?poly to generate all 1st and 2nd order combinations?
In the outputs below, dimnames[[2]] tell which term corresponds to which
column.
So 1.0.0.0 is 'a' only and 2.0.0.0 is 'a^2'.
cols <- letters[1:4]
a1 <- outer(cols,cols,paste0)
b1 <- a1[!lower.tri(a1)]
X <- matrix(rnorm(80),ncol=4)
colnames(X) <- cols
X <- as.data.frame(X)
XX <- matrix(0,nrow=nrow(X),ncol=length(b1))
colnames(XX) <- b1
for (k in 1:length(b1)){
XX[,k] <- X[,substr(b1[k],1,1)]*X[,substr(b1[k],2,2)]
}
# XX
# model.matrix( ~ (a + b + c + d)^2 , data=X)
# model.matrix( ~ (a + b + c + d)^2 - (a + b + c +d), data=X)
# Is this what you want?
# Ugly colnames
m <- model.matrix( ~ poly(a, b, c, d, degree = 2L) , data = X)
dimnames(m)
#> [[1]]
#> [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13"
"14" "15"
#> [16] "16" "17" "18" "19" "20"
#>
#> [[2]]
#> [1] "(Intercept)" "poly(a, b, c, d, degree =
2)1.0.0.0"
#> [3] "poly(a, b, c, d, degree = 2)2.0.0.0" "poly(a, b, c, d, degree =
2)0.1.0.0"
#> [5] "poly(a, b, c, d, degree = 2)1.1.0.0" "poly(a, b, c, d, degree =
2)0.2.0.0"
#> [7] "poly(a, b, c, d, degree = 2)0.0.1.0" "poly(a, b, c, d, degree =
2)1.0.1.0"
#> [9] "poly(a, b, c, d, degree = 2)0.1.1.0" "poly(a, b, c, d, degree =
2)0.0.2.0"
#> [11] "poly(a, b, c, d, degree = 2)0.0.0.1" "poly(a, b, c, d, degree =
2)1.0.0.1"
#> [13] "poly(a, b, c, d, degree = 2)0.1.0.1" "poly(a, b, c, d, degree =
2)0.0.1.1"
#> [15] "poly(a, b, c, d, degree = 2)0.0.0.2"
# These colnames are nicer
p <- with(X, poly(a, b, c, d, degree = 2L))
attributes(p)
#> $dim
#> [1] 20 14
#>
#> $dimnames
#> $dimnames[[1]]
#> NULL
#>
#> $dimnames[[2]]
#> 2 3 4 5 7 10 11
13
#> "1.0.0.0" "2.0.0.0" "0.1.0.0" "1.1.0.0" "0.2.0.0" "0.0.1.0" "1.0.1.0"
"0.1.1.0"
#> 19 28 29 31 37 55
#> "0.0.2.0" "0.0.0.1" "1.0.0.1" "0.1.0.1" "0.0.1.1" "0.0.0.2"
#>
#>
#> $degree
#> [1] 1 2 1 2 2 1 2 2 2 1 2 2 2 2
#>
#> $coefs
#> $coefs[[1]]
#> $coefs[[1]]$alpha
#> [1] 0.1134201 0.4901752
#>
#> $coefs[[1]]$norm2
#> [1] 1.00000 20.00000 34.11147 140.62467
#>
#>
#> $coefs[[2]]
#> $coefs[[2]]$alpha
#> [1] 0.1356218 1.0041956
#>
#> $coefs[[2]]$norm2
#> [1] 1.00000 20.00000 21.53021 39.73742
#>
#>
#> $coefs[[3]]
#> $coefs[[3]]$alpha
#> [1] 0.2533081 0.1689801
#>
#> $coefs[[3]]$norm2
#> [1] 1.00000 20.00000 19.08499 19.83333
#>
#>
#> $coefs[[4]]
#> $coefs[[4]]$alpha
#> [1] -0.03597259 -0.87409789
#>
#> $coefs[[4]]$norm2
#> [1] 1.00000 20.00000 24.68273 73.41893
#>
#>
#>
#> $class
#> [1] "poly" "matrix"
dimnames(p)
#> [[1]]
#> NULL
#>
#> [[2]]
#> 2 3 4 5 7 10 11
13
#> "1.0.0.0" "2.0.0.0" "0.1.0.0" "1.1.0.0" "0.2.0.0" "0.0.1.0" "1.0.1.0"
"0.1.1.0"
#> 19 28 29 31 37 55
#> "0.0.2.0" "0.0.0.1" "1.0.0.1" "0.1.0.1" "0.0.1.1" "0.0.0.2"
Hope this helps,
Rui Barradas
--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença de vírus.
www.avg.com
More information about the R-help
mailing list