[R] how to create model matrix of second order terms
Stephen Bond
@tephen@cbond @end|ng |rom y@hoo@com
Tue Mar 25 14:42:30 CET 2025
@ Rui
that is the idea, but how do I apply this to a matrix with 200 columns?
I cannot write out the expression.
The colnames seem very messy, but they would be messy even under my
scheme with as little as 100 vars.
On Tue, 2025-03-25 at 06:15 +0000, Rui Barradas wrote:
> Às 15:28 de 24/03/2025, Stephen Bond via R-help escreveu:
> > Folks,
> >
> > I appreciate your effort, but maybe I was not explicit enough, so
> > let
> > me try again.
> >
> > The current setup for formulas does not allow for I(x^2) terms as
> > explained in the MASS book at the end of Section 6.2 the x:x
> > interaction is treated as x.
> >
> > So I need to write my own code, which is clumsy unless you can
> > refer me
> > to a package that already exists or give me an idea how to improve
> > my
> > code. Also, writing out terms is not feasible when there are 1000
> > variables, so the code needs to deal with taking a wide data frame
> > or
> > matrix with column names for convenience.
> >
> > Let me know your ideas :-)
> >
> > On Mon, 2025-03-24 at 02:43 -0700, Bert Gunter wrote:
> > > Full disclosure: I did not attempt to decipher your code.
> > >
> > > But ~(A+B +C)^2 - (A + B + C)
> > > gives all 2nd order interactions whether the terms are factors or
> > > numeric.
> > >
> > > ~I(A^2) + I(B^2) gives quadratics in A and B, which must be
> > > numeric,
> > > not factors, of course
> > >
> > > You can combine these as necessary to get a formula expression
> > > for
> > > just 2nd order terms. Wrapping this in model.matrix() should then
> > > give you the model matrix using "treatment" contrasts for the
> > > contrasts involving factors (you can change the contrast types
> > > using
> > > the 'contrasts.arg' argument of model.matrix())
> > >
> > > 1. Does this help?
> > > 2. Do check this to make sure I'm correct
> > >
> > > Cheers,
> > > Bert
> > >
> > > "An educated person is one who can entertain new ideas, entertain
> > > others, and entertain herself."
> > >
> > >
> > >
> > > On Mon, Mar 24, 2025 at 12:22 AM Stephen Bond via R-help
> > > <r-help using r-project.org> wrote:
> > > > I am sending to this forum as stackoverflow has devolved into
> > > > sth
> > > > pretty bad.
> > > > Below code shows how to get what I want in a clumsy way.
> > > >
> > > > cols <- letters[1:4]
> > > > a1 <- outer(cols,cols,paste0)
> > > > b1 <- a1[!lower.tri(a1)]
> > > >
> > > > X <- matrix(rnorm(80),ncol=4)
> > > > colnames(X) <- cols
> > > > X <- as.data.frame(X)
> > > > XX <- matrix(0,nrow=nrow(X),ncol=length(b1))
> > > > colnames(XX) <- b1
> > > >
> > > > for (k in 1:length(b1)){
> > > > XX[,k] <- X[,substr(b1[k],1,1)]*X[,substr(b1[k],2,2)]
> > > > }
> > > >
> > > >
> > > >
> > > > Is there a way to get that using a formula or some neat trick?
> > > > The
> > > > above will not work for factors, so I will need to create the
> > > > factor
> > > > crossings using formula a*b*c and then cross with the numerics,
> > > > which
> > > > is even more clumsy.
> > > > Thanks everybody
> > > >
> > > > ______________________________________________
> > > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more,
> > > > see
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > > https://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible
> > > > code.
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > https://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> Hello,
>
> Are you looking for ?poly to generate all 1st and 2nd order
> combinations?
> In the outputs below, dimnames[[2]] tell which term corresponds to
> which
> column.
> So 1.0.0.0 is 'a' only and 2.0.0.0 is 'a^2'.
>
>
>
> cols <- letters[1:4]
> a1 <- outer(cols,cols,paste0)
> b1 <- a1[!lower.tri(a1)]
>
> X <- matrix(rnorm(80),ncol=4)
> colnames(X) <- cols
> X <- as.data.frame(X)
> XX <- matrix(0,nrow=nrow(X),ncol=length(b1))
> colnames(XX) <- b1
>
> for (k in 1:length(b1)){
> XX[,k] <- X[,substr(b1[k],1,1)]*X[,substr(b1[k],2,2)]
> }
> # XX
> # model.matrix( ~ (a + b + c + d)^2 , data=X)
> # model.matrix( ~ (a + b + c + d)^2 - (a + b + c +d), data=X)
>
> # Is this what you want?
> # Ugly colnames
> m <- model.matrix( ~ poly(a, b, c, d, degree = 2L) , data = X)
> dimnames(m)
> #> [[1]]
> #> [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12"
> "13"
> "14" "15"
> #> [16] "16" "17" "18" "19" "20"
> #>
> #> [[2]]
> #> [1] "(Intercept)" "poly(a, b, c, d,
> degree =
> 2)1.0.0.0"
> #> [3] "poly(a, b, c, d, degree = 2)2.0.0.0" "poly(a, b, c, d,
> degree =
> 2)0.1.0.0"
> #> [5] "poly(a, b, c, d, degree = 2)1.1.0.0" "poly(a, b, c, d,
> degree =
> 2)0.2.0.0"
> #> [7] "poly(a, b, c, d, degree = 2)0.0.1.0" "poly(a, b, c, d,
> degree =
> 2)1.0.1.0"
> #> [9] "poly(a, b, c, d, degree = 2)0.1.1.0" "poly(a, b, c, d,
> degree =
> 2)0.0.2.0"
> #> [11] "poly(a, b, c, d, degree = 2)0.0.0.1" "poly(a, b, c, d,
> degree =
> 2)1.0.0.1"
> #> [13] "poly(a, b, c, d, degree = 2)0.1.0.1" "poly(a, b, c, d,
> degree =
> 2)0.0.1.1"
> #> [15] "poly(a, b, c, d, degree = 2)0.0.0.2"
>
> # These colnames are nicer
> p <- with(X, poly(a, b, c, d, degree = 2L))
> attributes(p)
> #> $dim
> #> [1] 20 14
> #>
> #> $dimnames
> #> $dimnames[[1]]
> #> NULL
> #>
> #> $dimnames[[2]]
> #> 2 3 4 5 7 10
> 11
> 13
> #> "1.0.0.0" "2.0.0.0" "0.1.0.0" "1.1.0.0" "0.2.0.0" "0.0.1.0"
> "1.0.1.0"
> "0.1.1.0"
> #> 19 28 29 31 37 55
> #> "0.0.2.0" "0.0.0.1" "1.0.0.1" "0.1.0.1" "0.0.1.1" "0.0.0.2"
> #>
> #>
> #> $degree
> #> [1] 1 2 1 2 2 1 2 2 2 1 2 2 2 2
> #>
> #> $coefs
> #> $coefs[[1]]
> #> $coefs[[1]]$alpha
> #> [1] 0.1134201 0.4901752
> #>
> #> $coefs[[1]]$norm2
> #> [1] 1.00000 20.00000 34.11147 140.62467
> #>
> #>
> #> $coefs[[2]]
> #> $coefs[[2]]$alpha
> #> [1] 0.1356218 1.0041956
> #>
> #> $coefs[[2]]$norm2
> #> [1] 1.00000 20.00000 21.53021 39.73742
> #>
> #>
> #> $coefs[[3]]
> #> $coefs[[3]]$alpha
> #> [1] 0.2533081 0.1689801
> #>
> #> $coefs[[3]]$norm2
> #> [1] 1.00000 20.00000 19.08499 19.83333
> #>
> #>
> #> $coefs[[4]]
> #> $coefs[[4]]$alpha
> #> [1] -0.03597259 -0.87409789
> #>
> #> $coefs[[4]]$norm2
> #> [1] 1.00000 20.00000 24.68273 73.41893
> #>
> #>
> #>
> #> $class
> #> [1] "poly" "matrix"
>
> dimnames(p)
> #> [[1]]
> #> NULL
> #>
> #> [[2]]
> #> 2 3 4 5 7 10
> 11
> 13
> #> "1.0.0.0" "2.0.0.0" "0.1.0.0" "1.1.0.0" "0.2.0.0" "0.0.1.0"
> "1.0.1.0"
> "0.1.1.0"
> #> 19 28 29 31 37 55
> #> "0.0.2.0" "0.0.0.1" "1.0.0.1" "0.1.0.1" "0.0.1.1" "0.0.0.2"
>
>
> Hope this helps,
>
> Rui Barradas
>
>
More information about the R-help
mailing list