[R] r code to generate interaction columns

Martin Maechler maechler at stat.math.ethz.ch
Wed Mar 10 12:18:05 CET 2010


>>>>> "k" == kMan  <kchamberln at gmail.com>
>>>>>     on Tue, 9 Mar 2010 19:52:40 -0700 writes:

    k> Dear Dhruv, Your clarification helps, and I'm
    k> stumped. Sorry I cannot be of more help.

    k> Sincerely, KeithC.

I'd say *The* answer is to use  model.matrix()

It lets you use R's powerful model formula language
to produce the 'model matrix' (aka 'design matrix') X for you.
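
As a minimal sketch of how this could answer the original question: the toy
data frame `dat` and its values below are made up for illustration, and the
names `num`, `f`, `int.cols` and `dat2` are hypothetical. The code finds the
numeric columns, builds a formula with all products up to third order, and
appends only the product columns to the data.

```r
# Toy data (invented values): A and B are factors, C, D, E are numeric
dat <- data.frame(
  A = factor(c("no", "yes", "no", "yes", "no", "yes")),
  B = factor(c("u", "v", "w", "u", "v", "w")),
  C = c(1, 2, 3, 4, 5, 6),
  D = c(0.5, 0, 1.5, 2, 0, 3),
  E = c(10, 20, 30, 40, 50, 60)
)

# Detect the numeric columns generically
num <- names(dat)[sapply(dat, is.numeric)]

# Build  ~ (C + D + E)^3 - 1 : main effects plus all two-way and
# three-way products, without an intercept column
f <- as.formula(paste("~ (", paste(num, collapse = " + "), ")^3 - 1"))
X <- model.matrix(f, data = dat)

# Keep only the product columns (their names contain ":") and
# append them to the original data frame
int.cols <- X[, grepl(":", colnames(X), fixed = TRUE), drop = FALSE]
dat2 <- cbind(dat, int.cols)
```

For numeric variables, a column such as `C:D` is simply the elementwise
product `C * D`, so `dat2` can be passed on to tree-based methods as is.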

[ The Matrix package even contains a  sparse.model.matrix()
  function, which can be useful for really large problems.
  E.g., the glmnet package, which uses Lasso-like methods {{instead of
	random forests; Trevor Hastie has quite a host of examples
	where glmnet methods perform better than random forests}},
  can make use of sparse matrices from the Matrix package like that.
]
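
A small sketch of the sparse variant, assuming the Matrix package is
installed (it is a recommended package and ships with most R installations).
The data frame `dat` and the formula are made up for illustration.

```r
library(Matrix)

# Toy data (invented values): one factor and two numeric columns
dat <- data.frame(
  B = factor(c("u", "v", "w", "u", "v", "w")),
  C = c(1, 2, 3, 4, 5, 6),
  D = c(0.5, 0, 1.5, 2, 0, 3)
)

# Same formula interface as model.matrix(), but the result is stored
# as a sparse matrix, which saves memory when most entries are zero
# (typical for factors with many levels)
Xs <- sparse.model.matrix(~ B * C + C:D, data = dat)

class(Xs)
dim(Xs)
```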

Read  help(model.matrix)
and also look at the examples there,
which you can run in R with

   example(model.matrix)
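
The ratio-type columns from the original post, such as (C+E)/(D+.01), are
not interactions in the formula sense, but the formula language can still
compute them when the arithmetic is wrapped in I(). The following is only a
sketch: the data values and the column names `CE.by.D`, `DE.by.C` and
`CD.by.E` are made up for illustration.

```r
# Toy data (invented values) with three numeric columns
dat <- data.frame(
  C = c(1, 2, 3, 4),
  D = c(0.5, 0, 1.5, 2),
  E = c(10, 20, 30, 40)
)

# I() makes "+" and "/" act as ordinary arithmetic inside the formula;
# "- 1" drops the intercept column
X <- model.matrix(
  ~ I((C + E) / (D + 0.01)) +
    I((D + E) / (C + 0.01)) +
    I((C + D) / (E + 0.01)) - 1,
  data = dat
)

# Replace the long default column names with short ones (same order
# as the terms in the formula) and append the columns to the data
colnames(X) <- c("CE.by.D", "DE.by.C", "CD.by.E")
dat2 <- cbind(dat, X)
```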

Regards,
Martin Maechler, ETH Zurich


    k> -----Original Message-----
    k> From: Sharma, Dhruv [mailto:Dhruv.Sharma at PenFed.org] 
    k> Sent: Monday, March 08, 2010 7:51 AM
    k> To: kMan; r-help at r-project.org
    k> Subject: RE: [R] r code to generate interaction columns


    k> thanks Keith.  I wanted some generic code to check column data types
    k> and loop through and create the interaction columns automatically, as I want
    k> to test this out as a new algorithm for data mining.

    k> Traditional regression may give misleading results with multi-collinearity
    k> and thus I wanted to take interaction terms and run them through random
    k> forests and rpart as they would need interaction terms to be manually
    k> created.

    k> Hope that clarifies.

    k> Dhruv

    k> -----Original Message-----
    k> From: kMan [mailto:kchamberln at gmail.com]
    k> Sent: Sunday, March 07, 2010 8:08 PM
    k> To: Sharma, Dhruv; r-help at r-project.org
    k> Subject: RE: [R] r code to generate interaction columns

    k> Dear Dhruv,

    k> You could create interaction variables manually (assuming A is your
    k> dependent variable). Just multiply the variables together.
    k> cd.int<-C*D
    k> ce.int<-C*E
    k> cde.int<-C*D*E # what about D*E, or interactions with B?
    k> Include those in your model, such as
    k> A~B+C+D+E+cd.int+ce.int+cde.int
    k> Then you can compare those models to the results you get when you specify
    k> the interaction in the model formula directly using the documented syntax.
    k> In your R-console, type ?formula, or help("formula") for details. 

    k> Sincerely,
    k> KeithC.


    k> -----Original Message-----
    k> From: Sharma, Dhruv [mailto:Dhruv.Sharma at PenFed.org]
    k> Sent: Saturday, March 06, 2010 10:30 AM
    k> To: r-help at r-project.org
    k> Subject: [R] r code to generate interaction columns

    k> Hi,
    k> is there a way to take a dataset and extract numeric columns and create
    k> interaction columns from it automatically?

    k> For e.g.  there are 5 columns of data: A,B,C,D,E.

    k> CDE are numeric.

    k> Can someone provide code to automatically create more columns such
    k> as:

    k> 1) C*D, C*E, C*D*E, (C+E)/(D+.01), (D+E)/(C+.01), (C+D)/(E+.01)
    k> (adding .01 to each denominator to avoid divide by zero)

    k> ?

    k> I know in glm multiplying can create terms but i want the columns to be part
    k> of the data set so that i can feed this into Random forest to pick out
    k> predictive interaction terms as regression cannot reliably handle correlated
    k> interaction terms.

    k> if anyone has some simple code that can do this that would be helpful.

    k> thanks
    k> Dhruv


