[R] r code to generate interaction columns

Sharma, Dhruv Dhruv.Sharma at PenFed.org
Wed Mar 10 15:37:23 CET 2010


Thanks, Martin, for the lead. A new avenue to explore.

Dhruv 

-----Original Message-----
From: Martin Maechler [mailto:maechler at stat.math.ethz.ch] 
Sent: Wednesday, March 10, 2010 6:18 AM
To: kMan
Cc: Sharma, Dhruv; r-help at r-project.org
Subject: Re: [R] r code to generate interaction columns

>>>>> "k" == kMan  <kchamberln at gmail.com>
>>>>>     on Tue, 9 Mar 2010 19:52:40 -0700 writes:

    k> Dear Dhruv, Your clarification helps, and I'm
    k> stumped. Sorry I cannot be of more help.

    k> Sincerely, KeithC.

I'd say *The* answer is to use  model.matrix()

It lets you use R's powerful model formula language to produce the
'model matrix' (aka 'design matrix') X for you.
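A minimal sketch of that suggestion. It uses made-up data with the five columns named in the original question (A as the response, B a factor, C, D and E numeric); the data frame `dat` is hypothetical. The `^3` in the formula asks for every term up to third order.

```r
# Hypothetical example data with the columns from the question
set.seed(1)
dat <- data.frame(
  A = rnorm(10),
  B = factor(rep(c("x", "y"), 5)),
  C = runif(10),
  D = runif(10),
  E = runif(10)
)

# (C + D + E)^3 expands to the main effects plus all two- and
# three-way interactions: C:D, C:E, D:E and C:D:E
X <- model.matrix(~ (C + D + E)^3, data = dat)
colnames(X)
# "(Intercept)" "C" "D" "E" "C:D" "C:E" "D:E" "C:D:E"
```

Adding `- 1` to the formula drops the intercept column if it is not wanted.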

[ The Matrix package even contains a sparse.model.matrix()
  function, which can be useful for really largish problems.
  E.g., the glmnet package, which uses Lasso-like methods, can make
  use of such "Matrix" sparse matrices.
  {{glmnet is an alternative to randomForest; Trevor Hastie has
  quite a host of examples where glmnet methods perform better
  than randomForest.}}
]
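A sketch of the sparse variant, assuming the Matrix package (a recommended package normally installed with R) is available; the data are invented. glmnet's `x` argument accepts sparse matrices of this kind, but that call is not shown here.

```r
library(Matrix)  # provides sparse.model.matrix()

set.seed(1)
dat <- data.frame(
  B = factor(rep(letters[1:5], 20)),  # a factor with 5 levels
  C = rnorm(100),
  D = rnorm(100)
)

# Same formula language as model.matrix(), but stored as a sparse
# "dgCMatrix": the B dummies and B:C columns are mostly zeros
Xs <- sparse.model.matrix(~ B * C + D, data = dat)
class(Xs)
dim(Xs)  # 100 rows; intercept + 4 B dummies + C + D + 4 B:C columns
```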

Read  help(model.matrix)
and also look at the examples there,
which you can run in R by

   example(model.matrix)

Regards,
Martin Maechler, ETH Zurich


    k> -----Original Message-----
    k> From: Sharma, Dhruv [mailto:Dhruv.Sharma at PenFed.org] 
    k> Sent: Monday, March 08, 2010 7:51 AM
    k> To: kMan; r-help at r-project.org
    k> Subject: RE: [R] r code to generate interaction columns


    k> Thanks, Keith. I wanted some generic code that checks each column's
    k> data type and loops through the columns to create the interaction
    k> columns automatically, as I want to test this out as a new algorithm
    k> for data mining.
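The generic approach described here can be sketched as a small helper. `add_products()` is hypothetical (not from any package): it picks the numeric columns with `is.numeric`, skips any listed in `exclude` (e.g. the response), and loops over all pairs (and optionally triples) of them with `combn()`. The `_x_` naming scheme is an arbitrary choice.

```r
# Hypothetical helper: appends one product column for every
# combination of 2..max_order numeric columns
add_products <- function(df, max_order = 2, exclude = character(0)) {
  # Numeric columns only, minus the ones to skip
  num_cols <- setdiff(names(df)[vapply(df, is.numeric, logical(1))], exclude)
  for (k in seq_len(max_order)[-1]) {  # interaction orders 2, 3, ...
    if (length(num_cols) < k) break
    for (vars in combn(num_cols, k, simplify = FALSE)) {
      # Element-wise product of the selected columns
      df[[paste(vars, collapse = "_x_")]] <- Reduce(`*`, df[vars])
    }
  }
  df
}

dat <- data.frame(A = c(10, 20, 30), B = c("u", "v", "w"),
                  C = c(1, 2, 3), D = c(4, 5, 6), E = c(7, 8, 9))
out <- add_products(dat, max_order = 3, exclude = "A")
names(out)
# "A" "B" "C" "D" "E" "C_x_D" "C_x_E" "D_x_E" "C_x_D_x_E"
```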

    k> Traditional regression may give misleading results under
    k> multicollinearity, so I wanted to create the interaction terms and run
    k> them through random forests and rpart, which need the interaction terms
    k> to be created manually.

    k> Hope that clarifies.

    k> Dhruv

    k> -----Original Message-----
    k> From: kMan [mailto:kchamberln at gmail.com]
    k> Sent: Sunday, March 07, 2010 8:08 PM
    k> To: Sharma, Dhruv; r-help at r-project.org
    k> Subject: RE: [R] r code to generate interaction columns

    k> Dear Dhruv,

    k> You could create interaction variables manually (assuming A is your
    k> dependent variable). Just multiply the variables together:
    k> cd.int <- C * D
    k> ce.int <- C * E
    k> cde.int <- C * D * E  # what about D*E, or interactions with B?
    k> Include those in your model, such as
    k> A ~ B + C + D + E + cd.int + ce.int + cde.int
    k> Then you can compare those models to the results you get when you
    k> specify the interaction in the model formula directly, using the
    k> documented syntax. In your R console, type ?formula or
    k> help("formula") for details.
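To make the suggested comparison concrete, here is a sketch on simulated data (the seed and true coefficients are arbitrary). It also adds the D*E product that the comment asks about, so that the hand-built columns and the formula term `C * D * E` describe exactly the same model.

```r
set.seed(42)
n <- 50
dat <- data.frame(B = rnorm(n), C = rnorm(n), D = rnorm(n), E = rnorm(n))
dat$A <- with(dat, 1 + B + C + D + E + 0.5 * C * D) + rnorm(n)

# Interaction variables built by hand
dat$cd.int  <- dat$C * dat$D
dat$ce.int  <- dat$C * dat$E
dat$de.int  <- dat$D * dat$E
dat$cde.int <- dat$C * dat$D * dat$E
fit_manual <- lm(A ~ B + C + D + E + cd.int + ce.int + de.int + cde.int,
                 data = dat)

# Same model via formula syntax: C * D * E expands to
# C + D + E + C:D + C:E + D:E + C:D:E
fit_formula <- lm(A ~ B + C * D * E, data = dat)

all.equal(unname(fitted(fit_manual)), unname(fitted(fit_formula)))
coef(fit_manual)["cd.int"]
coef(fit_formula)["C:D"]
```

Both fits span the same columns, so the fitted values and the matching coefficients agree; only the coefficient names differ.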


    k> Sincerely,
    k> KeithC.


    k> -----Original Message-----
    k> From: Sharma, Dhruv [mailto:Dhruv.Sharma at PenFed.org]
    k> Sent: Saturday, March 06, 2010 10:30 AM
    k> To: r-help at r-project.org
    k> Subject: [R] r code to generate interaction columns

    k> Hi,
    k> is there a way to take a dataset, extract the numeric columns, and
    k> create interaction columns from them automatically?

    k> For example, there are 5 columns of data: A, B, C, D, E.

    k> C, D and E are numeric.

    k> Can someone provide code to automatically create more columns such as
    k> the following?

    k> 1) C*D, C*E, C*D*E, (C+E)/(D+.01), (D+E)/(C+.01), (C+D)/(E+.01)
    k>    (the +.01 is there to avoid dividing by zero)
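The specific columns listed above can be added in one `transform()` call. This is a sketch with invented values; column names such as `CE_by_D` are hypothetical, and `eps` is the 0.01 offset from the question.

```r
dat <- data.frame(
  A = c(10, 20, 30),
  B = c("u", "v", "w"),
  C = c(1, 2, 0),
  D = c(0, 4, 5),
  E = c(3, 0, 6)
)

eps <- 0.01  # offset from the question, keeps denominators off zero
dat <- transform(dat,
  C_D     = C * D,
  C_E     = C * E,
  C_D_E   = C * D * E,
  CE_by_D = (C + E) / (D + eps),
  DE_by_C = (D + E) / (C + eps),
  CD_by_E = (C + D) / (E + eps)
)
```

The offset only protects against a denominator of exactly zero; a value such as D = -0.01 would still produce a division by zero.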

    k> I know that in glm, multiplying terms in the formula creates
    k> interaction terms, but I want the columns to be part of the data set
    k> so that I can feed them into randomForest to pick out predictive
    k> interaction terms, as regression cannot reliably handle correlated
    k> interaction terms.
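If the aim is to hand the interaction columns to a tree-based learner, the output of `model.matrix()` can be bound onto the original data frame. A sketch with invented data; the `make.names()` step is there because names like `C:D` are not syntactic R names and are awkward in a formula such as `A ~ .`.

```r
set.seed(7)
dat <- data.frame(
  A = rnorm(20),
  B = factor(rep(c("x", "y"), 10)),
  C = runif(20), D = runif(20), E = runif(20)
)

# Keep only the interaction columns; the intercept and the main
# effects C, D, E are dropped (the latter are already in dat)
X <- model.matrix(~ (C + D + E)^3, data = dat)
X <- X[, grepl(":", colnames(X)), drop = FALSE]

# "C:D" becomes "C.D", and so on
colnames(X) <- make.names(colnames(X))
dat2 <- cbind(dat, X)
names(dat2)
```

The resulting `dat2` could then be passed to, for example, `rpart(A ~ ., data = dat2)` from the rpart package or `randomForest(A ~ ., data = dat2)` from the randomForest package (neither call is run here).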

    k> If anyone has some simple code that can do this, that would be helpful.

    k> thanks
    k> Dhruv


