[R] r code to generate interaction columns
Sharma, Dhruv
Dhruv.Sharma at PenFed.org
Wed Mar 10 15:37:23 CET 2010
Thanks Martin for the lead. A new avenue to explore.
Dhruv
-----Original Message-----
From: Martin Maechler [mailto:maechler at stat.math.ethz.ch]
Sent: Wednesday, March 10, 2010 6:18 AM
To: kMan
Cc: Sharma, Dhruv; r-help at r-project.org
Subject: Re: [R] r code to generate interaction columns
>>>>> "k" == kMan <kchamberln at gmail.com>
>>>>> on Tue, 9 Mar 2010 19:52:40 -0700 writes:
k> Dear Dhruv, Your clarification helps, and I'm
k> stumped. Sorry I cannot be of more help.
k> Sincerely, KeithC.
I'd say *The* answer is to use model.matrix()
This allows you to use R's powerful model formula language and produces the
'model matrix' aka 'design matrix' X for you.
[ The Matrix package even contains a sparse.model.matrix()
  function which can be useful for really largish problems.
  E.g., the glmnet package, which uses Lasso-like methods, can make
  use of "Matrix sparse matrices" like that.
  {{glmnet is an alternative to randomForest here; Trevor Hastie has
  quite a host of examples where glmnet methods perform better than
  randomForest.}}
]
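For illustration only, a minimal sketch of the sparse variant mentioned in the aside above. The data frame `toy` and its columns `f`, `g`, `x` are made up for this example; it assumes the Matrix package is installed (it normally ships with R as a recommended package).

```r
library(Matrix)  # provides sparse.model.matrix()

# Made-up data: two factors and one numeric column (hypothetical names)
set.seed(42)
toy <- data.frame(
  f = factor(sample(letters[1:5], 20, replace = TRUE), levels = letters[1:5]),
  g = factor(sample(c("u", "v", "w"), 20, replace = TRUE),
             levels = c("u", "v", "w")),
  x = rnorm(20)
)

# Same formula language as model.matrix(), but the result is stored sparsely.
# f * g expands to f + g + f:g, so the factor interaction columns are included.
Xs <- sparse.model.matrix(~ f * g + x, data = toy)

class(Xs)  # a sparse matrix class from the Matrix package
dim(Xs)    # same shape as the dense model.matrix() result
```

Most entries of a dummy-coded factor interaction are zero, which is why the sparse representation pays off once the factors have many levels.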
Read help(model.matrix)
and also look at the examples there
which you can run in R by
example(model.matrix)
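As a rough sketch of how model.matrix() could answer the original question: the data frame `dat`, its values, and the `_x_` naming scheme below are assumptions made for illustration, not something taken from the thread.

```r
# Toy data in the shape described in the thread: A is the response,
# B is a factor, and C, D, E are numeric. All values are made up.
set.seed(1)
dat <- data.frame(
  A = rnorm(6),
  B = factor(c("x", "y", "x", "y", "x", "y")),
  C = 1:6,
  D = c(2, 4, 6, 8, 10, 12),
  E = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5)
)

# (C + D + E)^3 expands to the main effects plus every two-way and
# three-way interaction; "- 1" drops the intercept column.
X <- model.matrix(~ (C + D + E)^3 - 1, data = dat)
colnames(X)

# Keep only the interaction columns (their names contain ":") and
# rename them so they are ordinary data frame column names.
inter <- X[, grepl(":", colnames(X), fixed = TRUE), drop = FALSE]
colnames(inter) <- gsub(":", "_x_", colnames(inter), fixed = TRUE)

# Bind them onto the original data so they can be passed to any learner.
dat2 <- cbind(dat, inter)
names(dat2)
```

For numeric variables an interaction column is simply the elementwise product, so `dat2$C_x_D` equals `dat$C * dat$D`.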
Regards,
Martin Maechler, ETH Zurich
k> -----Original Message-----
k> From: Sharma, Dhruv [mailto:Dhruv.Sharma at PenFed.org]
k> Sent: Monday, March 08, 2010 7:51 AM
k> To: kMan; r-help at r-project.org
k> Subject: RE: [R] r code to generate interaction columns
k> Thanks Keith. I wanted some generic code to check column data types
k> and loop through and create the interaction columns automatically, as I
k> want to test this out as a new algorithm for data mining.
k> Traditional regression may give misleading results with multi-collinearity,
k> and thus I wanted to take interaction terms and run them through random
k> forests and rpart, as they would need interaction terms to be manually
k> created.
k> Hope that clarifies.
k> Dhruv
k> -----Original Message-----
k> From: kMan [mailto:kchamberln at gmail.com]
k> Sent: Sunday, March 07, 2010 8:08 PM
k> To: Sharma, Dhruv; r-help at r-project.org
k> Subject: RE: [R] r code to generate interaction columns
k> Dear Dhruv,
k> You could create interaction variables manually (assuming A is your
k> dependent variable). Just multiply the variables together.
k> cd.int<-C*D
k> ce.int<-C*E
k> cde.int<-C*D*E # what about D*E, or interactions with B?
k> Include those in your model, such as
k> A~B+C+D+E+cd.int+ce.int+cde.int.
k> Then you can compare those models to the results you get when you specify
k> the interaction in the model formula directly using the documented syntax.
k> In your R-console, type ?formula, or help("formula") for details.
k> Sincerely,
k> KeithC.
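A small sketch of the comparison suggested above, on simulated data. The data frame `d`, the simulated coefficients, and the use of lm() are assumptions for illustration; the thread does not say which model was fitted.

```r
# Simulated data with a genuine C-by-D interaction (all values made up)
set.seed(2)
d <- data.frame(C = rnorm(30), D = rnorm(30), E = rnorm(30))
d$A <- 1 + d$C - d$D + 0.5 * d$C * d$D + rnorm(30, sd = 0.1)

# Manual product column, as in the message above
d$cd.int <- d$C * d$D
fit_manual <- lm(A ~ C + D + cd.int, data = d)

# The same model written with the formula interaction operator
fit_formula <- lm(A ~ C + D + C:D, data = d)

# The estimates agree; only the name of the last coefficient differs
all.equal(unname(coef(fit_manual)), unname(coef(fit_formula)))
```

The manual column and the `C:D` term describe the same regressor, so for numeric variables the two fits differ only in labelling.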
k> -----Original Message-----
k> From: Sharma, Dhruv [mailto:Dhruv.Sharma at PenFed.org]
k> Sent: Saturday, March 06, 2010 10:30 AM
k> To: r-help at r-project.org
k> Subject: [R] r code to generate interaction columns
k> Hi,
k> Is there a way to take a dataset and extract numeric columns and create
k> interaction columns from it automatically?
k> For e.g. there are 5 columns of data: A,B,C,D,E.
k> CDE are numeric.
k> Can someone provide code to automatically create more columns such as:
k> 1) C*D, C*E, C*D*E, (C+E)/(D+.01), (D+E)/(C+.01), (C+D)/(E+.01)
k> (the .01 is added to avoid dividing by zero)?
k> I know in glm multiplying can create terms, but I want the columns to be
k> part of the data set so that I can feed this into random forest to pick
k> out predictive interaction terms, as regression cannot reliably handle
k> correlated interaction terms.
k> If anyone has some simple code that can do this, that would be helpful.
k> thanks
k> Dhruv
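One possible way to generate the columns listed in the question, products and ratios included, is sketched below. The helper name `add_interactions`, its arguments `eps` and `max_order`, and the column naming scheme are all hypothetical; nothing like it was posted in the thread.

```r
# Hypothetical helper: finds the numeric columns of df and appends
#  - products of every combination of 2..max_order numeric columns
#  - ratios (sum of two columns) / (a third column + eps)
add_interactions <- function(df, eps = 0.01, max_order = 3) {
  num <- names(df)[vapply(df, is.numeric, logical(1))]
  out <- df

  # Product columns, e.g. C_x_D, C_x_E, D_x_E, C_x_D_x_E
  if (length(num) >= 2 && max_order >= 2) {
    for (k in 2:min(max_order, length(num))) {
      for (cols in combn(num, k, simplify = FALSE)) {
        out[[paste(cols, collapse = "_x_")]] <- Reduce(`*`, df[cols])
      }
    }
  }

  # Ratio columns, e.g. C_plus_E_over_D = (C + E) / (D + eps)
  if (length(num) >= 3) {
    for (den in num) {
      for (pair in combn(setdiff(num, den), 2, simplify = FALSE)) {
        nm <- paste0(pair[1], "_plus_", pair[2], "_over_", den)
        out[[nm]] <- (df[[pair[1]]] + df[[pair[2]]]) / (df[[den]] + eps)
      }
    }
  }
  out
}

# Made-up data matching the question: A and B non-numeric, C, D, E numeric
dat <- data.frame(A = c("y", "n", "y", "n"), B = c("p", "q", "p", "q"),
                  C = c(1, 2, 3, 4), D = c(0, 1, 0, 1), E = c(2, 2, 5, 5))
res <- add_interactions(dat)
names(res)
```

The number of product columns grows quickly with the number of numeric columns, which is why this sketch caps the order with `max_order`. A numeric response column would also be picked up, so it should be removed from `df` before calling the helper.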