[R] r code to generate interaction columns
Martin Maechler
maechler at stat.math.ethz.ch
Wed Mar 10 12:18:05 CET 2010
>>>>> "k" == kMan <kchamberln at gmail.com>
>>>>> on Tue, 9 Mar 2010 19:52:40 -0700 writes:
k> Dear Dhruv, Your clarification helps, and I'm
k> stumped. Sorry I cannot be of more help.
k> Sincerely, KeithC.
I'd say *The* answer is to use model.matrix()
This lets you use R's powerful model formula language
and produces the 'model matrix' aka 'design matrix' X for you.
[ The Matrix package even contains a sparse.model.matrix()
function, which can be useful for really large problems.
E.g., the glmnet package, which uses Lasso-like methods (instead of
randomForest; Trevor Hastie has quite a host of examples
where glmnet methods perform better than randomForest),
can make use of sparse matrices from the Matrix package like that.
]
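
A minimal sketch of the sparse variant mentioned in the aside above. The
data frame `dat` and its columns are made up for illustration; the only
calls assumed are sparse.model.matrix() and nnzero() from the Matrix
package and model.matrix() from base R.

```r
library(Matrix)  # recommended package, normally installed alongside R

# Hypothetical data: one factor with many levels and two numeric columns.
set.seed(1)
n <- 200
dat <- data.frame(
  B = factor(sample(letters, n, replace = TRUE)),
  C = runif(n),
  D = runif(n)
)

# Main effects plus all two-way interactions.
f <- ~ (B + C + D)^2

Xs <- sparse.model.matrix(f, data = dat)  # sparse matrix from Matrix
Xd <- model.matrix(f, data = dat)         # ordinary dense matrix

dim(Xs)
# Fraction of entries that are non-zero in the sparse version.
nnzero(Xs) / prod(dim(Xs))
```

The factor dummies and their interactions are mostly zeros, which is where
the sparse representation pays off.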
Read help(model.matrix)
and also look at the examples there,
which you can run in R by
example(model.matrix)
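
As a rough sketch of how this could answer the original question: the toy
data frame below uses the column names A to E from the question (the values
are invented), picks the numeric predictors automatically, and lets
model.matrix() build all product interactions. Treating A as the response
to be excluded is an assumption taken from Keith's reply.

```r
# Toy data: A is the response, B a factor, C, D, E numeric.
set.seed(1)
dat <- data.frame(
  A = rnorm(6),
  B = factor(rep(c("x", "y"), 3)),
  C = runif(6), D = runif(6), E = runif(6)
)

# Pick out the numeric predictors automatically (excluding the response A).
num <- setdiff(names(dat)[sapply(dat, is.numeric)], "A")

# Build the formula  ~ (C + D + E)^3 - 1 : main effects plus all two- and
# three-way products, without an intercept column.
f <- as.formula(
  paste0("~ (", paste(num, collapse = " + "), ")^", length(num), " - 1")
)
X <- model.matrix(f, data = dat)
colnames(X)

# Keep only the interaction columns (their names contain ":") and
# append them to the original data frame.
int_cols <- X[, grepl(":", colnames(X), fixed = TRUE), drop = FALSE]
dat2 <- cbind(dat, int_cols)
```

The resulting dat2 carries the products as ordinary columns, so it can be
passed on to tree-based methods that do not build them from a formula.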
Regards,
Martin Maechler, ETH Zurich
k> -----Original Message-----
k> From: Sharma, Dhruv [mailto:Dhruv.Sharma at PenFed.org]
k> Sent: Monday, March 08, 2010 7:51 AM
k> To: kMan; r-help at r-project.org
k> Subject: RE: [R] r code to generate interaction columns
k> thanks Keith. I wanted some generic code to check column data type
k> and loop through and create the interaction columns automatically, as I want
k> to test this out as a new algorithm for data mining.
k> Traditional regression may give misleading results with multi-collinearity
k> and thus I wanted to take interaction terms and run them through random
k> forests and rpart as they would need interaction terms to be manually
k> created.
k> Hope that clarifies.
k> Dhruv
k> -----Original Message-----
k> From: kMan [mailto:kchamberln at gmail.com]
k> Sent: Sunday, March 07, 2010 8:08 PM
k> To: Sharma, Dhruv; r-help at r-project.org
k> Subject: RE: [R] r code to generate interaction columns
k> Dear Dhruv,
k> You could create interaction variables manually (assuming A is your
k> dependent variable). Just multiply the variables together.
k> cd.int<-C*D
k> ce.int<-C*E
k> cde.int<-C*D*E # what about D*E, or interactions with B?
k> Include those in your model, such as
k> A~B+C+D+E+cd.int+ce.int+cde.int
k> Then you can compare those models to the results you get when you specify
k> the interaction in the model formula directly using the documented syntax.
k> In your R-console, type ?formula, or help("formula") for details.
k> Sincerely,
k> KeithC.
k> -----Original Message-----
k> From: Sharma, Dhruv [mailto:Dhruv.Sharma at PenFed.org]
k> Sent: Saturday, March 06, 2010 10:30 AM
k> To: r-help at r-project.org
k> Subject: [R] r code to generate interaction columns
k> Hi,
k> is there a way to take a dataset and extract numeric columns and create
k> interaction columns from it automatically?
k> For e.g. there are 5 columns of data: A,B,C,D,E.
k> CDE are numeric.
k> Can someone provide code to automatically create more columns such
k> as:
k> 1) C*D, C*E, C*D*E, (C+E)/(D+.01), (D+E)/(C+.01), (C+D)/(E+.01)
k> (the .01 is to avoid divide by zero)
k> ?
k> I know in glm multiplying can create terms, but I want the columns to be part
k> of the data set so that I can feed this into Random forest to pick out
k> predictive interaction terms, as regression cannot reliably handle correlated
k> interaction terms.
k> If anyone has some simple code that can do this, that would be helpful.
k> thanks
k> Dhruv
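
The ratio columns in the original question are not products, so they fall
outside the interaction syntax of a model formula. One possible sketch,
assuming the pattern is "sum of the other numeric columns divided by one
column plus .01"; the data values and the generated column names are
hypothetical.

```r
# Hypothetical numeric columns named as in the question.
dat <- data.frame(C = c(1, 2, 3), D = c(0, 1, 2), E = c(4, 5, 6))

eps <- 0.01  # offset from the question, guards against division by zero

# Fix the list of numeric columns before adding any new ones.
num <- names(dat)[sapply(dat, is.numeric)]

for (den in num) {
  others <- setdiff(num, den)
  # e.g. "C_plus_E_over_D" for (C + E) / (D + .01)
  new_name <- paste0(paste(others, collapse = "_plus_"), "_over_", den)
  dat[[new_name]] <- rowSums(dat[others]) / (dat[[den]] + eps)
}

names(dat)
```

With more than three numeric columns the numerator becomes the sum of all
the others, which may or may not be what is wanted.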