[R] mx2 contingency tables or (2^(m-1)-1)'s 2x2 contingency tables in the context of feature selection for random forest
Weiwei Shi
helprhelp at gmail.com
Thu Sep 28 19:52:03 CEST 2006
Dear Listers:
I have a categorical feature selection problem for random forest.
Suppose I have a multi-level categorical variable A with m=3 levels
(red, green, and blue), and the target is a binary class label.
I want to evaluate how well a variable like A discriminates between
the 2 classes. We know that rf splits a multi-level categorical
variable by considering all binary groupings of its levels. Now
suppose I have 1000 such multi-level categorical variables and need to
do some feature selection. I would like to try chi-square tests (or
information gain) for that.
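For concreteness, here is a minimal sketch of what "all binary
groupings of the levels" means for one variable. It is plain Python
written only for illustration: the helper name binary_splits is
hypothetical, and it is not how any random forest package actually
implements its split search.

```python
from itertools import combinations

def binary_splits(levels):
    """Yield each way of dividing the (distinct) levels into two non-empty groups, once."""
    levels = list(levels)
    first, rest = levels[0], levels[1:]
    # Pin the first level to the left group so mirror-image splits are not repeated.
    for k in range(len(rest)):  # k < len(rest) keeps the right group non-empty
        for extra in combinations(rest, k):
            left = (first,) + extra
            right = tuple(lv for lv in rest if lv not in extra)
            yield left, right

splits = list(binary_splits(["red", "green", "blue"]))
for left, right in splits:
    print(left, "vs", right)
```

With m levels this yields 2^(m-1)-1 groupings, so 3 for red, green,
and blue. The count grows quickly with m, which matters when the
enumeration is repeated over 1000 variables.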
To match the splitting method used in rf, I am wondering whether I
should simply use a single mx2 contingency table, or build all
2^(m-1)-1 2x2 contingency tables (one per binary grouping of the
levels) and take the best p-value as the measure of A's power. I am
sure the latter closely resembles what rf does. But is the former good
enough?
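The two candidate scores can be put side by side with a small sketch.
Everything in it is an assumption made for illustration: the counts
are made up, the helper names (chi2_sf, pearson_chi2,
best_binary_split) are hypothetical, and the Pearson statistic is
computed without a continuity correction, whereas R's chisq.test
applies one to 2x2 tables by default. The chi-square tail probability
uses the closed-form series for integer degrees of freedom so that
only the standard library is needed.

```python
import math
from itertools import combinations

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + total

def pearson_chi2(table):
    """Pearson chi-square statistic and p-value; assumes no zero row or column totals."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_tot) - 1)
    return stat, chi2_sf(stat, df)

def best_binary_split(counts):
    """Smallest 2x2 p-value over all 2^(m-1)-1 binary groupings of the levels."""
    levels = list(counts)
    first, rest = levels[0], levels[1:]
    best = None
    for k in range(len(rest)):  # k < len(rest) keeps the right group non-empty
        for extra in combinations(rest, k):
            left = (first,) + extra
            right = [lv for lv in rest if lv not in extra]
            table = [[sum(counts[lv][j] for lv in group) for j in (0, 1)]
                     for group in (left, right)]
            stat, p = pearson_chi2(table)
            if best is None or p < best[2]:
                best = (left, stat, p)
    return best

# Made-up counts per level of A: (class 0, class 1).
counts = {"red": (30, 10), "green": (20, 20), "blue": (10, 30)}

full_stat, full_p = pearson_chi2([list(c) for c in counts.values()])  # one mx2 table
best = best_binary_split(counts)                                      # best of the 2x2 tables

print("mx2 table: stat=%.2f p=%.3g" % (full_stat, full_p))
print("best 2x2 : left=%s stat=%.2f p=%.3g" % best)
```

One point to keep in mind when comparing the two: the best-of-splits
p-value is the minimum over several correlated tests, so it is
optimistic, and the more levels a variable has, the more groupings it
gets to choose from. Ranking 1000 variables by that minimum would
therefore tend to favor variables with many levels unless some
adjustment is made.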
Thanks.
--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.
"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
More information about the R-help mailing list