[R] Random Forests: Predictor importance for Regression Trees
Dimitri Liakhovitski
ld7631 at gmail.com
Mon Apr 20 20:35:50 CEST 2009
Hello!
I think I am relatively clear on how the first predictor importance
measure (the permutation-based one) is calculated by Random Forests
for a Classification tree:
Importance of predictor P1 when the response variable is categorical:
1. For out-of-bag (oob) cases, randomly permute their values on
predictor P1 and then put them down the tree
2. For a given tree, subtract the number of votes for the correct
class in the predictor-P1-permuted oob dataset from the number of
votes for the correct class in the untouched oob dataset: if P1 is
important, this number will be large.
3. The average of this number over all trees in the forest is the raw
importance score for predictor P1.
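
To make the three steps above concrete, here is a minimal sketch in plain Python. It is not the randomForest code, only my reading of the steps. The "trees" are hypothetical stand-ins: each one is just a prediction function paired with the indices of its oob cases, and the names permuted_rows, raw_importance_classification, and the toy data are all made up for illustration.

```python
import random


def permuted_rows(rows, oob, feature, rng):
    """Copy the oob rows, shuffling the values of one predictor among them."""
    values = [rows[i][feature] for i in oob]
    rng.shuffle(values)
    out = []
    for i, v in zip(oob, values):
        row = list(rows[i])
        row[feature] = v
        out.append(row)
    return out


def raw_importance_classification(trees, rows, labels, feature, rng):
    """Average over trees of (correct oob votes, untouched) minus
    (correct oob votes, predictor permuted)."""
    diffs = []
    for predict, oob in trees:
        truth = [labels[i] for i in oob]
        untouched = [rows[i] for i in oob]
        shuffled = permuted_rows(rows, oob, feature, rng)
        correct_untouched = sum(predict(r) == t for r, t in zip(untouched, truth))
        correct_permuted = sum(predict(r) == t for r, t in zip(shuffled, truth))
        diffs.append(correct_untouched - correct_permuted)
    return sum(diffs) / len(diffs)


# Toy data (assumption): the class depends only on predictor 0,
# predictor 1 is pure noise.
rng = random.Random(0)
rows = [[i % 2, rng.random()] for i in range(40)]
labels = [row[0] for row in rows]


def stump(row):
    """Hypothetical one-split tree that only looks at predictor 0."""
    return 1 if row[0] > 0.5 else 0


# Three "trees": the same stump, each with its own set of oob indices.
trees = [(stump, list(range(start, start + 20))) for start in (0, 10, 20)]

imp_signal = raw_importance_classification(trees, rows, labels, 0, rng)
imp_noise = raw_importance_classification(trees, rows, labels, 1, rng)
print("predictor 0:", imp_signal, " predictor 1:", imp_noise)
```

In this toy setup, permuting the noise predictor cannot change any prediction, so its raw score is zero, while permuting predictor 0 should cost the stump some correct votes.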
I am wondering what step 2 above looks like if the response variable
is continuous rather than categorical, in other words, for a Regression
tree. Could you please correct me if what I wrote below is wrong? Thank
you very much!
Importance of predictor P1 when the response variable is continuous:
1. For out-of-bag (oob) cases, randomly permute their values on
predictor P1 and then put them down the tree
2. For a given tree, calculate the mean squared error (the mean of the
squared differences between observed y and predicted y) for (a) the
untouched oob dataset and (b) the predictor-P1-permuted oob dataset.
Subtract (a) from (b).
3. The average of this number over all trees in the forest is the raw
importance score for predictor P1.
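
The regression version of the steps, as I have written them above, could be sketched like this in plain Python. Again, this is only my understanding and not the randomForest implementation. The "trees" are hypothetical prediction functions paired with oob indices, and mse, raw_importance_regression, and the toy data are invented for the example.

```python
import random


def mse(predict, rows, targets):
    """Mean of squared differences between observed y and predicted y."""
    return sum((t - predict(r)) ** 2 for r, t in zip(rows, targets)) / len(rows)


def raw_importance_regression(trees, rows, targets, feature, rng):
    """Average over trees of (oob MSE, predictor permuted) minus
    (oob MSE, untouched)."""
    diffs = []
    for predict, oob in trees:
        truth = [targets[i] for i in oob]
        untouched = [list(rows[i]) for i in oob]
        # Shuffle the values of one predictor among the oob cases only.
        values = [row[feature] for row in untouched]
        rng.shuffle(values)
        permuted = [row[:feature] + [v] + row[feature + 1:]
                    for row, v in zip(untouched, values)]
        diffs.append(mse(predict, permuted, truth) - mse(predict, untouched, truth))
    return sum(diffs) / len(diffs)


# Toy data (assumption): y depends only on predictor 0,
# predictor 1 is pure noise.
rng = random.Random(0)
rows = [[i / 10, rng.random()] for i in range(40)]
targets = [3.0 * row[0] for row in rows]


def step_tree(row):
    """Hypothetical piecewise-constant tree that only looks at predictor 0."""
    return 3.0 * (int(row[0]) + 0.5)


# Three "trees": the same step function, each with its own oob indices.
trees = [(step_tree, list(range(start, start + 20))) for start in (0, 10, 20)]

imp_signal = raw_importance_regression(trees, rows, targets, 0, rng)
imp_noise = raw_importance_regression(trees, rows, targets, 1, rng)
print("predictor 0:", imp_signal, " predictor 1:", imp_noise)
```

Here the difference is taken as (b) minus (a), so a predictor the tree relies on should get a positive score, and a predictor the tree never uses gets exactly zero.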
--
Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com