[R] Random Forest % Variation vs Pseudo-R^2?

Ryan Harrigan iluvsa at ucla.edu
Mon Jun 8 03:38:21 CEST 2009


Hi all (and Andy!),
    When I fit a randomForest model in R with do.trace=T, the last part of
the output looks like this:

1993 |  0.04606   130.43 |
1994 |  0.04605   130.40 |
1995 |  0.04605   130.43 |
1996 |  0.04605   130.43 |
1997 |  0.04606   130.44 |
1998 |  0.04607   130.47 |
1999 |  0.04606   130.46 |
2000 |  0.04605   130.42 |

The first column is the iteration, the second column is the OOB MSE, and
the last column is the %Var(y). If I calculate a "Pseudo-R^2" from these
numbers, I get:

1-(.04605/1.3042) = 0.965
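
As a minimal sketch, here is the same calculation in R, using the values
from the last trace line. The variable names are only for illustration, and
dividing %Var(y) by 100 to get the denominator is my own assumption about
how to read that column:

```r
# Values copied from the last trace line (iteration 2000)
oob_mse   <- 0.04605   # second column: OOB MSE
pct_var_y <- 130.42    # third column: %Var(y)

# "Pseudo-R^2" as computed above, treating %Var(y)/100 as the denominator
# (ASSUMPTION: this is how the column should be used; it may not be)
pseudo_r2 <- 1 - oob_mse / (pct_var_y / 100)
print(round(pseudo_r2, 3))
```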

Here's the question: if I look at the summary of forest.rf (this same run),
I get the following:

randomForest(formula = Prev ~ ., data = plas, ntree = 2000, importance =
TRUE, do.trace = T)
               Type of random forest: regression
                     Number of trees: 2000
No. of variables tried at each split: 5

          Mean of squared residuals: 0.04605177
                    % Var explained: -30.42

What does that "% Var explained" of -30.42 relate to? I find it interesting
that the %Var(y) is 130.42 and the % Var explained is a very similar number,
but I have no idea how they are related. From my calculations, it seems like
I have a good predictor set (Pseudo-R^2 over 95%), but am I missing something?
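
The similarity between the two figures can at least be checked numerically.
The sketch below is only that: it assumes, without my having confirmed it
against the package source, that the trace column %Var(y) is
100 * MSE / Var(y) and that "% Var explained" is 100 * (1 - MSE / Var(y)).
The variable names are made up for illustration.

```r
# Values copied from the output above
pct_var_y     <- 130.42       # last trace line, %Var(y)
pct_explained <- -30.42       # "% Var explained" from the summary
mse           <- 0.04605177   # "Mean of squared residuals"

# ASSUMPTION: %Var(y) = 100 * MSE / Var(y); if so, the two percentages sum to 100
print(pct_var_y + pct_explained)

# Variance of Prev implied by that assumption (not a value reported by R)
implied_var_y <- mse / (pct_var_y / 100)
print(implied_var_y)

# Pseudo-R^2 under the same assumption, 1 - MSE / Var(y)
print(1 - mse / implied_var_y)
```

If that reading is right, a negative value would mean the OOB MSE is larger
than the variance of Prev, which would not agree with my figure of 0.965.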

Cheers,

Ryan


--
Ryan Harrigan, Ph.D.
Center for Tropical Research
Institute of the Environment
University of California, Los Angeles
La Kretz Hall, Suite 300
Box 951496
Los Angeles, CA 90095-1496
203-804-9505
iluvsa at ucla.edu



