[R] Random Forest % Variation vs Pseudo-R^2?
Liaw, Andy
andy_liaw at merck.com
Mon Jun 8 15:45:58 CEST 2009
The 130.42 actually means that the OOB MSE (0.04605) is 130.42% of var(y), so
the model has not provided any better explanatory power than simply predicting
by mean(y); it has done worse. The pseudo R^2 is just 100% - 130.42% = -30.42%.
Remember that this is not the resubstitution estimate, because it is computed
from the OOB estimate of MSE.
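As a worked check of the arithmetic above, here is the calculation written out. It assumes that "% Var explained" is defined as 1 - MSE_OOB/var(y), which is what the two printed numbers imply. var(y) is back-calculated from the trace output, not taken from the data, and the values are rounded as printed.

```latex
\%\,\text{Var explained}
  = 100\left(1 - \frac{\mathrm{MSE}_{\mathrm{OOB}}}{\widehat{\operatorname{Var}}(y)}\right)
  = 100 - 130.42 = -30.42

\widehat{\operatorname{Var}}(y) = \frac{0.04605}{1.3042} \approx 0.0353

1 - \frac{0.04605}{1.3042} \approx 1 - 0.0353 = 0.965
  \quad\text{(this is } 1 - \widehat{\operatorname{Var}}(y)\text{, not a proportion of variance explained)}
```

Dividing the MSE by 1.3042 therefore just recovers var(y). The meaningful ratio is MSE/var(y) = 1.3042. Any value above 1 means the OOB predictions have a larger squared error than predicting mean(y) for every case.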
HTH,
Andy
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Ryan Harrigan
> Sent: Sunday, June 07, 2009 9:38 PM
> To: r-help at r-project.org
> Subject: [R] Random Forest % Variation vs Pseudo-R^2?
>
> Hi all (and Andy!),
> When I run randomForest in R (with do.trace=T), the last part of the
> output looks like this:
>
> 1993 | 0.04606 130.43 |
> 1994 | 0.04605 130.40 |
> 1995 | 0.04605 130.43 |
> 1996 | 0.04605 130.43 |
> 1997 | 0.04606 130.44 |
> 1998 | 0.04607 130.47 |
> 1999 | 0.04606 130.46 |
> 2000 | 0.04605 130.42 |
>
> The first column is the iteration, the second is the OOB MSE, and the
> last is %Var(y). If I calculate a "pseudo-R^2" from these numbers, I get:
>
> 1-(.04605/1.3042) = 0.965
>
> Here's the question: if I look at the summary of forest.rf (from this
> same run), I get the following:
>
> randomForest(formula = Prev ~ ., data = plas, ntree = 2000,
>              importance = TRUE, do.trace = T)
> Type of random forest: regression
> Number of trees: 2000
> No. of variables tried at each split: 5
>
> Mean of squared residuals: 0.04605177
> % Var explained: -30.42
>
> What does that -30.42 "% Var explained" relate to? I find it interesting
> that %Var(y) is 130.42 and that % Var explained is a very similar number,
> but I have no idea how they are related. From my calculations it seems
> like I have a good predictor set (pseudo-R^2 over 95%), but am I missing
> something?
>
> Cheers,
>
> Ryan
>
>
> --
> Ryan Harrigan, Ph.D.
> Center for Tropical Research
> Institute of the Environment
> University of California, Los Angeles
> La Kretz Hall, Suite 300
> Box 951496
> Los Angeles, CA 90095-1496
> 203-804-9505
> iluvsa at ucla.edu
>
More information about the R-help
mailing list