[R] clarification on validate.ols method='cross'

Frank E Harrell Jr f.harrell at vanderbilt.edu
Mon Aug 31 14:55:36 CEST 2009


Dylan Beaudette wrote:
> Hi,
> 
> I was hoping to clarify the exact behavior associated with this incantation:
> 
> validate(fit.ols, method='cross', B=50)
> 
> Output:
> 
>           index.orig training    test optimism index.corrected  n
> R-square      0.5612   0.5613  0.5171   0.0442          0.5170 50
> MSE           1.3090   1.3086  1.3547  -0.0462          1.3552 50
> Intercept     0.0000   0.0000 -0.0040   0.0040         -0.0040 50
> Slope         1.0000   1.0000  0.9899   0.0101          0.9899 50
> 
> Questions:
> 1. Does this perform 50 replicate, 10-fold CV operations?

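No.  Type ?validate

With method='cross', B is the number of groups the data are split into, 
so B=50 is a single 50-fold cross-validation.  You are leaving out 
1/50th of the rows of the data each time the model is fit.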
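
As a rough illustration of what leaving out 1/50th of the rows at each 
fit means, here is a minimal base-R sketch.  The simulated data, the 
names d, fold and test_r2, and the use of lm() in place of ols() are all 
assumptions made for illustration; this is not the code that validate 
runs internally.

```r
set.seed(1)
n <- 500; B <- 50
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.8 * d$x1 - 0.5 * d$x2 + rnorm(n)

# Randomly assign each row to one of B groups of equal size
fold <- sample(rep(seq_len(B), length.out = n))

test_r2 <- numeric(B)
for (b in seq_len(B)) {
  train <- d[fold != b, ]   # (B - 1)/B of the rows, used to fit
  test  <- d[fold == b, ]   # the omitted 1/B of the rows
  fit_b <- lm(y ~ x1 + x2, data = train)
  pred  <- predict(fit_b, newdata = test)
  # R^2 of the training-sample model evaluated on the held-out rows
  test_r2[b] <- 1 - sum((test$y - pred)^2) / sum((test$y - mean(test$y))^2)
}

table(fold)[1:3]   # each group holds n / B = 10 rows
mean(test_r2)      # average R^2 over the B held-out groups
```

With only 10 rows per held-out group the per-group R^2 values are very 
noisy, which is one reason a single run can lack precision.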

If your sample size is not huge, you may need to average multiple runs 
of cross-validation to get adequate precision.  The bootstrap is more 
efficient and a bit easier to do.
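
A sketch of both suggestions, assuming the rms package is installed and 
using simulated data.  The object names and the choices of 20 repeats, 
10 folds and 200 bootstrap resamples are illustrative only, not 
recommendations.

```r
if (requireNamespace("rms", quietly = TRUE)) {
  library(rms)
  set.seed(2)
  n <- 200
  d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
  d$y <- 1 + 0.8 * d$x1 - 0.5 * d$x2 + rnorm(n)

  # x=TRUE, y=TRUE store the design matrix and response so validate can refit
  fit <- ols(y ~ x1 + x2, data = d, x = TRUE, y = TRUE)

  # Repeat 10-fold cross-validation 20 times and average the corrected indexes
  cv_runs <- replicate(20, {
    v <- validate(fit, method = "crossvalidation", B = 10)
    unclass(v)[, "index.corrected"]
  })
  cv_avg <- rowMeans(cv_runs)

  # Bootstrap validation (the default method) with 200 resamples
  v_boot <- validate(fit, method = "boot", B = 200)

  print(cv_avg)
  print(v_boot)
}
```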

Note that if fit.ols was not a fully pre-specified model (e.g., if you 
did any variable selection) you are not using validate correctly and are 
getting biased estimates.
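
If variable selection was part of the modelling, validate can repeat a 
backward step-down within each resample via bw=TRUE, so that the 
selection step is itself validated.  A hedged sketch, assuming the rms 
package is installed and using simulated data (x3 is a pure-noise 
predictor added for illustration):

```r
if (requireNamespace("rms", quietly = TRUE)) {
  library(rms)
  set.seed(3)
  n <- 200
  d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
  d$y <- 1 + 0.8 * d$x1 - 0.5 * d$x2 + rnorm(n)

  # Start from the full candidate model, not the already-reduced one
  full <- ols(y ~ x1 + x2 + x3, data = d, x = TRUE, y = TRUE)

  # bw = TRUE reruns backward elimination in every bootstrap resample
  v_bw <- validate(full, B = 100, bw = TRUE)
  print(v_bw)
}
```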

> 
> 2. What do the slope and intercept terms refer to?

They are the estimated slope and intercept from regressing Ynew on 
Xnew*BETAold, i.e., the slope and intercept of the calibration 
(reliability) curve.
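
To make the definition concrete, the base-R sketch below takes 
coefficients estimated on one sample (BETAold), forms the linear 
predictor on a second sample (Xnew), and regresses Ynew on it.  The data 
are simulated and all names are illustrative.  For least squares the 
same regression on the original sample gives intercept 0 and slope 1, 
which is why index.orig shows 0.0000 and 1.0000.

```r
set.seed(4)
make_data <- function(n) {
  d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
  d$y <- 1 + 0.8 * d$x1 - 0.5 * d$x2 + rnorm(n)
  d
}
old <- make_data(100)   # sample used to estimate the coefficients
new <- make_data(100)   # independent sample from the same process

fit_old <- lm(y ~ x1 + x2, data = old)

# Linear predictor Xnew * BETAold
lp_new  <- predict(fit_old, newdata = new)
cal_new <- coef(lm(new$y ~ lp_new))   # calibration intercept and slope on new data

# Same regression on the sample the model was fitted to
lp_old  <- fitted(fit_old)
cal_old <- coef(lm(old$y ~ lp_old))   # intercept 0, slope 1 up to rounding

cal_new
cal_old
```

A slope below 1 on new data suggests the predictions are too extreme, 
i.e. some overfitting.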

> 
> 3. How can I interpret the 'test R2' ?

It is a nearly unbiased estimate of R^2 to assess the likely future 
performance of the model on new data from the same data stream.

Frank

> 
> 
> Thanks in advance!
> 
> Cheers,
> Dylan
> 


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University



