[R] clarification on validate.ols method='cross'
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Mon Aug 31 14:55:36 CEST 2009
Dylan Beaudette wrote:
> Hi,
>
> I was hoping to clarify the exact behavior associated with this incantation:
>
> validate(fit.ols, method='cross', B=50)
>
> Output:
>
>           index.orig training    test optimism index.corrected  n
> R-square      0.5612   0.5613  0.5171   0.0442          0.5170 50
> MSE           1.3090   1.3086  1.3547  -0.0462          1.3552 50
> Intercept     0.0000   0.0000 -0.0040   0.0040         -0.0040 50
> Slope         1.0000   1.0000  0.9899   0.0101          0.9899 50
>
> Questions:
> 1. Does this perform 50 replicate, 10-fold CV operations?
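Type ?validate
No. With method='cross', B is the number of groups of observations
omitted in turn, so B=50 is a single 50-fold cross-validation: you are
leaving out 1/50th of the rows of the data each time the model is fit.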
If your sample size is not huge, you may need to average multiple runs
of cross-validation to get adequate precision. The bootstrap is more
efficient and a bit easier to do.
Note that if fit.ols was not a fully pre-specified model (e.g., if you
did any variable selection) you are not using validate correctly and are
getting biased estimates.
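
A rough sketch of the two options mentioned above, assuming the rms
package is installed. The simulated data and the names d, x1, x2, fit
and runs are made up for illustration and are not from the original
post; the choice of 10 groups, 5 repeats and 200 resamples is arbitrary.

```r
# Simulated data; variable names are illustrative only
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.8 * d$x1 - 0.5 * d$x2 + rnorm(n)

if (requireNamespace("rms", quietly = TRUE)) {
  library(rms)

  # x=TRUE, y=TRUE store the design matrix and response in the fit,
  # which validate() needs in order to refit the model
  fit <- ols(y ~ x1 + x2, data = d, x = TRUE, y = TRUE)

  # Cross-validation: B is the number of groups omitted in turn (10-fold here)
  print(validate(fit, method = "crossvalidation", B = 10))

  # Averaging several cross-validation runs to reduce noise
  runs <- replicate(5,
    validate(fit, method = "crossvalidation", B = 10)[, "index.corrected"])
  print(rowMeans(runs))

  # Bootstrap: B is the number of resamples
  print(validate(fit, method = "boot", B = 200))
}
```

Each run of cross-validation draws a new random split, so the averaged
column is only as stable as the number of repeats allows.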
>
> 2. What do the slope and intercept terms refer to?
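The slope is the estimated slope of Xnew*BETAold in predicting Ynew,
i.e. the slope of the calibration (reliability) curve: the coefficients
are frozen from the training fit and the held-out response is regressed
on the resulting predictions. Likewise for the intercept.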
>
> 3. How can I interpret the 'test R2' ?
It is a nearly unbiased estimate of R^2 to assess the likely future
performance of the model on new data from the same data stream.
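
To make the test-sample slope, intercept and R^2 concrete, here is a
minimal hand-rolled k-fold sketch in base R. It is not the code that
validate() runs; the data, the number of folds and the names d, fold
and res are assumptions for illustration, and the R^2 formula below is
one common definition that may differ in detail from the one rms uses.

```r
# Simulated data; names and settings are illustrative only
set.seed(42)
n <- 200
d <- data.frame(x = rnorm(n))
d$y <- 2 + 1.5 * d$x + rnorm(n)

k <- 10
fold <- sample(rep(seq_len(k), length.out = n))  # random fold assignment

res <- t(sapply(seq_len(k), function(i) {
  train <- d[fold != i, ]
  test  <- d[fold == i, ]
  fit   <- lm(y ~ x, data = train)
  pred  <- predict(fit, newdata = test)   # Xnew * BETAold
  cal   <- coef(lm(test$y ~ pred))        # regress Ynew on frozen predictions
  r2    <- 1 - sum((test$y - pred)^2) / sum((test$y - mean(test$y))^2)
  c(intercept = unname(cal[1]), slope = unname(cal[2]), r2 = r2)
}))

# Average over folds: a slope near 1 and an intercept near 0 indicate
# good calibration on held-out data
colMeans(res)
```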
Frank
>
>
> Thanks in advance!
>
> Cheers,
> Dylan
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University