[R] Performance measure for probabilistic predictions

Frank E Harrell Jr f.harrell at vanderbilt.edu
Wed Aug 19 14:21:17 CEST 2009

Noah Silverman wrote:
> Hello,
> I'm fitting an SVM model, but I'm most interested in the 
> probability output.  This is easy enough to calculate.
> My challenge is how to measure the relative performance of the SVM for 
> different settings/parameters/etc.
> An AUC curve comes to mind, but I'm NOT interested in predicting true vs 
> false.  I am interested in finding the most accurate probability 
> predictions possible.
> I've seen some literature where the probability range is cut into 
> segments and then the predicted probability is compared to the actual.  
> This looks nice, but I need a more tangible numeric measure.  One 
> thought was a measure of "probability accuracy" for each range, but I 
> don't know how to calculate this.
> Any thoughts?
> -N


This is a big area, but I'm glad you are interested in probability 
accuracy rather than the more frequently (mis)used classification 
accuracy.  There are many measures available.  For independent test 
samples, the val.prob function in the Design package provides many of 
them, including the Brier score and the calibration intercept and slope.
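
As a rough illustration of what two such measures look like, here is a
minimal base-R sketch. It does not call val.prob; the data are simulated
stand-ins, and the names p, y, brier and cal are assumptions made for the
example only.

```r
# Simulated stand-ins for an independent test sample (assumed, not from the post):
#   p = predicted probabilities from some model, y = observed 0/1 outcomes.
set.seed(1)
n <- 500
p <- runif(n, 0.05, 0.95)
y <- rbinom(n, 1, p)   # outcomes drawn so that p is well calibrated

# Brier score: mean squared difference between prediction and outcome.
# Lower is better; a constant prediction of 0.5 scores 0.25.
brier <- mean((p - y)^2)

# Calibration intercept and slope: logistic regression of y on logit(p).
# Values near 0 and 1, respectively, suggest good calibration.
fit <- glm(y ~ qlogis(p), family = binomial)
cal <- setNames(coef(fit), c("intercept", "slope"))

print(round(c(Brier = brier, cal), 3))
```

With real data, p would be the SVM's predicted probabilities on the test
sample and y the observed outcomes; the same quantities can then be
compared across settings.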

When making a calibration plot to demonstrate absolute prediction 
accuracy, it is not a good idea to bin the predicted probabilities, 
because the result depends on the arbitrary choice of bins.  val.prob 
instead uses loess to produce a smooth calibration curve.
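
The idea of a smooth, unbinned calibration curve can be sketched in base R
as follows. This is only an illustration with simulated data and default
loess settings, which are not necessarily the settings val.prob uses; the
names p, y, sm, grid and cal_curve are assumptions for the example.

```r
# Simulated, deliberately miscalibrated predictions (assumed data):
# the true event probability is plogis(0.6 * logit(p)), not p itself.
set.seed(2)
n <- 500
p <- runif(n, 0.05, 0.95)
y <- rbinom(n, 1, plogis(0.6 * qlogis(p)))

# Smooth the 0/1 outcomes directly against the predictions; no bins needed.
sm <- loess(y ~ p)

# Estimated actual probability at a grid of predicted probabilities.
grid <- seq(0.1, 0.9, by = 0.1)
cal_curve <- predict(sm, newdata = data.frame(p = grid))

print(round(data.frame(predicted = grid, estimated_actual = cal_curve), 3))
```

Plotting cal_curve against grid and adding the line of identity with
abline(0, 1) gives the calibration plot; departures from that line show
where the predictions are too high or too low.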



Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
