[R] binning results
Noah Silverman
noah at smartmediacorp.com
Wed Aug 5 20:11:05 CEST 2009
Hello,
I asked this as part of a previous message, but never really figured out
a usable solution. So this is a second attempt.
I have a process containing an SVM. The end result is the probability
that the class is true. That result is added back to the original data.
So I wind up with a data.frame that looks like this:
label,v1,v2,v3,prob_true
What I want to do is measure how accurate my model is for each range of
probability. (I've seen this done in a few published papers and found
it a very useful way to visualize things.)
My hope/guess is that there is some kind of package for R that does this
since it should be a common need.
Here is an example of what I'd like to be able to generate:
range      number of items   mean(probability)   true_accuracy
100-90%    20                .924                .90
90-80%     50                .825                .84
80-70%     214               .75                 .71
etc...
range is the range of predicted values by the SVM
mean(probability) is the mean of the PREDICTED probability of items in
that range
true_accuracy is the ACTUAL fraction of items in that range whose label
is true.
In English I would explain it as, "Of the data where our SVM predicted a
true probability of 70-80%, the data was actually 71% true."
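To make the request concrete, here is a minimal base-R sketch of the kind of table described above. It is only an illustration under assumptions: the data are simulated, `label` is assumed to be coded 0/1 (1 = true), the bins are assumed to be 10% wide, and the names `df`, `breaks`, `bin` and `calib` are made up for this example. Only `label` and `prob_true` come from the data.frame described above.

```r
# Simulated stand-in for the real data.frame; the values are invented
# purely for illustration.
set.seed(1)
n  <- 1000
df <- data.frame(prob_true = runif(n))
df$label <- rbinom(n, 1, df$prob_true)   # assumption: 1 = true, 0 = false

# Assign each row to a 10%-wide probability bin (assumed bin width).
breaks <- seq(0, 1, by = 0.1)
df$bin <- cut(df$prob_true, breaks = breaks, include.lowest = TRUE)

# One row per bin: item count, mean predicted probability,
# and the observed fraction of items whose label is true.
calib <- data.frame(
  n         = as.vector(table(df$bin)),
  mean_prob = as.vector(tapply(df$prob_true, df$bin, mean)),
  true_acc  = as.vector(tapply(df$label, df$bin, mean)),
  row.names = levels(df$bin)
)

# Print with the highest probability range first.
print(calib[rev(seq_len(nrow(calib))), ])
```

If `label` is stored as a factor or as TRUE/FALSE rather than 0/1, it would need converting to numeric before taking the mean.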
It might be really helpful to be able to graph this somehow, with
mean(predicted_probability) on one axis and mean(true_probability)
on the other axis. (Again, there must be some package in R for this?)
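A rough sketch of such a graph in base R, again on simulated data and under the same assumptions (0/1 labels, 10%-wide bins, variable names invented for this example). This kind of plot is often called a reliability diagram or calibration plot; points on the dashed diagonal would mean the predicted and observed probabilities agree.

```r
# Simulated predictions and 0/1 labels (illustrative values only).
set.seed(1)
n         <- 1000
prob_true <- runif(n)
label     <- rbinom(n, 1, prob_true)

# Bin the predictions, then summarise each bin.
bin       <- cut(prob_true, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
mean_prob <- as.vector(tapply(prob_true, bin, mean))  # mean predicted probability
true_acc  <- as.vector(tapply(label, bin, mean))      # observed fraction true

# Predicted vs. observed, one point per bin.
plot(mean_prob, true_acc, type = "b",
     xlim = c(0, 1), ylim = c(0, 1),
     xlab = "mean(predicted probability)",
     ylab = "observed fraction true")
abline(0, 1, lty = 2)   # reference line: perfect agreement
```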
Any thoughts, comments, ideas, etc. would be appreciated!
Thank You