[BioC] Seeking assistance on ROC

Sean Davis seandavi at gmail.com
Mon Jan 25 19:54:50 CET 2010


On Sat, Jan 23, 2010 at 6:28 AM, Susan Bosco <susanbosco86 at yahoo.com> wrote:
> Dear Sean,
>
> Thanks again.
>
> I corrected the script, setting the value of the 'truth' variable with the rbinom() function. Since my data set is quite large (the data is 244K), I first tried the first 200 values, for which I got a proper ROC curve. However, when I include the complete data, the plot changes: for the whole data set I get a nearly linear graph with small variations.
>
> My script looks like this:
> For the first 200 values of the data:
> library(ROC)
> load("RGKma.RData")
> state= rbinom(length(RGKma$M[1:200,3]),1,0.33)
> data = RGKma$M[1:200,3]
> R1<-rocdemo.sca(truth=state,data,dxrule.sca)
> pdf("ROCk.pdf")
> plot(R1, show.thresh=TRUE,col = "red")
> dev.off()
>
> For the complete data:
> library(ROC)
> load("RGKma.RData")
> state= rbinom(length(RGKma$M[,3]),1,0.33)
> data = RGKma$M[,3]
> R1<-rocdemo.sca(truth=state,data,dxrule.sca)
> pdf("ROCallk.pdf")
> plot(R1, show.thresh=TRUE,col = "red")
> dev.off()
>
> I've attached the PDFs of the plots. I would appreciate it if you could help me with this problem that I encountered with the large data size.

Hi, Susan.  The problem is not the large data size, in particular.
You need to know the TRUTH.  You cannot assign the TRUTH using a
random binomial.  You need to KNOW which samples are of one class
versus the other.  Do you know that information?  If not, then ROC
analysis is not a useful thing to apply.
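
The point can be illustrated with a minimal sketch in base R. It uses simulated values only, not the RGKma data; the marker distribution, the shift of 1.5, the sample size, and the helper names roc_points() and auc() are all assumptions made for illustration and are not part of the ROC package. With labels that are known and related to the marker, the curve bows away from the diagonal; with labels drawn by rbinom(), the curve stays close to the diagonal, and on a small subset it can wander from the diagonal purely by chance.

```r
# Simulated data only; none of these values come from RGKma.RData.
set.seed(1)
n <- 2000

# Known truth: about a third of the samples are cases (1), the rest controls (0).
truth <- rbinom(n, 1, 0.33)
# Hypothetical marker: cases are shifted upward by 1.5 units on average.
marker <- rnorm(n, mean = 1.5 * truth)

# Labels drawn at random, unrelated to the marker
# (this mirrors what rbinom() does in the script quoted above).
random_labels <- rbinom(n, 1, 0.33)

# ROC points for the rule "call positive when marker > threshold",
# evaluated at every observed marker value.
roc_points <- function(labels, values) {
  thresholds <- sort(unique(values))
  tpr <- sapply(thresholds, function(t) mean(values[labels == 1] > t))
  fpr <- sapply(thresholds, function(t) mean(values[labels == 0] > t))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Area under the ROC curve via the rank-sum (Mann-Whitney) identity.
auc <- function(labels, values) {
  r <- rank(values)
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

roc_known  <- roc_points(truth, marker)
roc_random <- roc_points(random_labels, marker)

auc_known  <- auc(truth, marker)          # expected to be well above 0.5
auc_random <- auc(random_labels, marker)  # expected to be close to 0.5

# Draw both curves when run interactively.
if (interactive()) {
  plot(roc_known$fpr, roc_known$tpr, type = "l", col = "red",
       xlab = "False positive rate", ylab = "True positive rate")
  lines(roc_random$fpr, roc_random$tpr, col = "blue")
  abline(0, 1, lty = 2)  # chance diagonal
}
```

Each row of roc_known is one threshold together with its false and true positive rates, so the curve covers every cutoff at once. That only means something when the labels come from a source other than the marker itself.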

Sean

> Thanking you sincerely,
> Susan.
>
>
> --- On Wed, 20/1/10, Sean Davis <seandavi at gmail.com> wrote:
>
> From: Sean Davis <seandavi at gmail.com>
> Subject: Re: [BioC] Seeking assistance on ROC
> To: "Susan Bosco" <susanbosco86 at yahoo.com>
> Cc: bioconductor at stat.math.ethz.ch, "prashantha hebbar" <prashantha.hebbar at manipal.edu>
> Date: Wednesday, 20 January, 2010, 12:05 PM
>
> On Wed, Jan 20, 2010 at 12:39 AM, Susan Bosco <susanbosco86 at yahoo.com> wrote:
>
> Dear Sean,
>
> Thank you so much for the help.
>
> I tried a range of thresholds from 0 to 0.9. As you had mentioned, the
> true positive rates did increase with thresholds below 0.9. However, I
> still got some false positives even at a minimum threshold of 0.1.
> Could you kindly explain the reason?
>
> Is there any method of finding the optimal threshold, maximizing the true
> positive rate while minimizing the false positive rate, instead of
> randomly choosing a value between 0 and 0.9?
>
> Hi, Susan.  The ROC curve IS that method.  The ROC curve represents ALL thresholds as applied to the data.  If you plot with show.thresh=TRUE, you will see the thresholds that were tried and where they are on the curve.
>
> If the threshold to which you are referring is the one that you used to determine the variable you called "state", then we are talking about two different things.  The "truth" variable is meant to be assigned by some source other than the data themselves.  If you do not know the true state of your samples and find yourself assigning the state from the data, then ROC curve analysis will not be of any use.
>
> Sean
>
> Thanks in advance,
>
> Susan.