[BioC] Influence of expression correlation on false positive ratio

Kevin R. Coombes kevin.r.coombes at gmail.com
Wed Jul 11 18:22:34 CEST 2012


Hi Wolfgang,

It's not just technical artifacts.  Everyone believes (probably 
correctly) that gene expression in biological samples is in fact 
correlated, a fact that is exploited all the time when people run 
algorithms to try to (re)construct networks or pathways based on 
coexpression.  And while I agree that a truly multivariate approach 
would be more advisable, (a) there is no consensus on how best to do 
this and (b) it is not the current standard practice.  There are already 
gazillions of papers (and more are being written and published as I 
write this email) that compute p-values from univariate gene-by-gene 
tests and follow with a method to estimate the FDR.
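
For readers who have not seen the second step spelled out, here is a
minimal Python sketch of the Benjamini-Hochberg adjustment applied to a
vector of per-gene p-values. The function name bh_adjust and the example
p-values are made up for illustration; in R the same adjustment is
available through p.adjust() with method "BH".

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # are monotone in the rank and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from ten gene-by-gene tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
qvals = bh_adjust(pvals)
n_called = sum(q <= 0.05 for q in qvals)
print(n_called, [round(q, 4) for q in qvals])
```

Genes whose adjusted value falls at or below the chosen level are called
significant. Nothing in this calculation uses the correlation between
genes.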

The operative word here is "estimate", which should make you think that 
there might be some uncertainty in the estimates.  We recently did some 
simulations to get an idea of how much the precision of the FDR 
estimates is affected by correlation.  We also point out a couple of 
examples from real data that suggest that the effect of correlation 
could be large.  The paper has been accepted at BMC Bioinformatics, so I 
can supply the advance URL for people who want more information:
http://www.biomedcentral.com/1471-2105/13/S13/S1/abstract
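
To give a feel for the kind of effect in question, here is a small
Python sketch. It is not the simulation design from the paper; every
setting in it is an assumption chosen for illustration: 500 genes, 50 of
them truly changed, a single shared factor that induces equal pairwise
correlation rho among the test statistics, and Benjamini-Hochberg at
level 0.1. Each statistic is still marginally standard normal under the
null, so the per-gene p-values remain valid; only the joint behaviour
changes.

```python
import random
from statistics import NormalDist, mean, pstdev

STD_NORMAL = NormalDist()


def bh_reject(pvalues, alpha):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank  # keep the largest rank that passes
    return order[:cutoff]


def simulate_fdp(rho, n_genes=500, n_true=50, shift=3.0,
                 alpha=0.1, n_reps=200, seed=1):
    """Return the realized false discovery proportion of each replicate."""
    rng = random.Random(seed)
    fdps = []
    for _ in range(n_reps):
        shared = rng.gauss(0.0, 1.0)  # factor common to all genes
        pvals = []
        for g in range(n_genes):
            noise = rng.gauss(0.0, 1.0)
            z = (rho ** 0.5) * shared + ((1.0 - rho) ** 0.5) * noise
            if g < n_true:
                z += shift  # the first n_true genes are truly changed
            pvals.append(2.0 * (1.0 - STD_NORMAL.cdf(abs(z))))
        rejected = bh_reject(pvals, alpha)
        false_hits = sum(1 for i in rejected if i >= n_true)
        fdps.append(false_hits / len(rejected) if rejected else 0.0)
    return fdps


fdp_indep = simulate_fdp(rho=0.0)
fdp_corr = simulate_fdp(rho=0.5)
spread_indep = pstdev(fdp_indep)
spread_corr = pstdev(fdp_corr)
print("rho=0.0: mean FDP %.3f, sd %.3f" % (mean(fdp_indep), spread_indep))
print("rho=0.5: mean FDP %.3f, sd %.3f" % (mean(fdp_corr), spread_corr))
```

The quantity to compare is the spread of the realized false discovery
proportion across replicates, not only its average: under this toy
model a single experiment can land far from the nominal level when the
statistics are correlated.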

Best,
     Kevin

On 7/11/2012 7:22 AM, Wolfgang Huber wrote:
> January,
>
> if you only require per-gene p-values and no multiple testing 
> adjustment, then the dependency is never a problem. The validity of 
> unadjusted per-gene p-values is unaffected by whether there is 
> dependency between the genes.
>
> For multiple testing, if you do FWER by the Westfall-Young method, any 
> dependence is also no problem. If you do FDR by the Benjamini-Hochberg 
> method, problems can in principle occur if there is pervasive 
> dependence. Often this is caused by technical artifacts, which would 
> be addressed (and removed) by the methods mentioned by Jeff. If it is 
> biological, then a serial univariate analysis (gene-by-gene testing) 
> does not seem the cleverest choice of approach, and a truly 
> multivariate approach seems more advisable.
>
>     Best wishes
>     Wolfgang
>
>
> Jeff Leek scripsit 07/09/2012 01:17 PM:
>> Hi January,
>>
>> If the tests are only dependent in small groups, say because genes are
>> grouped into small modules, then most FDR methods in the p.adjust()
>> function or the methods in the qvalue package will work. The Bonferroni
>> correction controls a more conservative error rate (the family-wise
>> error rate), and it also holds under dependence.
>>
>> If the sources of dependence are more pervasive, for example if there
>> are batch effects:
>>
>> http://www.nature.com/nrg/journal/v11/n10/full/nrg2825.html
>>
>> then you can use the batch correction methods in limma if, say, you
>> know the date the samples were processed. Or, if you don't know the
>> sources of large-scale dependence, you can use the sva package:
>>
>> http://www.bioconductor.org/packages/devel/bioc/html/sva.html
>>
>> which implements the methods described here:
>>
>> http://www.pnas.org/content/early/2008/11/24/0808709105.abstract
>>
>>
>> Best,
>>
>>
>> Jeff
>>
>>
>>
>> On Jul 9, 2012 7:08 AM, "January Weiner"
>> <january.weiner at mpiib-berlin.mpg.de> wrote:
>>
>>> Hello,
>>>
>>> statistical methods for assessing significance of differences in
>>> expression assume, correct me if I'm wrong, independence of the tests.
>>> Does anyone have at hand any papers on the performance -- in terms of
>>> type I error -- of methods such as limma / eBayes? I'm sure this issue
>>> has been investigated in depth.
>>>
>>> Kind regards,
>>>
>>> January
>>>
>>> -- 
>>> -------- Dr. January Weiner 3 --------------------------------------
>>> Max Planck Institute for Infection Biology
>>> Charitéplatz 1
>>> D-10117 Berlin, Germany
>>> Web   : www.mpiib-berlin.mpg.de
>>> Tel     : +49-30-28460514
>>> Fax    : +49-30-28450505
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>
>


