[BioC] tagwise parameters for negative binomial distribution in edgeR

Fri Mar 21 09:36:27 CET 2014

On 21/mar/2014, at 01:47, Gordon K Smyth <smyth at wehi.EDU.AU> wrote:
>> 
>> Mmm, actually I would like to identify the sample that is an outlier for 
>> a specific gene, that's why I thought I could focus on tagwise 
>> distribution.
> 
> See Mark Robinson's post.
> 
> It depends on your purpose, however.  Do you want to downweight/ignore 
> outliers, or do you want to identify them because they are interesting?

In this case the outliers may be relevant, especially the less represented ones. I'm running the estimateGLMRobustDisp approach (although it takes a long time).

>>> Any tag with a small prior.df is considered an outlier.  You can sort tags
>>> by their prior.df values to select the most significant outliers.
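
To make the sorting step above concrete, here is a minimal sketch in plain Python rather than edgeR itself. The gene names, the prior.df numbers and the helper rank_dispersion_outliers are all hypothetical and only for illustration; in practice the per-tag prior.df values would come from the fitted edgeR object.

```python
# Hypothetical per-tag prior.df values (made up for illustration only);
# real values would be taken from the fitted edgeR object.
prior_df = {"geneA": 9.8, "geneB": 1.2, "geneC": 10.1, "geneD": 3.4}


def rank_dispersion_outliers(prior_df, top=None):
    """Return (tag, prior.df) pairs sorted by ascending prior.df.

    Tags with the smallest prior.df come first, i.e. the strongest
    dispersion-outlier candidates. `top` optionally truncates the list.
    """
    ranked = sorted(prior_df.items(), key=lambda kv: kv[1])
    return ranked if top is None else ranked[:top]


top_two = rank_dispersion_outliers(prior_df, top=2)
print(top_two)
```

How small a prior.df has to be before a tag counts as an outlier is a judgment call that this sketch does not address.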
>> 
>> Does this identify a tag that is an outlier over all samples?
> 
> Basically yes.  We distinguish dispersion outliers and observation 
> outliers.  An observation outlier is an individual count that is an 
> outlier (relative to other counts for the same gene).  A dispersion 
> outlier is a gene that shows much more variability between replicates than 
> other genes at the same cpm level.  A dispersion outlier may arise from 
> one or more observation outliers, but not necessarily.  It could also 
> arise from systematically larger variability.
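
For the observation-outlier case (one count that stands out among the counts for the same gene), a rough sketch of how one might flag the offending sample is below. This is not what edgeR does internally; it is a generic median/MAD rule on log2 counts, and the sample names, counts, cutoff and the function flag_observation_outliers are all assumptions made for illustration. It also assumes the counts are already comparable across samples (similar library sizes).

```python
import math
import statistics

# Hypothetical counts for a single gene across five replicates
# (made-up numbers, assumed comparable across samples).
counts = {"s1": 100, "s2": 110, "s3": 95, "s4": 900, "s5": 105}


def flag_observation_outliers(counts, cutoff=3.5):
    """Flag samples whose log2 count is far from the gene's median.

    Uses a robust z-score: |x - median| / (1.4826 * MAD). This is a
    generic rule of thumb, not the edgeR robust-dispersion method.
    """
    logs = {s: math.log2(c + 1) for s, c in counts.items()}
    med = statistics.median(logs.values())
    mad = statistics.median(abs(v - med) for v in logs.values())
    if mad == 0:
        # No spread to compare against; nothing can be flagged reliably.
        return []
    scale = 1.4826 * mad
    return [s for s, v in logs.items() if abs(v - med) / scale > cutoff]


print(flag_observation_outliers(counts))
```

A gene can still be a dispersion outlier without any sample being flagged this way, for example when all replicates are more spread out than is typical for genes at that cpm level.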

Thanks for the explanation.
Best, 

d