[BioC] edgeR - estimateGLMCommonDisp - warnings - huge logFC

Filippis, Ioannis i.filippis at imperial.ac.uk
Sat Jul 9 00:10:44 CEST 2011


Dear Gordon,

Thank you very much for the reply.

The problem with using the dispersion among the samples as the common dispersion is that it is too high, which leads to 0 DE genes. I do expect such high variation between the samples; it is not an artifact of the data.

I also wonder whether it is statistically wrong to add 1 to the counts matrix before any edgeR analysis. This would lead to "meaningful" logFC values and sensible volcano plots.

Many thanks for your help.

Best regards,
Ioannis
________________________________________
From: Gordon K Smyth [smyth at wehi.EDU.AU]
Sent: 08 July 2011 12:16
To: Filippis, Ioannis
Cc: Bioconductor mailing list
Subject: edgeR - estimateGLMCommonDisp - warnings - huge logFC

Dear Ioannis,

If the counts are zeros for some libraries for some genes, then it should
be no surprise that some of the logFC might be very large.  The raw fold
changes are infinite.
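
To illustrate with made-up counts for a single gene (the numbers and variable names are hypothetical, not taken from the data discussed in this thread), here is a minimal base R sketch of why a zero count gives an infinite raw log fold change, and what adding a constant to both counts does to it:

```r
# Hypothetical counts for one gene in two libraries of equal size
count_control <- 0
count_treated <- 25

# Raw log2 fold change: division by zero gives Inf
raw_logfc <- log2(count_treated / count_control)

# Adding 1 to both counts (an ad hoc offset, shown only for comparison)
shifted_logfc <- log2((count_treated + 1) / (count_control + 1))

print(raw_logfc)
print(shifted_logfc)
```

The offset makes the value finite, but its size then depends on the arbitrary constant that was added.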

The real problem though is that running estimateGLMCommonDisp() without
replicates is meaningless, since the dispersion is not actually estimable
without replicates.  The function will probably just return a dispersion
of zero in this case.

If you must analyse RNA-Seq data without replicates, you could estimate
the dispersion very roughly by treating all the libraries as if they were
replicates, by

   d2 <- estimateCommonDisp(d), or
   d2 <- estimateGLMCommonDisp(d)

and then proceed using this conservative dispersion estimate.
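
A rough sketch of that workflow, assuming a current edgeR installation and simulated counts. The count matrix, the additive design and all object names other than the edgeR functions are placeholders for illustration. When no group is supplied to DGEList(), all libraries fall into a single group, and the resulting common dispersion can then be passed to glmFit():

```r
library(edgeR)

set.seed(1)
# Simulated counts: 1000 genes x 4 libraries (placeholder data)
counts <- matrix(rnbinom(4000, mu = 50, size = 5), ncol = 4)
rownames(counts) <- paste0("gene", 1:1000)
colnames(counts) <- c("A.ctl", "A.trt", "B.ctl", "B.trt")

# No 'group' given, so all four libraries fall into a single group
d <- DGEList(counts = counts)
d <- calcNormFactors(d)

# Dispersion estimated as if the libraries were replicates
d2 <- estimateCommonDisp(d)
common_disp <- d2$common.dispersion

# Hypothetical additive design for the 2x2 layout
strain <- factor(c("A", "A", "B", "B"))
treatment <- factor(c("ctl", "trt", "ctl", "trt"))
design <- model.matrix(~ strain + treatment)

# Fit the GLM with the borrowed dispersion and test the treatment effect
fit <- glmFit(d, design, dispersion = common_disp)
lrt <- glmLRT(fit, coef = 3)
topTags(lrt)
```

Any real differences between the conditions are absorbed into the dispersion estimate, which is what makes it conservative.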

Best wishes
Gordon

> Date: Fri, 8 Jul 2011 08:18:56 +0000
> From: "Filippis, Ioannis" <i.filippis at imperial.ac.uk>
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: [BioC] edgeR - estimateGLMCommonDisp - warnings - huge logFC
> Content-Type: text/plain
>
> Hi,
>
> I am using edgeR for a 2x2 factorial design (Strain*Treatment) without
> any replicates and the estimateGLMCommonDisp and glmFit functions.
>
> When I run estimateGLMCommonDisp, I get warnings
> 1: In optimize(f = fun, interval = interval^0.25, y = y,  ... :
>  NA/Inf replaced by maximum positive value
> and when I run glmFit and then glmLRT, I get huge fold change values for some genes.
>
> However, if I do a pairwise exactTest on the samples involved in the above contrast, the fold change for those genes is high but reasonable.
>
> I would really appreciate any feedback on the cause of warnings and huge logFC.
>
> Many thanks for your help.
>
> Best,
> Ioannis
