[BioC] edgeR: mixing technical replicates from Illumina HiSeq and MiSeq

Fri Aug 29 12:11:49 CEST 2014

Really nice approach, Ryan, I’ll keep it in mind :-)

Nick, no, I don’t usually use cut-offs for that. If I’m unsure (e.g. the effect is not obvious or is minimal, i.e. the technical variance on the PCA is much smaller than the biological one), I run the analysis under each of the different approaches, look at how they influence the results, and then select the more conservative approach. I know this sounds vague, but the decision frequently depends on how the other samples behave. Hence, whenever we make our data and analysis public, we also make the analysis code public; i.e. we use knitr / pandoc to create an HTML document that details every decision we’ve made.
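
For illustration, the comparison described above could be sketched in edgeR roughly as follows. This is a minimal sketch on simulated counts, not code from an actual analysis: the toy data, the helper run_edger(), the column names (lib_id, platform, condition) and the 5% FDR threshold are all assumptions made for the example, and "ignoring" is read here as analysing every sequencing run as its own sample without a platform term.

```r
library(edgeR)

set.seed(1)
n_genes <- 1000

# Hypothetical design: 6 libraries (3 ctrl, 3 trt), each sequenced on both platforms.
lib <- paste0("lib", 1:6)
samples <- data.frame(
  lib_id    = rep(lib, each = 2),
  platform  = rep(c("HiSeq", "MiSeq"), times = 6),
  condition = rep(c("ctrl", "trt"), each = 6)
)

# Simulated expression: gamma-distributed biological variation per library,
# with a 2-fold change in the first 100 genes of the treated libraries.
mu     <- rexp(n_genes, rate = 1 / 200) + 20
fc     <- ifelse(seq_len(n_genes) <= 100, 2, 1)
is_trt <- rep(c(FALSE, TRUE), each = 3)
lib_mu <- sapply(1:6, function(i) {
  m <- if (is_trt[i]) mu * fc else mu
  rgamma(n_genes, shape = 10, rate = 10 / m)
})

# Each run is a Poisson sample of its library; MiSeq runs are shallower.
depth   <- ifelse(samples$platform == "HiSeq", 1, 0.2)
lib_idx <- match(samples$lib_id, lib)
counts  <- sapply(seq_len(nrow(samples)), function(j) {
  rpois(n_genes, lib_mu[, lib_idx[j]] * depth[j])
})
rownames(counts) <- paste0("gene", seq_len(n_genes))
colnames(counts) <- paste(samples$lib_id, samples$platform, sep = "_")

# Hypothetical helper: fit one design and return the genes called DE at 5% FDR.
run_edger <- function(counts, design, coef) {
  y   <- DGEList(counts = counts)
  y   <- calcNormFactors(y)
  y   <- estimateDisp(y, design)
  fit <- glmQLFit(y, design)
  res <- glmQLFTest(fit, coef = coef)
  tab <- topTags(res, n = Inf, sort.by = "none")$table
  rownames(tab)[tab$FDR < 0.05]
}

# Option 1: merge, i.e. sum the runs that belong to the same library.
merged      <- t(rowsum(t(counts), group = samples$lib_id, reorder = FALSE))
merged_info <- data.frame(
  condition = samples$condition[match(colnames(merged), samples$lib_id)]
)

de <- list(
  merged  = run_edger(merged, model.matrix(~ condition, merged_info), "conditiontrt"),
  # Option 2: ignore the platform and treat every run as a separate sample.
  ignored = run_edger(counts, model.matrix(~ condition, samples), "conditiontrt"),
  # Option 3: add the platform as a blocking factor.
  blocked = run_edger(counts, model.matrix(~ platform + condition, samples), "conditiontrt")
)

sapply(de, length)              # number of DE genes per approach
length(Reduce(intersect, de))   # genes called by all three approaches
```

If the three gene lists largely agree, the platform effect is probably minor; where they diverge, the list sizes and overlaps give a rough indication of which approach is the most conservative. edgeR also provides sumTechReps() for the merging step; base rowsum() is used here only to keep the sketch explicit.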

Nico

---------------------------------------------------------------
Nicolas Delhomme

The Street Lab
Department of Plant Physiology
Umeå Plant Science Center

Tel: +46 90 786 5478
Email: nicolas.delhomme at umu.se
SLU - Umeå universitet
Umeå S-901 87 Sweden
---------------------------------------------------------------

On 29 Aug 2014, at 11:29, Nick N <feralmedic at gmail.com> wrote:

> Thanks Ryan and Nicolas!
> 
> I was wondering whether there is some sort of decision tree that can be formalised.
> 
> Nicolas, you would consider 3 options - merging, ignoring or adding a factor. Could you recommend some sort of cut-offs for each choice or is it more of a qualitative decision by looking at plots and PCA? By the way, my data is RNA-Seq - I forgot to mention it.
> 
> Ryan, I would basically ask you the same question.
> 
> 
> On Fri, Aug 29, 2014 at 9:42 AM, Ryan <rct at thompsonclan.org> wrote:
> Hi Nick,
> 
> Thanks to the underlying theory behind dispersion estimation, you can easily test whether your "technical replicates" really do represent technical replicates. Specifically, read counts in technical replicates should follow a Poisson distribution, which is a special case of the negative binomial with zero dispersion. So, simply fit a model using edgeR or DESeq2 with a separate coefficient for each group of technical replicates. All the experimental variation will then be absorbed into the model coefficients, and the only thing left will be the technical variability of the replicates. For true technical replicates, the dispersion should be zero for all genes. So if you estimate dispersions using this model, and plotBCV/plotDispEsts shows the dispersion very near to zero, then you can be confident that you really have technical replicates. If the dispersion is nonzero, then there is some additional source of unaccounted-for variation.
> 
> I have used this method on a pilot dataset with several technical replicates for each condition. edgeR estimated the dispersion at something like 10^-3 or less for all genes except the very lowly expressed ones.
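
The check described above might look roughly like this in edgeR. It is a minimal sketch on simulated data, not the code used for the pilot dataset: the toy counts, the object names and the number of libraries and runs are assumptions. Each library gets its own coefficient, so only run-to-run variation within a library is left for the dispersion estimate.

```r
library(edgeR)

set.seed(42)
n_genes <- 2000

# Hypothetical setup: 4 libraries, each sequenced 3 times (technical replicates).
lib <- factor(rep(paste0("lib", 1:4), each = 3))

# Libraries differ biologically (gamma variation around a per-gene mean);
# runs of the same library differ only by Poisson sampling noise.
mu     <- rexp(n_genes, rate = 1 / 100) + 10
lib_mu <- sapply(1:4, function(i) rgamma(n_genes, shape = 5, rate = 5 / mu))
counts <- sapply(as.integer(lib), function(i) rpois(n_genes, lib_mu[, i]))
rownames(counts) <- paste0("gene", seq_len(n_genes))
colnames(counts) <- paste0(lib, "_run", rep(1:3, times = 4))

# One coefficient per group of technical replicates.
design <- model.matrix(~ 0 + lib)

y <- DGEList(counts = counts)
y <- calcNormFactors(y)
y <- estimateDisp(y, design)

y$common.dispersion        # should be close to zero for true technical replicates
sqrt(y$common.dispersion)  # the same quantity on the BCV scale
plotBCV(y)                 # per-gene view; points should sit near the bottom of the plot

# For contrast: an intercept-only design leaves the between-library
# variation in the residuals, so the dispersion is clearly larger.
design_pooled <- model.matrix(~ 1, data = data.frame(lib))
y_pooled      <- estimateDisp(y, design_pooled)
y_pooled$common.dispersion
```

With real data, replace the simulated counts with the count matrix and build lib from the library identifiers, so that HiSeq and MiSeq runs of the same library share a level. A dispersion that stays clearly above zero under this design would point to a platform effect or some other unaccounted-for source of variation.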
> 
> -Ryan
> 
> 
> On 8/28/14, 9:23 AM, Nick N wrote:
> Hi,
> 
> I have a study where a fraction of the samples have been replicated on 2
> Illumina platforms (HiSeq and MiSeq). These are technical replicates - the
> library preparation is the same using the same biological replicates - it's
> only the sequencing which is different.
> 
> My hunch was that I should introduce the platform as an additional
> (blocking) factor in the analysis. Then I stumbled upon this post:
> 
> https://stat.ethz.ch/pipermail/bioconductor/2010-April/033099.html
> 
> It recommends pooling the replicates. The post seems to apply to a
> different case ("pure" technical replicates, i.e. no differences in the
> sequencing platform used) so I probably should ignore it. But I still feel a
> bit uncertain of the best way to treat the technical replicates. Can you,
> please, advise me on this?
> 
> many thanks!
> Nick
> 