[BioC] edgeR and DESeq2: model design and estimation of dispersion

Mon Jun 16 05:30:59 CEST 2014

Dear Iddo,

No, it is not valid to use a different design matrix for the dispersion 
estimation.

edgeR will handle your model with 400 samples, but it will admittedly be 
slow. If this is too slow, then switch to voom() in the limma package, 
which will be very fast, or to glmQLFTest() in the edgeR package, which 
will still be relatively slow but faster than the glm routines in edgeR 
(or DESeq2).
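
For illustration, here is a minimal sketch of both routes, with the same 
design matrix used at every step. It is not taken from the original 
thread: the simulated counts, the reduced sample size, and the column 
names control.after and case.after are assumptions made for the example. 
The individual blocking factor is coded so that the design is full rank, 
with one intervention effect per group. It also assumes a current edgeR 
release, where glmQLFTest() takes the output of glmQLFit().

```r
library(edgeR)   # attaches limma as well, which provides voom() and lmFit()

set.seed(1)

# Hypothetical toy layout: 6 individuals per group (the thread has 100),
# each sampled before and after the intervention.
n_per_group  <- 6
individual   <- factor(rep(seq_len(2 * n_per_group), each = 2))
group        <- factor(rep(c("control", "case"), each = 2 * n_per_group),
                       levels = c("control", "case"))
intervention <- factor(rep(c("before", "after"), times = 2 * n_per_group),
                       levels = c("before", "after"))

# Simulated counts, for illustration only (500 genes).
counts <- matrix(rnbinom(500 * length(individual), mu = 50, size = 5),
                 nrow = 500)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))

# Block on individual, then add one "after" effect per group.
# The group main effect is absorbed by the individual terms.
design <- model.matrix(~ individual)
design <- cbind(design,
  control.after = as.numeric(group == "control" & intervention == "after"),
  case.after    = as.numeric(group == "case"    & intervention == "after"))

# Contrast: does the intervention effect differ between cases and controls?
con <- setNames(numeric(ncol(design)), colnames(design))
con["case.after"]    <- 1
con["control.after"] <- -1

y <- DGEList(counts = counts)
y <- calcNormFactors(y)

# Route 1: voom() in limma.
v    <- voom(y, design)
vfit <- lmFit(v, design)
vfit <- eBayes(contrasts.fit(vfit, contrasts = con))
topTable(vfit, coef = 1, number = 5)

# Route 2: quasi-likelihood F-test in edgeR. Dispersions are estimated
# with the same design that is used for fitting and testing.
y    <- estimateDisp(y, design)
qfit <- glmQLFit(y, design)
qres <- glmQLFTest(qfit, contrast = con)
topTags(qres, n = 5)
```

In both routes the full design, including the blocking factor, is passed 
to the variance or dispersion estimation step as well as to the model 
fit, which is the point made above.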

Best wishes
Gordon

> From: Iddo Ben-dov <iddobe at ekmd.huji.ac.il>
> Subject: edgeR and DESeq2: model design and estimation of dispersion
> Date: June 12, 2014 at 4:51:51 PM GMT+3
> To: bioconductor at r-project.org
> 
> hi,
> 
> in both edgeR and DESeq2, estimation of dispersion precedes negative 
> binomial GLM fitting.
> 
> my question is, can I use a design formula when estimating dispersion 
> which is different from the formula used for GLM fitting? specifically, 
> I would like to use a simplified design when estimating dispersion and a 
> full design for GLM fitting.
> 
> my motivation for doing so is that with the full design estimation of 
> dispersion is too demanding for my computer and time.
> 
> my dataset includes 400 mRNAseq profiles (~22,000 genes). there are 100 
> controls and 100 cases, and each was sampled twice - before and after 
> intervention.
> 
> thus, the full design is:
> ~ group*intervention + individual:group (blocking factor)
> 
> as I mentioned, estimation of dispersion with the above design is not 
> practical, and I thus would like to simplify to: ~ group*intervention
> 
> and introduce the 'individual' blocking factor only for NB GLM fitting.
> 
> is this statistically valid?
> 
> appreciate any help,
> iddo