[BioC] design model EdgeR

Gordon K Smyth smyth at wehi.EDU.AU
Thu Jul 5 03:28:28 CEST 2012


Dear Chris,

Your data doesn't contain enough information to estimate the linear model 
that you are trying to fit, hence the error from edgeR and the warning 
from limma.

The first thing to understand is that, in a paired design, the treatment 
factors are compared within pairs only.  So, in your experiment, the 
various levels of parent and allele are compared within each sample.

The basic problem is this.  Factors allele and parent each take two 
levels, so in principle there are four possible combinations of these 
factors.  However the combination A.father occurs only with B.mother, and 
A.mother occurs only with B.father.  Hence the only comparisons you can 
make are A.father vs B.mother and A.mother vs B.father.  In other words, 
there are only two independent quantities you can estimate.  It is 
impossible for example to estimate the difference between A.father and 
A.mother because these never occur together for the same sample.

In your linear model, you are asking the software to estimate three linear 
model coefficients to represent the differences between the different 
levels between allele and parent, when in reality only two are estimable. 
Specifically you are asking for parent effects within each allele.  But 
there is no way to estimate these quantities, so the software gives an 
error.

limma gives you an informative warning.  The code

   > nonEstimable(design)
   [1] "alleleB:parentmother"

tells you that you cannot estimate the parental mothervsfather comparison 
for alleleB.

It would seem that you need to make a deep decision.  Should you be 
reposing your scientific questions to focus on what can be estimated 
within samples?  Or should you be recovering information from your data by 
making comparisons between the different samples?  The latter can be done 
in principle using voom by treating sample as a random effect.

It's hard to advise you on scientific decisions of this sort that require 
a deep understanding of the science of your problem.  Can you consult a 
statistician at your university?  Show them this email.  The issues of 
setting up a design matrix in edgeR or limma are the same as for any 
univariate ANOVA type problem, so a good biostatistician should be able to 
give you reasonable advice on what you can estimate, even if they are not 
familiar with edgeR or limma or with genomic problems in general.

Hope this helps
Gordon

> Date: Mon, 2 Jul 2012 21:59:27 -0600
> From: chris_utah <chris.gregg at neuro.utah.edu>
> To: bioconductor at r-project.org Subject: 
> [BioC] design model EdgeR
>
> Hi,
>
> We are working on a glm analysis in EdgeR and voom() and I am hoping we 
> can help with our design.  Currently, we are getting errors.
>
> We have 18 RNA-Seq biological replicates (samples), each sample has two 
> columns of counts for each tag, one for the maternal allele and one for 
> the paternal allele.  Nine samples have mother A and father B, and nine 
> samples have mother B and father A.
>
> we wish to test the following models in edgeR and voom()
>
> (~sample+allele+parent) vs (~sample+allele) 
> (~sample+allele+allele:parent) vs (~sample+allele)
>
> Our design matrix is built as follows.  Once concern is that the two 
> columns for each sample are paired, so I give them the same sample ID.
>
> sample<-factor(c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,16,16,17,17,18,18)) 
> #provides sample labels
> title<-c("A", "B")  #provides allele labels 
> allele<-factor(c(rep(title, 18)))
> parent1<-c("father", "mother") 
> #parent labels for the first nine replicates
> parent2<-c("mother", "father")  #parent labels for the second nine replicates 
> parent<-factor(c(c(rep(parent1, 9)), c(rep(parent2, 9))))
>
> model.matrix(~sample+allele+allele:parent)->design
>
> The errors we get with this approach are as follows:
>
> EdgeR
>> d <-estimateGLMCommonDisp(d, design)
> Error in solve.default(R, t(beta)) :
>  system is computationally singular: reciprocal condition number = 
> 3.227e-17
>
>
> voom()
>> design<-model.matrix(~sample+allele+allele:parent) y<-voom(ctx,design, 
>> plot=TRUE, lib.size=colSums(ctx)*nf)
> Coefficients not estimable: alleleA:parentmother Warning message: 
> Partial NA coefficients for 159907 probe(s)
>
> Thank you very much for any help!
>
> best wishes,
>
> Chris
>
>
>> sessionInfo()
> R version 2.15.0 (2012-03-30) Platform: x86_64-apple-darwin9.8.0/x86_64 
> (64-bit)
>
> locale: [1] 
> en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages: [1] stats graphics grDevices utils datasets 
> methods base
>
> other attached packages: [1] edgeR_2.6.3 limma_3.12.0
>
> loaded via a namespace (and not attached): [1] annotate_1.34.0 
> AnnotationDbi_1.18.0 Biobase_2.16.0 BiocGenerics_0.2.0 DBI_0.2-5 
> DESeq_1.9.7 genefilter_1.38.0 geneplotter_1.34.0 grid_2.15.0 [10] 
> IRanges_1.14.0 RColorBrewer_1.0-5 RSQLite_0.11.1 splines_2.15.0 
> stats4_2.15.0 survival_2.36-12 xtable_1.7-0
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list