[BioC] design model EdgeR
Gordon K Smyth
smyth at wehi.EDU.AU
Thu Jul 5 03:28:28 CEST 2012
Dear Chris,
Your data doesn't contain enough information to estimate the linear model
that you are trying to fit, hence the error from edgeR and the warning
from limma.
The first thing to understand is that, in a paired design, the treatment
factors are compared within pairs only. So, in your experiment, the
various levels of parent and allele are compared within each sample.
The basic problem is this. Factors allele and parent each take two
levels, so in principle there are four possible combinations of these
factors. However the combination A.father occurs only with B.mother, and
A.mother occurs only with B.father. Hence the only comparisons you can
make are A.father vs B.mother and A.mother vs B.father. In other words,
there are only two independent quantities you can estimate. It is
impossible for example to estimate the difference between A.father and
A.mother because these never occur together for the same sample.
In your linear model, you are asking the software to estimate three linear
model coefficients to represent the differences between the different
levels between allele and parent, when in reality only two are estimable.
Specifically you are asking for parent effects within each allele. But
there is no way to estimate these quantities, so the software gives an
error.
limma gives you an informative warning. The code
> nonEstimable(design)
[1] "alleleB:parentmother"
tells you that you cannot estimate the parental mothervsfather comparison
for alleleB.
It would seem that you need to make a deep decision. Should you be
reposing your scientific questions to focus on what can be estimated
within samples? Or should you be recovering information from your data by
making comparisons between the different samples? The latter can be done
in principle using voom by treating sample as a random effect.
It's hard to advise you on scientific decisions of this sort that require
a deep understanding of the science of your problem. Can you consult a
statistician at your university? Show them this email. The issues of
setting up a design matrix in edgeR or limma are the same as for any
univariate ANOVA type problem, so a good biostatistician should be able to
give you reasonable advice on what you can estimate, even if they are not
familiar with edgeR or limma or with genomic problems in general.
Hope this helps
Gordon
> Date: Mon, 2 Jul 2012 21:59:27 -0600
> From: chris_utah <chris.gregg at neuro.utah.edu>
> To: bioconductor at r-project.org Subject:
> [BioC] design model EdgeR
>
> Hi,
>
> We are working on a glm analysis in EdgeR and voom() and I am hoping we
> can help with our design. Currently, we are getting errors.
>
> We have 18 RNA-Seq biological replicates (samples), each sample has two
> columns of counts for each tag, one for the maternal allele and one for
> the paternal allele. Nine samples have mother A and father B, and nine
> samples have mother B and father A.
>
> we wish to test the following models in edgeR and voom()
>
> (~sample+allele+parent) vs (~sample+allele)
> (~sample+allele+allele:parent) vs (~sample+allele)
>
> Our design matrix is built as follows. Once concern is that the two
> columns for each sample are paired, so I give them the same sample ID.
>
> sample<-factor(c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,16,16,17,17,18,18))
> #provides sample labels
> title<-c("A", "B") #provides allele labels
> allele<-factor(c(rep(title, 18)))
> parent1<-c("father", "mother")
> #parent labels for the first nine replicates
> parent2<-c("mother", "father") #parent labels for the second nine replicates
> parent<-factor(c(c(rep(parent1, 9)), c(rep(parent2, 9))))
>
> model.matrix(~sample+allele+allele:parent)->design
>
> The errors we get with this approach are as follows:
>
> EdgeR
>> d <-estimateGLMCommonDisp(d, design)
> Error in solve.default(R, t(beta)) :
> system is computationally singular: reciprocal condition number =
> 3.227e-17
>
>
> voom()
>> design<-model.matrix(~sample+allele+allele:parent) y<-voom(ctx,design,
>> plot=TRUE, lib.size=colSums(ctx)*nf)
> Coefficients not estimable: alleleA:parentmother Warning message:
> Partial NA coefficients for 159907 probe(s)
>
> Thank you very much for any help!
>
> best wishes,
>
> Chris
>
>
>> sessionInfo()
> R version 2.15.0 (2012-03-30) Platform: x86_64-apple-darwin9.8.0/x86_64
> (64-bit)
>
> locale: [1]
> en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages: [1] stats graphics grDevices utils datasets
> methods base
>
> other attached packages: [1] edgeR_2.6.3 limma_3.12.0
>
> loaded via a namespace (and not attached): [1] annotate_1.34.0
> AnnotationDbi_1.18.0 Biobase_2.16.0 BiocGenerics_0.2.0 DBI_0.2-5
> DESeq_1.9.7 genefilter_1.38.0 geneplotter_1.34.0 grid_2.15.0 [10]
> IRanges_1.14.0 RColorBrewer_1.0-5 RSQLite_0.11.1 splines_2.15.0
> stats4_2.15.0 survival_2.36-12 xtable_1.7-0
>
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list