[BioC] DESeq2 extracting results in version >= 1.3 (was: "using DESeq2 with multi factor data")

Michael Love michaelisaiahlove at gmail.com
Mon Mar 31 15:41:50 CEST 2014


hi Karen and Olga,


On Mon, Mar 31, 2014 at 2:42 AM, Karen Chait <kchait at tx.technion.ac.il> wrote:
> Hi Michael,
> Thank you for your great help and your fast replies.
> When I define 2 factors when each has only 2 levels I understand I can't use the expanded matrix. I want to perform the following analysis-
>> colData=data.frame(row.names=colnames(tmpcountData),metas=metasVector,time=timeVector)
>> colData$metas=factor(colData$metas,levels=c("met_no","met_yes"))
>> colData$time=factor(colData$time,levels=c("1","2"))
>> dds2=DESeqDataSetFromMatrix(countData=tmpcountData,colData=colData,design=~time+metas+metas:time)
>> dds2=DESeq(dds2)
>> resultsNames(dds2)
> [1] "Intercept"               "time_2_vs_1"             "metas_met_yes_vs_met_no" "time2.metasmet_yes"
>
> The comparisons I want to run are :
> 1. all samples met_yes vs. met_no
> 2. Samples with time=1 met_yes vs. met_no
> 3. Samples with time=2 met_yes vs. met_no
>

We use standard design matrices for the 2x2 case because it simplifies
the extraction of results. To test for interaction effects (is the
effect of metastasis different at time 2 than time 1) we can simply
call the single interaction term using 'name'.

Concerning the comparisons you define above:

The second comparison is:

results(dds, contrast=c("metas","met_yes","met_no"))
# equivalent to
results(dds, contrast=c(0,0,1,0))

The third comparison is the main effect of metas_met_yes_vs_met_no
plus the interaction effect at time=2:

results(dds, contrast=list(c("metas_met_yes_vs_met_no",
"time2.metasmet_yes"), character()))
# equivalent to
results(dds, contrast=c(0,0,1,1))

The first comparison is the main effect of metas_met_yes_vs_met_no
plus half the interaction effect at time=2. This is somewhat intuitive
given the two comparisons we just described.

results(dds, contrast=c(0,0,1,1/2))



> I get the same results for comparisons 1 and 2
>> res1=results(dds,contrast=c("metas","met_yes","met_no"),alpha=0.05)
>> res2=results(dds,contrast=c(0,0,1,0),alpha=0.05)
>> res3=results(dds,contrast=c(0,0,1,1),alpha=0.05)
>
> I understand that for the first comparison I can define the design without the time factor.

You could do this, or use the contrast I defined above.

> For the second comparison, does DESeq calculate the differential expression only between the samples with time=1?

Yes.

-Mike

> If not, can you explain how DESeq2 calculates the DE for these comparisons.
>
>> sessionInfo()
> R version 3.0.3 (2014-03-06)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] DESeq2_1.3.60             RcppArmadillo_0.4.100.2.1
> [3] Rcpp_0.11.1               GenomicRanges_1.14.4
> [5] XVector_0.2.0             IRanges_1.20.7
> [7] BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
>  [1] annotate_1.40.1      AnnotationDbi_1.24.0 Biobase_2.22.0
>  [4] DBI_0.2-7            genefilter_1.44.0    geneplotter_1.40.0
>  [7] grid_3.0.3           lattice_0.20-27      locfit_1.5-9.1
> [10] RColorBrewer_1.0-5   RSQLite_0.11.4       splines_3.0.3
> [13] stats4_3.0.3         survival_2.37-7      tools_3.0.3
> [16] XML_3.98-1.1         xtable_1.7-3
>
>
> Thank you so much,
> Karen and Olga
>
>
> -----Original Message-----
> From: Michael Love [mailto:michaelisaiahlove at gmail.com]
> Sent: Monday, March 24, 2014 5:24 PM
> To: solgakar at bi.technion.ac.il; karen chait (kchait at tx.technion.ac.il)
> Cc: bioconductor at r-project.org
> Subject: Re: DESeq2 extracting results in version >= 1.3 (was: "using DESeq2 with multi factor data")
>
> hi Olga and Karen,
>
> I updated the results() function in DESeq2 version 1.3 to simplify extracting the kind of contrast we were discussing, combining main effects and interactions. While I emailed the following code to build a numeric contrast,
>
>> ctrst <- numeric(21)
>> rn <- resultsNames(dds)
>> ctrst[ rn == "metasmet_no" ] <- -1
>> ctrst[ rn == "metasmet_yes" ] <- 1
>> ctrst[ rn == "timetime_2.metasmet_no" ] <- -1 ctrst[ rn ==
>> "timetime_2.metasmet_yes" ] <- 1 results(dds, contrast=ctrst)
>
> this can now be accomplished by giving a list to the 'contrast'
> argument. The first element of the list should be the effects for the numerator of the log fold change, and the second element should be the effects for the denominator:
>
> results( dds, contrast = list(
>     c("metasmet_yes", "timetime_2.metasmet_yes"),
>     c("metasmet_no", "timetime_2.metasmet_no") ) )
>
> There are more details and examples in the man page for ?results for the development branch.
>
> Mike
>
>
> On Mon, Mar 17, 2014 at 9:33 AM, Michael Love <michaelisaiahlove at gmail.com> wrote:
>>
>> hi Olga and Karen,
>>
>> Thanks for your interest in DESeq2 and for using the development
>> branch, which allows us to clarify changes before they enter the
>> release branch.
>>
>> This concerns a modeling change from v1.2 to v1.3. Changes like these
>> are documented in the NEWS file with pointers to the
>> functions/arguments that changed:
>> (e.g. http://www.bioconductor.org/packages/2.14/bioc/news/DESeq2/NEWS)
>> I've been working on including more documentation in the package, but
>> haven't gotten to adding the parts about interaction terms yet, so
>> this gives me a chance to explain a bit.
>>
>> The new resultsNames(dds) are due to "expanded model matrices", which
>> include a column in the design matrix for each level of a factor, as
>> opposed to "standard model matrices" produced by model.matrix, which
>> have a column in the deisgn matrix for the levels other than the base
>> level. Expanded model matrices are described a bit in the man pages
>> and in the vignette, but not yet using an example with interaction
>> effects. This makes the DE analysis independent of the choice to base
>> level, whereas previously the log fold change (LFC) shrinkage was not
>> independent of base level choices. In addition, it simplifies certain
>> comparisons.
>>
>> You now have the columns "metasmet_no" and "metasmet_yes", whereas
>> before you had "metas_met_yes_vs_met_no".  Previously you would use
>> the 'name' argument to extract this comparison, whereas with version
>> >= 1.3 we want to encourage users to use the 'contrast' argument:
>>
>> results(dds, contrast=c("metas","met_yes","met_no"))
>>
>> If the analysis includes interaction terms, previously the above
>> contrast would give the metastasis effect only for the base level of
>> the other factors in the design, but for version >= 1.3, this gives
>> the overall metastasis effect over all levels of the time variable.
>>
>> If you want the individual time-period-specific effects, this is the
>> overall effect plus the interaction effects for that time period.You
>> have the following columns of the model matrix:
>>
>> > [1] "Intercept"               "timetime_1"              "timetime_2"              "timetime_3"              "timetime_4"              "timetime_5"
>> >  [7] "timetime_6"              "metasmet_no"             "metasmet_yes"            "timetime_1.metasmet_no"  "timetime_2.metasmet_no"  "timetime_3.metasmet_no"
>> > [13] "timetime_4.metasmet_no"  "timetime_5.metasmet_no"  "timetime_6.metasmet_no"  "timetime_1.metasmet_yes" "timetime_2.metasmet_yes" "timetime_3.metasmet_yes"
>> > [19] "timetime_4.metasmet_yes" "timetime_5.metasmet_yes" "timetime_6.metasmet_yes"
>>
>> So the metastasis effect specific to time period 2 would be, the
>> contrast of "metasmet_yes" vs "metasmet_no" and additionally, the
>> contrast of "timetime_2.metasmet_yes" vs "timetime_2.metasmet_no". We
>> can pull these out using the numeric contrast vector, starting with
>> all 0's and then filing in the -1's and 1's:
>>
>> ctrst <- numeric(21)
>> rn <- resultsNames(dds)
>> ctrst[ rn == "metasmet_no" ] <- -1
>> ctrst[ rn == "metasmet_yes" ] <- 1
>> ctrst[ rn == "timetime_2.metasmet_no" ] <- -1 ctrst[ rn ==
>> "timetime_2.metasmet_yes" ] <- 1 results(dds, contrast=ctrst)
>>
>>
>> Mike
>>
>>
>>
>> On Mon, Mar 17, 2014 at 5:07 AM, solgakar at bi.technion.ac.il
>> <solgakar at bi.technion.ac.il> wrote:
>> >
>> >
>> >
>> > From: Karen Chait [mailto:kchait at tx.technion.ac.il]
>> > Sent: Monday, March 17, 2014 10:57 AM
>> > To: Olga Karinsky
>> > Subject: RE: using DESeq2 with multi factor data
>> >
>> > Hello all,
>> > I am trying to use the DESeq2 package to perform RNA-Seq analysis on a data containing several factors.
>> > I have been closely following the emails between Ming Yi and Michael Love, because I think that my problem is very similar to what they have discussed.  But even though I received a lot of useful information from their discussion, I still have several questions regarding my specific data.
>> > Just as an overall information regarding my data, I have 96 samples and the two factors I am interested in exploring are "time" and "metastasis".
>> > In order to build my data set I used the following commands:
>> >
>> > >  countData = read.table("merged_counts.txt", header=TRUE,
>> > > row.names=1)
>> >
>> > >
>> > > metasVector=c("met_no","met_no","met_no","met_no","met_no","met_no
>> > > ","met_no","met_no","met_no","met_no","met_no","met_no","met_no","
>> > > met_no","met_no","met_no","met_no","met_no","met_no","met_no","met
>> > > _no","met_no","met_no","met_no","met_no","met_no","met_no","met_no
>> > > ","met_no","met_no","met_no","met_no","met_no","met_no","met_no","
>> > > met_no","met_no","met_no","met_no","met_no","met_no","met_no","met
>> > > _no","met_no","met_no","met_no","met_no","met_no","met_no","met_no
>> > > ","met_no","met_no","met_no","met_no","met_no","met_no","met_no","
>> > > met_no","met_no","met_no","met_no","met_no","met_no","met_no","met
>> > > _no","met_no","met_yes","met_yes","met_yes","met_yes","met_yes","m
>> > > et_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes
>> > > ","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes","met
>> > > _yes","met_yes","met_yes","met_yes","met_yes","met_yes","met_yes",
>> > > "met_yes","met_yes","met_yes","met_yes","met_yes"(
>> >
>> > >
>> > > timePointsVector=c("6","4","6","6","3","6","3","5","6","6","1","5"
>> > > ,"3","4","3","6","6","6","2","6","1","2","4","6","5","5","5","3","
>> > > 6","5","6","2","6","6","1","5","5","6","6","6","6","6","6","4","2"
>> > > ,"6","3","1","2","5","6","1","1","3","6","3","6","4","4","5","6","
>> > > 6","3","5","4","6","1","4","3","1","1","1","4","2","1","1","3","6"
>> > > ,"1","1","2","1","6","3","3","2","5","3","2","3","1","4","1","1","
>> > > 6","1")
>> >
>> > >
>> > > colData=data.frame(row.names=colnames(countData),metas=metasVector
>> > > ,gender=gendarVector)
>> >
>> > >  colData$metas=factor(colData$metas, levels=c("met_no","met_yes"))
>> >
>> > >  colData$time = factor(colData$time, levels = c("1", "2", "3",
>> > > "4", "5", "6"))
>> >
>> > >  dds=DESeqDataSetFromMatrix(countData=tmpcountData,
>> > > colData=colData, design=~time + metas + metas:time)
>> >
>> > >  dds=DESeq(dds)
>> > I have several questions:
>> >
>> > -          first of all I have tried running those commands on DESeq2 version 1.2.10 (R version 3.0.2) and DESeq2 version 1.3.47 (R version 3.0.2) and what I have received from the resultsNames() function I both cases is very different. Using the 1.2.10 version I have received:
>> >
>> > > resultsNames(dds)
>> >
>> > [1] "Intercept"               "time_2_vs_1"             "time_3_vs_1"             "time_4_vs_1"             "time_5_vs_1"             "time_6_vs_1"             "metas_met_yes_vs_met_no" "time2.metasmet_yes"
>> >
>> >  [9] "time3.metasmet_yes"      "time4.metasmet_yes"      "time5.metasmet_yes"      "time6.metasmet_yes"
>> >
>> > > sessionInfo()
>> >
>> > R version 3.0.2 (2013-09-25)
>> >
>> > Platform: x86_64-w64-mingw32/x64 (64-bit)
>> >
>> >
>> >
>> > locale:
>> >
>> > [1] LC_COLLATE=Hebrew_Israel.1255  LC_CTYPE=Hebrew_Israel.1255    LC_MONETARY=Hebrew_Israel.1255 LC_NUMERIC=C                   LC_TIME=Hebrew_Israel.1255
>> >
>> >
>> >
>> > attached base packages:
>> >
>> > [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>> >
>> >
>> >
>> > other attached packages:
>> >
>> > [1] DESeq2_1.2.10             RcppArmadillo_0.4.100.2.1 Rcpp_0.11.0               GenomicRanges_1.14.4      XVector_0.2.0             IRanges_1.20.7            BiocGenerics_0.8.0
>> >
>> >
>> >
>> > loaded via a namespace (and not attached):
>> >
>> > [1] annotate_1.40.1      AnnotationDbi_1.24.0 Biobase_2.22.0       DBI_0.2-7            genefilter_1.44.0    grid_3.0.2           lattice_0.20-27      locfit_1.5-9.1       RColorBrewer_1.0-5   RSQLite_0.11.4
>> >
>> > [11] splines_3.0.2        stats4_3.0.2         survival_2.37-7      tools_3.0.2          XML_3.98-1.1         xtable_1.7-3
>> >
>> >
>> >
>> > Using the 1.3.47 version I have received:
>> >
>> > > resultsNames(dds)
>> >
>> > [1] "Intercept"               "timetime_1"              "timetime_2"              "timetime_3"              "timetime_4"              "timetime_5"
>> >
>> >  [7] "timetime_6"              "metasmet_no"             "metasmet_yes"            "timetime_1.metasmet_no"  "timetime_2.metasmet_no"  "timetime_3.metasmet_no"
>> >
>> > [13] "timetime_4.metasmet_no"  "timetime_5.metasmet_no"  "timetime_6.metasmet_no"  "timetime_1.metasmet_yes" "timetime_2.metasmet_yes" "timetime_3.metasmet_yes"
>> >
>> > [19] "timetime_4.metasmet_yes" "timetime_5.metasmet_yes" "timetime_6.metasmet_yes"
>> >
>> > > sessionInfo()
>> >
>> > R version 3.0.2 (2013-09-25)
>> >
>> > Platform: x86_64-w64-mingw32/x64 (64-bit)
>> >
>> >
>> >
>> > locale:
>> >
>> > [1] LC_COLLATE=Hebrew_Israel.1255  LC_CTYPE=Hebrew_Israel.1255    LC_MONETARY=Hebrew_Israel.1255 LC_NUMERIC=C                   LC_TIME=Hebrew_Israel.1255
>> >
>> >
>> >
>> > attached base packages:
>> >
>> > [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>> >
>> >
>> >
>> > other attached packages:
>> >
>> > [1] DESeq2_1.3.47           RcppArmadillo_0.4.100.0 Rcpp_0.11.0             GenomicRanges_1.14.4    XVector_0.2.0           IRanges_1.20.7
>> >
>> > [7] BiocGenerics_0.8.0
>> >
>> >
>> >
>> > loaded via a namespace (and not attached):
>> >
>> > [1] annotate_1.40.1      AnnotationDbi_1.24.0 Biobase_2.22.0       DBI_0.2-7            genefilter_1.44.0    geneplotter_1.40.0   grid_3.0.2
>> >
>> >  [8] lattice_0.20-27      locfit_1.5-9.1       RColorBrewer_1.0-5   RSQLite_0.11.4       splines_3.0.2        stats4_3.0.2         survival_2.37-7
>> >
>> > [15] XML_3.98-1.1         xtable_1.7-3
>> >
>> > (I have ran the 1.3.47 version the same way besides a difference in the names of the time levels, but I do not believe that this is the reason for the differences)
>> >                 I don't fully understand the results I receive using the 1.3.47 version and even more the difference between the versions.
>> >
>> > -          From my understanding, the results I received using the 1.2.10 version are the more reasonable and they fit my settings of base levels in the data. Now after receiving these results I would love to understand how do I receive different contrast testing? For each time period metas_yes vs. metas_no (for example timetime_2.metasmet_yes vs. timetime_2.metasmet_no)
>> >
>> >
>> >
>> > Thank you in advance,
>> >
>> > Olga and Karen
>> >
>> >
>> >
>> >         [[alternative HTML version deleted]]
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at r-project.org
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives:
>> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list