[BioC] DESeq() takes an extremely long time

Michael Love michaelisaiahlove at gmail.com
Tue Apr 15 01:47:56 CEST 2014


hi Xianjun,

This slowdown should be fixed in the newly released DESeq2 version 1.4.
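A quick way to tell whether an installation predates the fix is to compare version numbers. This minimal sketch hard-codes the DESeq2 1.2.10 reported in the quoted sessionInfo() and takes "1.4" from the reply as the release with the fix. In a live session, base R's `packageVersion()` would supply the installed version instead.

```r
# Version reported in the quoted sessionInfo() output.
installed <- package_version("1.2.10")
# Release that the reply says should contain the fix.
fixed_in  <- package_version("1.4")

# package_version objects compare component by component (1 == 1, then 2 < 4).
needs_upgrade <- installed < fixed_in
print(needs_upgrade)

# In a live session, query the installed package directly:
#   packageVersion("DESeq2") < "1.4"
```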

We often recommend turning continuous variables into factors,
especially something like age. A continuous covariate in the design
assumes a constant log2 fold change per unit, and we don't typically
hypothesize that gene expression will double every x years. Breaking
age up into 3-5 meaningful groups often makes more sense. However, for
dosage or some other covariates, assuming a linear trend (on the log2
scale) might make sense.
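A minimal base-R sketch of the binning described above, assuming a hypothetical sample table with an `age` column. The sample names, ages, cut points (40 and 60) and group labels are all made up for illustration; meaningful groups depend on the study. For contrast, it also computes what a continuous age term implies: with a hypothetical slope of 0.1 on the log2 scale per year, expression doubles every 10 years.

```r
# Hypothetical sample table; values are illustrative only.
coldata <- data.frame(
  sample = paste0("s", 1:8),
  age    = c(23, 31, 38, 45, 52, 60, 67, 74)
)

# Bin age into three groups: [-Inf, 40), [40, 60), [60, Inf).
# The breaks and labels are assumptions, not recommendations.
coldata$ageGroup <- cut(coldata$age,
                        breaks = c(-Inf, 40, 60, Inf),
                        labels = c("young", "middle", "old"),
                        right  = FALSE)
print(table(coldata$ageGroup))

# Keeping age continuous instead fits one log2 fold change per year.
# A hypothetical slope of 0.1 implies a fold change of 2^(0.1 * 10) over 10 years.
beta <- 0.1
fold_change_10y <- 2^(beta * 10)
print(fold_change_10y)

# With a DESeqDataSet, the factor would replace `age` in the design, e.g.
#   colData(dds)$ageGroup <- coldata$ageGroup   # assumes the same sample order
#   design(dds) <- ~ condition + batch + cellType + ageGroup + sex + RIN + PMI
```

The labels are supplied in increasing order of age, so the first level ("young") serves as the reference level when the factor is used in a model.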

Mike




On Mon, Apr 14, 2014 at 6:48 PM, Dong, Xianjun
<XDONG at rics.bwh.harvard.edu> wrote:
> Hi Mike,
>
> I was applying the DESeq() function to dds (as below) for all human genes, and the script has been stalled at the "gene-wise dispersion estimates" step for more than 1 hour. Do you have any idea why? Should I convert all covariates into factors?
>
> p.s. sessionInfo() attached.
>
> -Xianjun
>
>> dds <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
> +                                        directory = input_dir,
> +                                        design= ~ condition + batch + cellType + age + sex + RIN + PMI)
>> colData(dds)$condition <- factor(colData(dds)$condition, levels=c("HC", "Treament1", "Treament2"))
>> dds <- DESeq(dds)
> estimating size factors
> estimating dispersions
> gene-wise dispersion estimates
>
> ================== sessionInfo() ================
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] DESeq2_1.2.10           RcppArmadillo_0.3.930.1 Rcpp_0.11.1
> [4] GenomicRanges_1.14.4    XVector_0.2.0           IRanges_1.20.7
> [7] BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
>  [1] annotate_1.40.0      AnnotationDbi_1.24.0 Biobase_2.22.0
>  [4] DBI_0.2-7            genefilter_1.44.0    grid_3.0.1
>  [7] lattice_0.20-24      locfit_1.5-9.1       RColorBrewer_1.0-5
> [10] RSQLite_0.11.4       splines_3.0.1        stats4_3.0.1
> [13] survival_2.37-4      XML_3.98-1.1         xtable_1.7-1



More information about the Bioconductor mailing list