[BioC] transcriptsBy via TxDb.Hsapiens.UCSC.hg19.knownGene painfully slow
Martin Morgan
mtmorgan at fhcrc.org
Tue Jan 1 23:11:17 CET 2013
On 01/01/2013 02:05 PM, Martin Morgan wrote:
> On 01/01/2013 01:32 PM, Murat Tasan wrote:
>> hi all - does anyone have any performance tips for using
>> transcriptsBy(TXDB, by = "gene") with the UCSC transcript database?
>> in particular, is the SQLite backing database file indexed (along columns
>> holding the internal IDs)?
>> i'd provide some timing results for the command execution, but i ran out of
>> patience after about 10 minutes with no results...
>
> it is 'slow' but only in the couple of seconds definition of slow. Something
> else is going on so a reproducible example, including sessionInfo(), would be
> helpful.
Just to follow my own advice...
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
system.time(res <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by="gene"))
length(res)
sessionInfo()
gives me
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> system.time(res <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by="gene"))
   user  system elapsed
  3.020   0.012   3.042
> length(res)
[1] 22932
> sessionInfo()
R version 2.15.2 Patched (2012-12-23 r61401)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.8.0
[2] GenomicFeatures_1.10.1
[3] AnnotationDbi_1.20.3
[4] Biobase_2.18.0
[5] GenomicRanges_1.10.5
[6] IRanges_1.16.4
[7] BiocGenerics_0.4.0

loaded via a namespace (and not attached):
 [1] biomaRt_2.14.0     Biostrings_2.26.2  bitops_1.0-5       BSgenome_1.26.1
 [5] DBI_0.2-5          parallel_2.15.2    RCurl_1.95-3       Rsamtools_1.10.2
 [9] RSQLite_0.11.2     rtracklayer_1.18.1 stats4_2.15.2      tools_2.15.2
[13] XML_3.95-0.1       zlibbioc_1.4.0
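On the indexing question quoted above, one way to check is to ask SQLite itself which indexes it knows about. What follows is only a minimal sketch under assumptions: list_sqlite_indexes() is a hypothetical helper (not part of any package), and it assumes a current AnnotationDbi in which dbconn() returns the connection behind a TxDb object; the accessor may be named differently in the package versions listed above.

```r
library(DBI)

# Hypothetical helper (not part of any package): list the indexes that
# SQLite records in its sqlite_master catalog for an open connection.
list_sqlite_indexes <- function(con) {
    dbGetQuery(
        con,
        "SELECT name, tbl_name FROM sqlite_master
         WHERE type = 'index' ORDER BY tbl_name, name"
    )
}

# Assumption: AnnotationDbi::dbconn() works on the TxDb object.
# The check is skipped when the annotation package is not installed.
if (requireNamespace("TxDb.Hsapiens.UCSC.hg19.knownGene", quietly = TRUE)) {
    txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene
    print(list_sqlite_indexes(AnnotationDbi::dbconn(txdb)))
}
```

Each row of the result names an index and the table it belongs to. An empty result would suggest that the joins run without index support; otherwise the slowness probably lies elsewhere.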
>
>
>>
>> cheers,
>>
>> -m
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793