[BioC] Getting the length of every element from a large CompressedIRangesList is slow
Nicolas Delhomme
delhomme at embl.de
Mon Jul 2 19:18:56 CEST 2012
Hi,
Following up on my previous message, doing this instead is fast:
> system.time(sizes <- sapply(width(aln.ranges),length))
   user  system elapsed
  1.109   0.144   1.254
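
A likely explanation for the gap, offered as a sketch rather than a statement about IRanges internals: a compressed list keeps all of its ranges in one concatenated vector plus the cumulative end offset of each element. Iterating with sapply then has to materialise every element as its own object, whereas the lengths can be read straight off the offsets. The base-R toy below imitates that layout; `unlisted`, `ends` and `true_lengths` are made-up names for illustration, not IRanges slots.

```r
# Toy imitation of a compressed list: one flat vector plus cumulative
# end offsets. All names here are illustrative assumptions.
true_lengths <- c(3L, 0L, 5L, 2L)
unlisted <- seq_len(sum(true_lengths))   # all values, concatenated
ends <- cumsum(true_lengths)             # end offset of each element

# Slow pattern: extract every element, then ask for its length.
starts <- c(0L, head(ends, -1L)) + 1L
slow <- sapply(seq_along(ends), function(i) {
  idx <- seq_len(ends[i] - starts[i] + 1L) + starts[i] - 1L
  length(unlisted[idx])
})

# Fast pattern: lengths are the differences of consecutive end offsets.
fast <- diff(c(0L, ends))

stopifnot(identical(slow, fast))
```

In IRanges of this vintage the same per-element counts should also be available directly through `elementLengths(aln.ranges)`, without going through `width()` or `sapply()` at all.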
Cheers,
Nico
---------------------------------------------------------------
Nicolas Delhomme
Genome Biology Computational Support
European Molecular Biology Laboratory
Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------
On Jul 2, 2012, at 7:02 PM, Nicolas Delhomme wrote:
> Hi!
>
> I've a rather large CompressedIRangesList
>
>> print(object.size(aln.ranges),unit="Mb")
> 390.4 Mb
>
> that has 2518 elements. Some of them hold up to 6M ranges, for a total of 51M, but the vast majority are small: the median is 2 ranges per element and the 3rd quartile is 47, while the mean is ~20,000.
>
> Retrieving the length of every element is slow:
>
>> system.time(sizes <- sapply(aln.ranges,length))
>
>    user  system elapsed
> 265.777 169.222 443.498
>
> This surprised me, given how well the IRanges package performs in general. Is there a faster way to get this information than the sapply I'm using? Note that the machine I'm using is not a limiting factor in terms of CPU/RAM/load.
>
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] C/UTF-8/C/C/C/C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] IRanges_1.15.15 BiocGenerics_0.3.0
>
> loaded via a namespace (and not attached):
> [1] stats4_2.15.1
>
> Nico
>
> P.S. If you need, I can send my aln.ranges object off-list.