[BioC] duplicated on IRanges object

Hervé Pagès hpages at fhcrc.org
Tue Oct 26 10:13:23 CEST 2010


Hi Manuela,

Thanks for the report! The duplicated method for Ranges objects has
been reimplemented in IRanges 1.8.1. The new implementation no longer
uses the trick of converting the ranges into numerical values (there
doesn't seem to be an easy/portable way to work around the rounding
issues).
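
To illustrate the rounding issue, here is a minimal base-R sketch that
needs no IRanges installation. It assumes the numeric key is computed
with the formula quoted in the report below; to_numeric_key() is a
hypothetical helper, not the actual IRanges internals, and the exact
comparison at the end is only one possible alternative, not necessarily
what IRanges 1.8.1 does.

```r
# Hypothetical helper mirroring the formula quoted in the report:
#   start(x) + width(x) / (max_width + 1.00)
# This is an assumption for illustration, not the IRanges source code.
to_numeric_key <- function(start, width) {
  max_width <- max(width)
  start + width / (max_width + 1.00)
}

# The two "almost the same" ranges from the report.
start2 <- c(1000000000, 1000000000)
width2 <- c(200, 201)
key2 <- to_numeric_key(start2, width2)

# The same two ranges plus one range with a huge width.
start3 <- c(start2, 5000000)
width3 <- c(width2, 95000001)
key3 <- to_numeric_key(start3, width3)

# With a small max_width the fractional parts differ by about 0.005,
# far more than the spacing between doubles near 1e9 (2^-23).
dup_key2 <- duplicated(key2)

# With the huge max_width the fractional parts differ by about 1e-8,
# which is less than that spacing, so both keys round to the same double.
dup_key3 <- duplicated(key3)

# Comparing the (start, width) pairs directly avoids the rounding.
dup_exact3 <- duplicated(data.frame(start = start3, width = width3))

print(dup_key2)    # FALSE FALSE
print(dup_key3)    # FALSE  TRUE FALSE
print(dup_exact3)  # FALSE FALSE FALSE
```

Doubles carry 53 significant bits, so near 1e9 neighbouring values are
about 1.2e-07 apart. Once max_width reaches the tens of millions, a
difference of 1 in width shifts the key by less than that and is lost.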

This new version of IRanges should become available through biocLite()
in the next 12 hours.

Cheers,
H.


On 10/22/2010 07:44 AM, Manuela Hummel wrote:
> Hi,
>
> there seems to be a numerical issue when applying 'duplicated' on an IRanges object.
> When there are two ranges that are almost the same, and within the IRanges object there are some other ranges with huge width, 'duplicated' identifies the two "almost the same" ranges as "the same".
>
> If we take for example those two ranges:
>
>> ir <- IRanges(start=rep(1000000000, 2), width=200:201)
>> ir
> IRanges of length 2
>           start        end width
> [1] 1000000000 1000000199   200
> [2] 1000000000 1000000200   201
>
>
> They are obviously not the same:
>
>> duplicated(ir)
> [1] FALSE FALSE
>
>
> But when we now add another range with huge width:
>
>> ir2 <- c(ir, IRanges(start=5000000, width=95000001))
>> ir2
> IRanges of length 3
>           start        end    width
> [1] 1000000000 1000000199      200
> [2] 1000000000 1000000200      201
> [3]    5000000  100000000 95000001
>
>
> ... the second range is detected as duplicate of the first:
>
>> duplicated(ir2)
> [1] FALSE  TRUE FALSE
>
>
> I guess the problem is that in .toNumericWithCompatibleOrder the variable max_width gets so large that
> start(x) + width(x)/(max_width+1.00)
> becomes numerically identical for ranges like the first two in the example.
>
> Best regards
> Manuela
>
> Ps: By the way, thanks for the great IRanges package! It makes working with sequence data so much easier.
>
>
>> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
> [5] LC_TIME=Spanish_Spain.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods
> [7] base
>
> other attached packages:
> [1] IRanges_1.8.0
>
>
>
> Manuela Hummel
> Core Facilities - Microarrays Unit
> Center for Genomic Regulation (CRG)
> Dr. Aiguader 88, 4th floor, Office 439.01
> 08003 Barcelona
> Phone: +34 93 316 0373
> e-mail: manuela.hummel at crg.es


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


