[BioC] findOverlaps method in GenomicRanges not supporting type="equal" for GRangesList, GRangesList?

Hervé Pagès hpages at fhcrc.org
Thu Nov 21 21:02:38 CET 2013


Hi Michael, Nico,

Right now match/== methods for List objects behave inconsistently.
For example, even for conceptually close objects like IntegerList
and XIntegerViews, we have:

   x <- IntegerList(a=1:5, b=2:-3, c=1:3)
   v <- successiveViews(unlist(x), elementLengths(x))

   > x == rev(x)
   LogicalList of length 3
   [["a"]] TRUE TRUE TRUE FALSE FALSE
   [["b"]] TRUE TRUE TRUE TRUE TRUE TRUE
   [["c"]] TRUE TRUE TRUE FALSE FALSE

   > v == rev(v)
   [1] FALSE  TRUE FALSE

   > match(x, rev(x))
   IntegerList of length 3
   [["a"]] 1 2 3 <NA> <NA>
   [["b"]] 1 2 3 4 5 6
   [["c"]] 1 2 3

   > match(v, rev(v))
   Error in base::match(x, table, nomatch = nomatch, incomparables = incomparables,  :
     'match' requires vector arguments

This is not a good situation, and some work still needs to be done at
some point to clean up the match/== methods in IRanges/GenomicRanges.
In the meantime I think we should hold off on adding new methods for
List objects until there is a clear consensus on how they should behave.
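
To make the two behaviours shown above concrete, here is a minimal base-R sketch of the two candidate semantics for ==. It uses plain lists rather than IRanges classes, and the helper names eq_elementwise and eq_atomic are made up for illustration; they are not part of any package.

```r
# Plain lists standing in for the IntegerList / Views objects above.
x <- list(a = 1:5, b = 2:-3, c = 1:3)
y <- rev(x)

# Hypothetical helper: element-wise semantics, i.e. one logical vector
# per list element, with the usual R recycling (warning suppressed).
eq_elementwise <- function(x, y) {
  Map(function(a, b) suppressWarnings(a == b), x, y)
}

# Hypothetical helper: atomic semantics, i.e. one logical value per
# list element, TRUE only if the two elements are identical as a whole.
eq_atomic <- function(x, y) {
  unname(mapply(identical, x, y))
}

eq_elementwise(x, y)$a  # TRUE TRUE TRUE FALSE FALSE
eq_atomic(x, y)         # FALSE TRUE FALSE
```

The first helper reproduces the shape of the IntegerList result, the second the shape of the Views result.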

As for Nico's request, I agree that the best way to go would be to just
make findOverlaps(type="equal") work. There are some subtle semantic
differences between a *match* (as reported by match or ==), and equality
from a range overlap point of view. The former can report equality
for ranges on a circular sequence that are not considered equal for
the latter. Another difference is how zero-width ranges are handled.
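
As a toy illustration only of how the two notions could diverge on zero-width ranges, here is a base-R sketch. It assumes that overlap-style equality requires at least one shared position in addition to identical start and end; that rule is an assumption made for this sketch, not a statement about what findOverlaps() does. Both helper names are hypothetical.

```r
# Toy ranges as (start, width); a zero-width range has end = start - 1.
r1 <- c(start = 10L, width = 0L)
r2 <- c(start = 10L, width = 0L)

range_end <- function(r) r[["start"]] + r[["width"]] - 1L

# Hypothetical match-style equality: compare the (start, width) pair.
eq_match <- function(a, b) {
  a[["start"]] == b[["start"]] && a[["width"]] == b[["width"]]
}

# Hypothetical overlap-style equality: same start and end, AND at least
# one shared position (the assumption of this sketch).
eq_overlap <- function(a, b) {
  shared <- min(range_end(a), range_end(b)) -
    max(a[["start"]], b[["start"]]) + 1L
  a[["start"]] == b[["start"]] &&
    range_end(a) == range_end(b) &&
    shared >= 1L
}

eq_match(r1, r2)    # TRUE
eq_overlap(r1, r2)  # FALSE
```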

Thanks,
H.


On 11/21/2013 10:43 AM, Michael Lawrence wrote:
> So I've checked a match,GRangesList,GRangesList method into devel. This allows
> findMatches() to return what you want. There is a question though before
> this is approved: does it make sense for match() to act like findOverlaps
> and consider each GRanges atomically (one returned index per GRanges), or
> should match behave as it does for other Lists and return an IntegerList, with
> a value per range, grouped by the top-level elements? If we decide on the
> latter, then the method I wrote needs to be removed and the implementation
> moved to the "equal" mode in findOverlaps. Either way,
> findOverlaps(type="equal") should be made to work.
>
> Michael
>
>
> On Thu, Nov 21, 2013 at 8:13 AM, Nicolas Delhomme
> <nicolas.delhomme at umu.se> wrote:
>
>> Thanks!
>> ---------------------------------------------------------------
>> Nicolas Delhomme
>>
>> Nathaniel Street Lab
>> Department of Plant Physiology
>> Umeå Plant Science Center
>>
>> Tel: +46 90 786 7989
>> Email: nicolas.delhomme at plantphys.umu.se
>> SLU - Umeå universitet
>> Umeå S-901 87 Sweden
>> ---------------------------------------------------------------
>>
>> On 21 Nov 2013, at 17:06, Michael Lawrence <lawrence.michael at gene.com>
>> wrote:
>>
>>> I will work on this today.
>>>
>>> Michael
>>>
>>>
>>> On Thu, Nov 21, 2013 at 4:43 AM, Nicolas Delhomme <nicolas.delhomme at umu.se> wrote:
>>> Hej Bioc!
>>>
>>> When I try to find “equal” ranges from two GRangesList objects, I get the
>>> following error:
>>>
>>>> findOverlaps(query=grng.def,subject=grng.mod,type="equal")
>>> Error in match.arg(type) :
>>>    'arg' should be one of “any”, “start”, “end”, “within”
>>>
>>> Isn’t type="equal" supported for the GRangesList, GRangesList signature?
>>>
>>> Cheers,
>>>
>>> Nico
>>>
>>> sessionInfo()
>>> R version 3.0.2 (2013-09-25)
>>> Platform: x86_64-apple-darwin13.0.0 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>>  [1] easyRNASeq_1.8.2       ShortRead_1.20.0       Rsamtools_1.14.1       GenomicRanges_1.14.3   DESeq_1.14.0           lattice_0.20-24        locfit_1.5-9.1
>>>  [8] Biostrings_2.30.1      XVector_0.2.0          IRanges_1.20.5         edgeR_3.4.0            limma_3.18.3           biomaRt_2.18.0         Biobase_2.22.0
>>> [15] genomeIntervals_1.18.0 BiocGenerics_0.8.0     intervals_0.14.0
>>>
>>> loaded via a namespace (and not attached):
>>>  [1] annotate_1.40.0      AnnotationDbi_1.24.0 bitops_1.0-6         DBI_0.2-7            genefilter_1.44.0    geneplotter_1.40.0   grid_3.0.2           hwriter_1.3
>>>  [9] latticeExtra_0.6-26  LSD_2.5              RColorBrewer_1.0-5   RCurl_1.95-4.1       RSQLite_0.11.4       splines_3.0.2        stats4_3.0.2         survival_2.37-4
>>> [17] tools_3.0.2          XML_3.98-1.1         xtable_1.7-1         zlibbioc_1.8.0
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


