[BioC] findOverlaps method in GenomicRanges not supporting type="equal" for GRangesList, GRangesList?
Hervé Pagès
hpages at fhcrc.org
Thu Nov 21 21:05:20 CET 2013
On 11/21/2013 12:02 PM, Hervé Pagès wrote:
> Hi Michael, Nico,
>
> Right now match/== methods for List objects behave inconsistently.
> For example, even for conceptually close objects like IntegerList
> and XIntegerViews, we have:
>
> x <- IntegerList(a=1:5, b=2:-3, c=1:3)
> v <- successiveViews(unlist(x), elementLengths(x))
>
> > x == rev(x)
> LogicalList of length 3
> [["a"]] TRUE TRUE TRUE FALSE FALSE
> [["b"]] TRUE TRUE TRUE TRUE TRUE TRUE
> [["c"]] TRUE TRUE TRUE FALSE FALSE
>
> > v == rev(v)
> [1] FALSE TRUE FALSE
>
> > match(x, rev(x))
> IntegerList of length 3
> [["a"]] 1 2 3 <NA> <NA>
> [["b"]] 1 2 3 4 5 6
> [["c"]] 1 2 3
>
> > match(v, rev(v))
> Error in base::match(x, table, nomatch = nomatch, incomparables =
> incomparables, :
> 'match' requires vector arguments
>
> This is not a good situation, and there is still some work to be done
> at some point to clean up the match/== methods in
> IRanges/GenomicRanges. In the meantime, I think we should hold off on
> adding new methods for List objects until there is a clear consensus on
> how they should behave.
>
> As for Nico's request, I agree that the best way to go would be to just
> make findOverlaps(type="equal") work. There are some subtle semantic
> differences between a *match* (as reported by match or ==), and equality
> from a range overlap point of view. The former can report equality
> for ranges on a circular sequence that are not considered equal for
> the latter.
It's the other way around, sorry:
The *latter* can report equality for ranges on a circular sequence
that are not considered equal for the *former*.
Cheers,
H.
> Another difference is how zero-width ranges are handled.
>
> Thanks,
> H.
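
For plain GRanges objects, where type="equal" is already supported, the two notions of equality discussed above can be put side by side. This is only a minimal sketch, assuming GenomicRanges is installed; gr1 and gr2 are made-up ranges on a linear sequence, so the circular-sequence and zero-width cases are deliberately not covered here.

```r
library(GenomicRanges)  # assumption: Bioconductor's GenomicRanges is installed

# Hypothetical toy ranges on a linear (non-circular) sequence.
gr1 <- GRanges("chr1", IRanges(start = c(1, 10), end = c(5, 20)))
gr2 <- GRanges("chr1", IRanges(start = c(10, 1, 30), end = c(20, 5, 40)))

# Equality from a range-overlap point of view:
# a Hits object holding (query, subject) index pairs.
hits <- findOverlaps(gr1, gr2, type = "equal")

# Equality as reported by match():
# one subject index (or NA) per query range.
idx <- match(gr1, gr2)
```

On ordinary ranges like these the two calls should agree; the differences only show up in the corner cases mentioned above.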
>
>
> On 11/21/2013 10:43 AM, Michael Lawrence wrote:
>> So I've checked a match,GRangesList,GRangesList method into devel. This
>> allows findMatches() to return what you want. There is a question, though,
>> before this is approved: does it make sense for match() to act like
>> findOverlaps() and consider each GRanges atomically (one returned index
>> per GRanges), or should match() behave as it does for other Lists and
>> return an IntegerList, with a value per range, grouped by the top-level
>> elements? If we decide on the latter, then the method I wrote needs to be
>> removed and the implementation moved to the "equal" mode in findOverlaps().
>> Either way, findOverlaps(type="equal") should be made to work.
>>
>> Michael
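
To make the two candidate semantics for match() concrete, here is a minimal base-R sketch on plain lists of integer vectors, standing in for the elements of a GRangesList. The helper names match_per_element and match_atomic are hypothetical and exist only for this illustration; they are not the IRanges or GenomicRanges methods.

```r
# Plain lists stand in for List objects; values mirror the IntegerList example.
x <- list(a = 1:5, b = 2:-3, c = 1:3)
table <- rev(x)

# Semantics 1: match inner values pairwise, giving one result per inner
# value, grouped by top-level element.
match_per_element <- function(x, table) Map(match, x, table)

# Semantics 2: treat each top-level element as one atomic unit and return
# the index of the first identical element in 'table' (NA if there is none).
match_atomic <- function(x, table) {
  vapply(x, function(xi) {
    hits <- which(vapply(table, identical, logical(1), xi))
    if (length(hits) > 0L) hits[[1L]] else NA_integer_
  }, integer(1))
}

per_element <- match_per_element(x, table)
atomic <- match_atomic(x, table)
```

Here per_element keeps the grouping of x, with one value per inner element, while atomic holds a single index per top-level element.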
>>
>>
>> On Thu, Nov 21, 2013 at 8:13 AM, Nicolas Delhomme
>> <nicolas.delhomme at umu.se> wrote:
>>
>>> Thanks!
>>> ---------------------------------------------------------------
>>> Nicolas Delhomme
>>>
>>> Nathaniel Street Lab
>>> Department of Plant Physiology
>>> Umeå Plant Science Center
>>>
>>> Tel: +46 90 786 7989
>>> Email: nicolas.delhomme at plantphys.umu.se
>>> SLU - Umeå universitet
>>> Umeå S-901 87 Sweden
>>> ---------------------------------------------------------------
>>>
>>> On 21 Nov 2013, at 17:06, Michael Lawrence <lawrence.michael at gene.com>
>>> wrote:
>>>
>>>> I will work on this today.
>>>>
>>>> Michael
>>>>
>>>>
>>>> On Thu, Nov 21, 2013 at 4:43 AM, Nicolas Delhomme
>>>> <nicolas.delhomme at umu.se> wrote:
>>>> Hej Bioc!
>>>>
>>>> When I try to find “equal” ranges from two GRangesList objects, I get
>>>> the following error:
>>>>
>>>>> findOverlaps(query=grng.def,subject=grng.mod,type="equal")
>>>> Error in match.arg(type) :
>>>> 'arg' should be one of “any”, “start”, “end”, “within”
>>>>
>>>> Isn’t type="equal" supported for the GRangesList, GRangesList
>>>> signature?
>>>>
>>>> Cheers,
>>>>
>>>> Nico
>>>>
>>>> sessionInfo()
>>>> R version 3.0.2 (2013-09-25)
>>>> Platform: x86_64-apple-darwin13.0.0 (64-bit)
>>>>
>>>> locale:
>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] parallel stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] easyRNASeq_1.8.2 ShortRead_1.20.0 Rsamtools_1.14.1 GenomicRanges_1.14.3 DESeq_1.14.0 lattice_0.20-24 locfit_1.5-9.1
>>>> [8] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.5 edgeR_3.4.0 limma_3.18.3 biomaRt_2.18.0 Biobase_2.22.0
>>>> [15] genomeIntervals_1.18.0 BiocGenerics_0.8.0 intervals_0.14.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] annotate_1.40.0 AnnotationDbi_1.24.0 bitops_1.0-6 DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0 grid_3.0.2 hwriter_1.3
>>>> [9] latticeExtra_0.6-26 LSD_2.5 RColorBrewer_1.0-5 RCurl_1.95-4.1 RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 survival_2.37-4
>>>> [17] tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 zlibbioc_1.8.0
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>>
>>
>>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319