[BioC] Obtain overlap coordinates in GenomicRanges findSpliceOverlaps
Cook, Malcolm
MEC at stowers.org
Fri Mar 21 18:56:30 CET 2014
Michael,
+1 for pmap!
I like the separation of concerns this would offer.
It seems to me that the combination of pmap and findSpliceOverlaps should afford a more general solution to the problem solved by VariantAnnotation::refLocsToLocalLocs (and should be equally performant?).
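To make the idea concrete, here is a rough base-R sketch of what a pmap-style mapping could compute once the compatible hits are known. It is not the GenomicRanges API: the function name pmap_to_transcript, the list-of-data.frames exon layout, and the plus-strand-only simplification are all assumptions made for illustration.

```r
# Hypothetical helper (not part of GenomicRanges): map genomic positions to
# transcript coordinates, assuming pos[i] is already paired with transcript
# tx_index[i], e.g. via the compatible hits from findSpliceOverlaps().
# Assumptions: 'exons' is a list of data.frames with integer columns
# 'start' and 'end' (1-based, inclusive), sorted by start and
# non-overlapping; plus strand only.
pmap_to_transcript <- function(pos, tx_index, exons) {
  stopifnot(length(pos) == length(tx_index))
  out <- rep(NA_integer_, length(pos))
  for (tx in unique(tx_index)) {
    ex  <- exons[[tx]]
    sel <- which(tx_index == tx)
    p   <- pos[sel]
    widths <- ex$end - ex$start + 1L
    # Transcript offset of each exon = total width of the exons before it
    offset <- cumsum(c(0L, widths[-length(widths)]))
    # Index of the last exon starting at or before each position (0 if none)
    k <- findInterval(p, ex$start)
    inside <- k > 0L
    # Positions past the end of that exon fall in an intron and stay NA
    inside[inside] <- p[inside] <= ex$end[k[inside]]
    ki <- k[inside]
    out[sel[inside]] <- as.integer(offset[ki] + p[inside] - ex$start[ki] + 1L)
  }
  out
}

# One transcript with two exons: 101-150 and 201-260
exons <- list(data.frame(start = c(101L, 201L), end = c(150L, 260L)))
res <- pmap_to_transcript(c(101L, 150L, 201L, 175L), c(1L, 1L, 1L, 1L), exons)
print(res)  # 1 50 51 NA
```

The work is done once per transcript with cumsum() and findInterval() rather than once per read, which is the part that a real pmap method would presumably do in compiled code.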
~Malcolm
>-----Original Message-----
>From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Michael Lawrence
>Sent: Friday, March 21, 2014 12:17 PM
>To: rubi [guest]
>Cc: GenomicRanges Maintainer; bioconductor at r-project.org; nimrod.rubinstein at gmail.com
>Subject: Re: [BioC] Obtain overlap coordinates in GenomicRanges findSpliceOverlaps
>
>Currently there is
>
>m <- map(granges, grangeslist)
>
>Where 'm' is a RangesMapping indicating the within overlaps (Hits) and the
>mapped ranges. You would get the granges from the GAlignments with the
>granges() function. The problem is that the overlap computation uses
>findOverlaps(type="within") instead of findSpliceOverlaps. One idea would
>be to take a Hits object as an optional argument. Or, we could add a "pmap"
>method that would assume the from and to are matched up already and simply
>perform the mapping.
>
>One quick fix would be to create a granges that consists of a width-1 range at
>the start position (and likewise the end position) for each read and pass
>it to map() as above. Then filter the mappings based on the compatibility
>results from findSpliceOverlaps(). Not that pretty nor very efficient but
>it takes care of the nasty stuff.
>
>Michael
>
>
>
>On Fri, Mar 21, 2014 at 9:44 AM, rubi [guest] <guest at bioconductor.org> wrote:
>
>>
>> Hi,
>>
>> I was wondering whether it is possible in any way to obtain the overlap
>> coordinates when intersecting GAlignments objects as query with a
>> GRangesList object, using the findSpliceOverlaps function?
>>
>> Specifically, I would like to obtain the transcriptomic coordinates of the
>> GAlignments in the transcripts that they compatibly intersect with.
>>
>> Right now I'm obtaining this information in a 2 step approach:
>> 1. findSpliceOverlaps(GAlignments, GRangesList, ignore.strand=FALSE)
>> 2. Keeping only the compatible hits, I then intersect each GAlignment
>> again with the ranges of its compatible GRanges transcript and sum the
>> widths of the exons up to the intersection coordinate.
>>
>> My problem is that the second step is extremely slow.
>>
>> I'd be grateful for some discussion.
>>
>> -- output of sessionInfo():
>>
>> R version 3.0.2 (2013-09-25)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets methods
>> [8] base
>>
>> other attached packages:
>> [1] hash_2.2.6 data.table_1.8.10 Rsamtools_1.14.3
>> [4] Biostrings_2.30.1 GenomicRanges_1.14.4 XVector_0.2.0
>> [7] IRanges_1.20.6 BiocGenerics_0.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] bitops_1.0-6 stats4_3.0.2 tools_3.0.2 zlibbioc_1.8.0
>>
>>
>> --
>> Sent via the guest posting facility at bioconductor.org.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>