[BioC] overlap regions between two GRanges (or GRangesList)
Hervé Pagès
hpages at fhcrc.org
Sat Mar 15 00:48:22 CET 2014
Please use the "Reply All" button so this discussion remains on the
mailing list.
On 03/14/2014 04:04 PM, Zhao, Shanrong [JRDUS] wrote:
> The range in chr1 should be kept.
>> intersect(g,g2)
> GRanges with 2 ranges and 0 metadata columns:
> seqnames ranges strand
> <Rle> <IRanges> <Rle>
> [1] chr1 [1, 8] -
> [2] chr2 [2, 20] +
> ---
> seqlengths:
> chr1 chr2 chr3
> NA NA NA
>
> My real question boils down to "chr2 [2,20]". Instead of reducing them, I would rather output "chr2 [2,15]" and "chr2 [3,20]".
The solution I sent you earlier (findOverlaps + pintersect) keeps
the range on chr1 and outputs "chr2 [2,15]" and "chr2 [3,20]".
>
> You bring up a very good point: the original range might overlap with more than 1 range. This question seems more complicated what I thought. But for my practical situation, I cannot find a better solution (better than intersect function can offer)
Hard to provide more suggestion from here without knowing
why the findOverlaps + pintersect solution doesn't work
for you.
I'll respond to your other off-list question about counting the
total number of Cs in your promoters in a separate email.
Cheers,
H.
>
> Thanks,
> Shanrong
>
> -----Original Message-----
> From: Hervé Pagès [mailto:hpages at fhcrc.org]
> Sent: Friday, March 14, 2014 3:49 PM
> To: Zhao, Shanrong [JRDUS]
> Cc: bioconductor at r-project.org
> Subject: Re: overlap regions between two GRanges (or GRangesList)
>
> On 03/14/2014 03:31 PM, Zhao, Shanrong [JRDUS] wrote:
>> Thank you. subsetByOverlaps is not what I want. I just want to keep
>> the ranges in 'g' that overlaps with g2, but only keep *"overlapping"
>> regions* (instead of the original ranges in 'g1').
>
> Thanks for clarifying. But that still doesn't explain why you want to *completely* get rid of the 1st range in 'g' (the range on chr1).
>
> Note that in the general case, when you replace the original range with the overlapping region, you might need more than 1 range for that.
> That's because the original range in 'g' can overlap with more than 1 range in 'g2'. Assuming any given range in 'g' overlaps with at most
> 1 range in 'g2':
>
> > ov <- findOverlaps(g, g2)
> > pintersect(g[queryHits(ov)], g2[subjectHits(ov)])
> GRanges with 3 ranges and 0 metadata columns:
> seqnames ranges strand
> <Rle> <IRanges> <Rle>
> [1] chr1 [1, 8] -
> [2] chr2 [2, 15] +
> [3] chr2 [3, 20] +
> ---
> seqlengths:
> chr1 chr2 chr3
> NA NA NA
>
> Now you would need to explain why you don't want to see the range on chr1 in that result...
>
> Thanks,
> H.
>
>>
>> Thanks again,
>>
>> Shanrong
>>
>> -----Original Message-----
>> From: Hervé Pagès [mailto:hpages at fhcrc.org]
>> Sent: Friday, March 14, 2014 3:19 PM
>> To: Zhao, Shanrong [JRDUS]
>> Cc: bioconductor at r-project.org
>> Subject: Re: overlap regions between two GRanges (or GRangesList)
>>
>> Hi Zhao,
>>
>> I'm going to try to rephrase what you want to do: seems like you want
>> to keep the ranges in 'g' that have an overlap with any of the ranges in 'g2':
>>
>> > g
>>
>> GRanges with 4 ranges and 0 metadata columns:
>>
>> seqnames ranges strand
>>
>> <Rle> <IRanges> <Rle>
>>
>> [1] chr1 [ 1, 12] -
>>
>> [2] chr2 [ 2, 15] +
>>
>> [3] chr2 [ 3, 20] +
>>
>> [4] chr3 [10, 14] -
>>
>> ---
>>
>> seqlengths:
>>
>> chr1 chr2 chr3
>>
>> NA NA NA
>>
>> > g2
>>
>> GRanges with 2 ranges and 0 metadata columns:
>>
>> seqnames ranges strand
>>
>> <Rle> <IRanges> <Rle>
>>
>> [1] chr1 [1, 8] -
>>
>> [2] chr2 [2, 30] +
>>
>> ---
>>
>> seqlengths:
>>
>> chr1 chr2
>>
>> NA NA
>>
>> > subsetByOverlaps(g, g2)
>>
>> GRanges with 3 ranges and 0 metadata columns:
>>
>> seqnames ranges strand
>>
>> <Rle> <IRanges> <Rle>
>>
>> [1] chr1 [1, 12] -
>>
>> [2] chr2 [2, 15] +
>>
>> [3] chr2 [3, 20] +
>>
>> ---
>>
>> seqlengths:
>>
>> chr1 chr2 chr3
>>
>> NA NA NA
>>
>> However, in the desired output you show below, you omitted the 1st
>> range (the range on chr1). So it's not clear to me what you are really after.
>> Did you omit it because it's not *within* the overlapping range in 'g2'?
>> If that's the case, then:
>>
>> > subsetByOverlaps(g, g2, type="within")
>>
>> GRanges with 2 ranges and 0 metadata columns:
>>
>> seqnames ranges strand
>>
>> <Rle> <IRanges> <Rle>
>>
>> [1] chr2 [2, 15] +
>>
>> [2] chr2 [3, 20] +
>>
>> ---
>>
>> seqlengths:
>>
>> chr1 chr2 chr3
>>
>> NA NA NA
>>
>> HTH,
>>
>> H.
>>
>> On 03/13/2014 10:12 PM, Zhao, Shanrong [JRDUS] wrote:
>>
>> > Seems intersect can do what I want, but not quietly exact.
>>
>> >
>>
>> > I would like to output two records (for b,c) below instead of unite
>>
>> > them into a single record [2, 20] when call *intersect(g,g2)**.*
>>
>> >
>>
>> > [2, 15]
>>
>> >
>>
>> > [3, 20]
>>
>> >
>>
>> > Thanks,
>>
>> >
>>
>> > Shanrong
>>
>> >
>>
>> > P.S
>>
>> >
>>
>> >> g
>>
>> >
>>
>> > GRanges with 4 ranges and 2 metadata columns:
>>
>> >
>>
>> > seqnames ranges strand | score GC
>>
>> >
>>
>> > <Rle> <IRanges> <Rle> | <integer> <numeric>
>>
>> >
>>
>> > a chr1 [ 1, 12] - | 1 1
>>
>> >
>>
>> > b chr2 [ 2, 15] + | 2 0.888888888888889
>>
>> >
>>
>> > c chr2 [ 3, 20] + | 3 0.777777777777778
>>
>> >
>>
>> > j chr3 [10, 14] - | 10 0
>>
>> >
>>
>> > ---
>>
>> >
>>
>> > seqlengths:
>>
>> >
>>
>> > chr1 chr2 chr3
>>
>> >
>>
>> > NA NA NA
>>
>> >
>>
>> >> g2
>>
>> >
>>
>> > GRanges with 2 ranges and 2 metadata columns:
>>
>> >
>>
>> > seqnames ranges strand | score GC
>>
>> >
>>
>> > <Rle> <IRanges> <Rle> | <integer> <numeric>
>>
>> >
>>
>> > a chr1 [1, 8] - | 1 1
>>
>> >
>>
>> > * b chr2 [2, 30] + | 2 0.888888888888889*
>>
>> >
>>
>> > ---
>>
>> >
>>
>> > seqlengths:
>>
>> >
>>
>> > chr1 chr2 chr3
>>
>> >
>>
>> > NA NA NA
>>
>> >
>>
>> >> intersect(g,g2)
>>
>> >
>>
>> > GRanges with 2 ranges and 0 metadata columns:
>>
>> >
>>
>> > seqnames ranges strand
>>
>> >
>>
>> > <Rle> <IRanges> <Rle>
>>
>> >
>>
>> > [1] chr1 [1, 8] -
>>
>> >
>>
>> > [2] chr2 [*2, 20*] +
>>
>> >
>>
>> > ---
>>
>> >
>>
>> > seqlengths:
>>
>> >
>>
>> > chr1 chr2 chr3
>>
>> >
>>
>> > NA NA NA
>>
>> >
>>
>> > *From:*Zhao, Shanrong [JRDUS]
>>
>> > *Sent:* Thursday, March 13, 2014 9:40 PM
>>
>> > *To:* 'hpages at fhcrc.org'
>>
>> > *Subject:* overlap regions between two GRanges (or GRangesList)
>>
>> >
>>
>> > Dear Dr. Pages,
>>
>> >
>>
>> > I am exploring Bioconductor packages to analyze DNA methylation data.
>>
>> > One question I don't know how to solve it. I have two GRanges (OR
>>
>> > GRangesList), Now I want to identify the common (*overlapping)
>>
>> > regions*, not just overlapping or not--- *gc <-
>> overlapRegions(ga,gb)*
>>
>> >
>>
>> > The other question I have: I am interested in all *cytosines* in
>>
>> > promoters regions, I have already had promoter in GRange object.
>> What is
>>
>> > the most efficient way to identify the total number of Cs? I plan to
>>
>> > extract all DNA sequences corresponding to promoters, and then call
>>
>> > letterFrequency (by set letters="C").
>>
>> >
>>
>> > Best regards,
>>
>> >
>>
>> > Shanrong
>>
>> >
>>
>> --
>>
>> Hervé Pagès
>>
>> Program in Computational Biology
>>
>> Division of Public Health Sciences
>>
>> Fred Hutchinson Cancer Research Center
>>
>> 1100 Fairview Ave. N, M1-B514
>>
>> P.O. Box 19024
>>
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>>
>> Phone: (206) 667-5791
>>
>> Fax: (206) 667-1319
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
More information about the Bioconductor
mailing list