[BioC] overlap regions between two GRanges (or GRangesList)
Cook, Malcolm
MEC at stowers.org
Sun Mar 16 18:39:10 CET 2014
Zhao,
sounds like pintersect is your friend.
two steps:
1) use overlaps to find indices into g and g2
2) use pintersect to find the overlapping segments between g and g2
~ malcolm_cook at stowers.org
________________________________________
From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] on behalf of Zhao, Shanrong [JRDUS] [SZhao7 at its.jnj.com]
Sent: Friday, March 14, 2014 5:31 PM
To: Hervé Pagès
Cc: bioconductor at r-project.org
Subject: Re: [BioC] overlap regions between two GRanges (or GRangesList)
Thank you. subsetByOverlaps is not what I want. I just want to keep the ranges in 'g' that overlaps with g2, but only keep "overlapping" regions (instead of the original ranges in 'g1').
Thanks again,
Shanrong
-----Original Message-----
From: Hervé Pagès [mailto:hpages at fhcrc.org]
Sent: Friday, March 14, 2014 3:19 PM
To: Zhao, Shanrong [JRDUS]
Cc: bioconductor at r-project.org
Subject: Re: overlap regions between two GRanges (or GRangesList)
Hi Zhao,
I'm going to try to rephrase what you want to do: seems like you want to keep the ranges in 'g' that have an overlap with any of the ranges in 'g2':
> g
GRanges with 4 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 [ 1, 12] -
[2] chr2 [ 2, 15] +
[3] chr2 [ 3, 20] +
[4] chr3 [10, 14] -
---
seqlengths:
chr1 chr2 chr3
NA NA NA
> g2
GRanges with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 [1, 8] -
[2] chr2 [2, 30] +
---
seqlengths:
chr1 chr2
NA NA
> subsetByOverlaps(g, g2)
GRanges with 3 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 [1, 12] -
[2] chr2 [2, 15] +
[3] chr2 [3, 20] +
---
seqlengths:
chr1 chr2 chr3
NA NA NA
However, in the desired output you show below, you omitted the 1st range (the range on chr1). So it's not clear to me what you are really after. Did you omit it because it's not *within* the overlapping range in 'g2'? If that's the case, then:
> subsetByOverlaps(g, g2, type="within")
GRanges with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2 [2, 15] +
[2] chr2 [3, 20] +
---
seqlengths:
chr1 chr2 chr3
NA NA NA
HTH,
H.
On 03/13/2014 10:12 PM, Zhao, Shanrong [JRDUS] wrote:
> Seems intersect can do what I want, but not quietly exact.
>
> I would like to output two records (for b,c) below instead of unite
> them into a single record [2, 20] when call *intersect(g,g2)**.*
>
> [2, 15]
>
> [3, 20]
>
> Thanks,
>
> Shanrong
>
> P.S
>
>> g
>
> GRanges with 4 ranges and 2 metadata columns:
>
> seqnames ranges strand | score GC
>
> <Rle> <IRanges> <Rle> | <integer> <numeric>
>
> a chr1 [ 1, 12] - | 1 1
>
> b chr2 [ 2, 15] + | 2 0.888888888888889
>
> c chr2 [ 3, 20] + | 3 0.777777777777778
>
> j chr3 [10, 14] - | 10 0
>
> ---
>
> seqlengths:
>
> chr1 chr2 chr3
>
> NA NA NA
>
>> g2
>
> GRanges with 2 ranges and 2 metadata columns:
>
> seqnames ranges strand | score GC
>
> <Rle> <IRanges> <Rle> | <integer> <numeric>
>
> a chr1 [1, 8] - | 1 1
>
> * b chr2 [2, 30] + | 2 0.888888888888889*
>
> ---
>
> seqlengths:
>
> chr1 chr2 chr3
>
> NA NA NA
>
>> intersect(g,g2)
>
> GRanges with 2 ranges and 0 metadata columns:
>
> seqnames ranges strand
>
> <Rle> <IRanges> <Rle>
>
> [1] chr1 [1, 8] -
>
> [2] chr2 [*2, 20*] +
>
> ---
>
> seqlengths:
>
> chr1 chr2 chr3
>
> NA NA NA
>
> *From:*Zhao, Shanrong [JRDUS]
> *Sent:* Thursday, March 13, 2014 9:40 PM
> *To:* 'hpages at fhcrc.org'
> *Subject:* overlap regions between two GRanges (or GRangesList)
>
> Dear Dr. Pages,
>
> I am exploring Bioconductor packages to analyze DNA methylation data.
> One question I don't know how to solve it. I have two GRanges (OR
> GRangesList), Now I want to identify the common (*overlapping)
> regions*, not just overlapping or not--- *gc <- overlapRegions(ga,gb)*
>
> The other question I have: I am interested in all *cytosines* in
> promoters regions, I have already had promoter in GRange object. What is
> the most efficient way to identify the total number of Cs? I plan to
> extract all DNA sequences corresponding to promoters, and then call
> letterFrequency (by set letters="C").
>
> Best regards,
>
> Shanrong
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org<mailto:hpages at fhcrc.org>
Phone: (206) 667-5791
Fax: (206) 667-1319
[[alternative HTML version deleted]]
More information about the Bioconductor
mailing list