[BioC] [GenomicRanges] subsetByOverlaps to keep info from both GRanges objects?

Enrico Ferrero enricoferrero86 at gmail.com
Tue Aug 20 12:51:50 CEST 2013


Hi,

I have two GRanges objects. The first holds a list of SNPs and the
second holds DNase hypersensitivity sites:

##########
library(GenomicRanges)
...

> snp
GRanges with 192 ranges and 1 metadata column:
             seqnames                 ranges strand   |     score
                <Rle>              <IRanges>  <Rle>   | <integer>
    rs000001     chr1 [ 37967779,  37967780]      +   |         0
    rs000002     chr1 [165967416, 165967417]      -   |         0
    rs000003     chr1 [218860069, 218860070]      -   |         0
    rs000004     chr1 [ 17306673,  17306674]      -   |         0
    rs000005     chr1 [ 41293414,  41293415]      +   |         0
         ...      ...                    ...    ... ...       ...
    rs000188     chr8 [ 97522507,  97522508]      -   |         0
    rs000189     chr8 [ 15532582,  15532583]      +   |         0
    rs000190     chr8 [ 72270031,  72270032]      +   |         0
    rs000191     chr9 [126511086, 126511087]      +   |         0
    rs000192     chr9 [ 98231008,  98231009]      +   |         0
  ---
  seqlengths:
    chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 ... chr21 chr22  chr3  chr4  chr5  chr6  chr7  chr8  chr9
      NA    NA    NA    NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA    NA    NA    NA

> dnase
GRanges with 145038 ranges and 1 metadata column:
           seqnames                 ranges strand   |     score
              <Rle>              <IRanges>  <Rle>   | <integer>
       [1]     chr1       [ 10120,  10270]      *   |         0
       [2]     chr1       [237700, 237850]      *   |         0
       [3]     chr1       [521440, 521590]      *   |         0
       [4]     chr1       [565560, 565710]      *   |         0
       [5]     chr1       [565860, 566010]      *   |         0
       ...      ...                    ...    ... ...       ...
  [145034]     chrX [154543640, 154543790]      *   |         0
  [145035]     chrX [154560420, 154560570]      *   |         0
  [145036]     chrX [154563960, 154564110]      *   |         0
  [145037]     chrX [154842100, 154842250]      *   |         0
  [145038]     chrX [154862200, 154862350]      *   |         0

---
seqlengths:
  chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19
chr2 chr20 chr21 chr22  chr3  chr4  chr5  chr6  chr7  chr8  chr9  chrX
 chrY
    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
 NA
##########

I can use subsetByOverlaps() in both directions to compute the overlap
between them and return a GRanges object:

##########
> subsetByOverlaps(dnase, snp)
GRanges with 5 ranges and 1 metadata column:
      seqnames                 ranges strand |     score
         <Rle>              <IRanges>  <Rle> | <integer>
  [1]     chr1 [ 17306560,  17306710]      * |         0
  [2]     chr2 [169869820, 169869970]      * |         0
  [3]     chr4 [145506440, 145506590]      * |         0
  [4]     chr5 [ 15014080,  15014230]      * |         0
  [5]     chr5 [ 15117400,  15117550]      * |         0
  ---
  seqlengths:
    chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19  chr2 chr20 chr21 chr22  chr3  chr4  chr5  chr6  chr7  chr8  chr9  chrX  chrY
      NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA

> subsetByOverlaps(snp, dnase)
GRanges with 6 ranges and 1 metadata column:
             seqnames                 ranges strand |     score
                <Rle>              <IRanges>  <Rle> | <integer>
   rs2235746     chr1 [ 17306671,  17306672]      - |         0
   rs4157777     chr2 [169869904, 169869904]      - |         0
   rs6858330     chr4 [145506558, 145506559]      + |         0
  rs13146741     chr4 [145506453, 145506454]      + |         0
     rs32847     chr5 [ 15117438,  15117439]      + |         0
   rs7341842     chr5 [ 15014184,  15014185]      + |         0
  ---
  seqlengths:
    chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19  chr2 chr21 chr22  chr3  chr4  chr5  chr6  chr7  chr8  chr9
      NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
##########

The first GRanges object stores the DNase genomic locations that
overlap the SNPs, while the second one contains the SNP IDs (as
GRanges names) and the genomic locations of the SNPs that overlap the
DNase dataset.

Now, what I actually need is a single GRanges object that stores the
SNP IDs together with the DNase genomic locations they fall into. Is
this possible?
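
To make the goal concrete, here is a minimal sketch of the kind of
result I am after. It is only a guess at one possible approach, using
findOverlaps() with queryHits()/subjectHits() instead of
subsetByOverlaps(). The toy coordinates, the SNP IDs (rsA, rsB, rsC)
and the snp_id column name are all made up for illustration; this has
not been run on the real data above.

```r
library(GenomicRanges)

# Toy stand-ins for the real objects (coordinates and IDs are made up)
snp <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(105, 500, 210), width = 2),
  strand   = c("+", "-", "+")
)
names(snp) <- c("rsA", "rsB", "rsC")

# DNase sites are unstranded; strand defaults to "*"
dnase <- GRanges(
  seqnames = c("chr1", "chr2", "chr2"),
  ranges   = IRanges(start = c(100, 200, 900), width = 151)
)

# One hit per overlapping (SNP, DNase site) pair
hits <- findOverlaps(snp, dnase)

# Keep the DNase range of each pair and attach the matching SNP ID
# as a metadata column (hypothetical column name: snp_id)
result <- dnase[subjectHits(hits)]
mcols(result)$snp_id <- names(snp)[queryHits(hits)]
result
```

With this approach a DNase site that contains two SNPs would appear
twice, once per SNP, which is why the IDs go into a metadata column
here rather than into the GRanges names.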

Thank you.
Best,


-- 
Enrico Ferrero
PhD Student
Department of Genetics
Cambridge Systems Biology Centre
University of Cambridge


