[BioC] bug in GenomicRanges gaps?
    Blanchette, Marco 
    MAB at stowers.org
       
    Fri Sep  6 02:14:00 CEST 2013
    
    
  
Hi,
Is this a bug of normal behavior... (I suspect it's a bug). I am trying to compute the intergenic regions using GenomicFeatures txdb and GenomicRanges gaps() and the GRanges return contains entry spanning all chromosomes on both strand... On a mock GRanges, I don't get this weird behavior (See below). Any suggestions?
thanks
> library(GenomicFeatures)
### Downloading the txdb from UCSC
> txdb <- makeTranscriptDbFromUCSC(genome="dm3",
                                 tablename="ensGene")
### Getting the transcripts for each genes
> allTx <- transcriptsBy(txdb, by='gene')
### Geting the GRanges for each genes
> genes <- unlist(range(allTx))
### Removing the strand to perform the reduce operation
> strand(genes) <- '*'
### Now, let's find the gaps
> genic <- reduce(genes.noStrand)
### Because of teh reduce, there should be no overlaping genic regions
> length(findOverlaps(genic,ignoreSelf=TRUE))
[1] 0
### Finding the intergenic regions
> intergenic <- gaps(genic)
### The difference of non-overlaping ranges should return non-overlaping gaps, but...
> length(findOverlaps(intergenic,ignoreSelf=TRUE))
[1] 46300
### What's funky is that I am getting gaps spanning the full lenght of each chromosomes
### Why???
> chrs.width <- seqlengths(seqinfo(intergenic))
> intergenic[width(intergenic) == gene2chrsLength]
GRanges with 30 ranges and 0 metadata columns:
        seqnames        ranges strand
           <Rle>     <IRanges>  <Rle>
   [1]     chr2L [1, 23011544]      +
   [2]     chr2L [1, 23011544]      -
   [3]     chr2R [1, 21146708]      +
   [4]     chr2R [1, 21146708]      -
   [5]     chr3L [1, 24543557]      +
   ...       ...           ...    ...
  [26]   chrXHet [1,   204112]      -
  [27]   chrYHet [1,   347038]      +
  [28]   chrYHet [1,   347038]      -
  [29] chrUextra [1, 29004656]      +
  [30] chrUextra [1, 29004656]      -
  ---
  seqlengths:
       chr2L     chr2R     chr3L     chr3R ...   chrXHet   chrYHet chrUextra
    23011544  21146708  24543557  27905053 ...    204112    347038  29004656
### On Fake data
> genes <- GRanges(seqnames=c(rep("2L",3),rep("3L",3)),
                 ranges=IRanges(start=rep(c(20,200,350),2),end=rep(c(80,275,450),2)))
> length(findOverlaps(intergenic,ignoreSelf=TRUE))
[1] 0
> intergenic
GRanges with 6 ranges and 0 metadata columns:
      seqnames     ranges strand
         <Rle>  <IRanges>  <Rle>
  [1]       2L [  1,  19]      *
  [2]       2L [ 81, 199]      *
  [3]       2L [276, 349]      *
  [4]       3L [  1,  19]      *
  [5]       3L [ 81, 199]      *
  [6]       3L [276, 349]      *
  ---
  seqlengths:
   2L 3L
   NA NA
-- 
Marco Blanchette, Ph.D.
Assistant Investigator 
Stowers Institute for Medical Research
1000 East 50th St.
Kansas City, MO 64110
Tel: 816-926-4071 
Cell: 816-726-8419 
Fax: 816-926-2018 
    
    
More information about the Bioconductor
mailing list