[BioC] IRanges package: findOverlaps on blobs

Hervé Pagès hpages at fhcrc.org
Fri Jun 17 07:29:49 CEST 2011


On 11-06-08 01:56 PM, Hervé Pagès wrote:
> Hi Fahim,
[...]
> So it looks like the first thing you might want to do is to import
> your file into a GRangesList object. Which can be done with something
> like:
>
>    library(GenomicRanges)
>    refseqs <- read.table("RefSeqs.txt", header=TRUE,
>                          stringsAsFactors=FALSE)
>    starts <- strsplitAsListOfIntegerVectors(refseqs$targetStart)
>    widths <- strsplitAsListOfIntegerVectors(refseqs$blockSizes)
>    ranges <- IRanges(start=unlist(starts), width=unlist(widths))
>    seqnames <- Rle(factor(refseqs$targetName), elementLengths(starts))
>    strand <- Rle(strand(refseqs$strand), elementLengths(starts))
>    gr <- GRanges(seqnames, ranges, strand)
>    grl <- split(gr, rep.int(seq_len(length(starts)),
>                             elementLengths(starts)))
>    names(grl) <- refseqs$RefSeqID

FWIW, I've added a utility function to the devel version of the
GenomicRanges package that takes care of making a GRangesList object
from this type of input:

   library(GenomicRanges)
   refseqs <- read.table("RefSeqs.txt", header=TRUE,
                         stringsAsFactors=FALSE)
   grl <- with(refseqs,
               makeGRangesListFromFeatureFragments(
                   seqnames=targetName,
                   fragmentStarts=targetStart,
                   fragmentWidths=blockSizes,
                   strand=strand))

Sorry for the long and ugly name... (suggestions welcome).
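
For anyone who wants to see the shape of the input without installing
the devel version, here is a minimal base-R sketch of the same
bookkeeping: one row per feature, with the fragment starts and widths
packed into comma-separated strings. It is only an illustration, not
what the function does internally. The data values, the trailing
commas, and the names split_ints, fragments and by_feature are all
made up for the example; only the column names come from the code
above.

```r
# Hypothetical stand-in for RefSeqs.txt (values invented for illustration).
refseqs <- data.frame(
  RefSeqID    = c("NM_A", "NM_B"),
  targetName  = c("chr1", "chr2"),
  strand      = c("+", "-"),
  targetStart = c("100,300,", "5000,5200,5900,"),
  blockSizes  = c("50,80,", "100,40,60,"),
  stringsAsFactors = FALSE
)

# Assumption: fields are comma-separated. strsplit() drops the empty
# piece after a trailing comma, so "100,300," yields two values.
split_ints <- function(x) lapply(strsplit(x, ",", fixed = TRUE), as.integer)

starts <- split_ints(refseqs$targetStart)
widths <- split_ints(refseqs$blockSizes)
n_frag <- lengths(starts)          # number of fragments per feature

# Each feature must have as many widths as starts.
stopifnot(identical(n_frag, lengths(widths)))

# One row per fragment; per-feature columns are repeated n_frag times.
fragments <- data.frame(
  RefSeqID = rep(refseqs$RefSeqID, n_frag),
  seqnames = rep(refseqs$targetName, n_frag),
  start    = unlist(starts),
  width    = unlist(widths),
  strand   = rep(refseqs$strand, n_frag),
  stringsAsFactors = FALSE
)
fragments$end <- fragments$start + fragments$width - 1L

# Group the fragments back by feature, analogous to a list of ranges.
by_feature <- split(fragments, fragments$RefSeqID)
```

by_feature[["NM_B"]] then holds the three fragments of the second
feature, which is roughly the grouping the GRangesList gives you,
minus all the range-aware methods.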

Cheers,
H.

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


