[BioC] how to use rtracklayer to retrieve sequence of list of coordinates

Steve Lianoglou mailinglist.honeypot at gmail.com
Mon Mar 1 20:31:37 CET 2010


Hi Sabrina,

On Mon, Mar 1, 2010 at 2:04 PM, sabrina s <sabrina.shao at gmail.com> wrote:
> Hi,all:
> I have a long list of coordinates, which was created by the following code:
> testIranges<- IRanges(start=currCoorList $starts,
>        end=currCoorLis$ends)
>
>   testTrack<-GenomicData(testIranges,chrom=currCoorList$chr,
>        strand=currCoorList$strand,genome="mm9")
>
> I could then export the testTrack into .bed file and upload in the UCSC
> genome browser and get the sequences of the listed coordinate
> But I was thinking if there is a function from rtracklayer to retrieve
> sequence directly.

If you're trying to get coordinates from some reference genome, you
can use the BSgenome.*.* packages and skip rtracklayer + internet
queries entirely.

For instance, say I have an IRanges object `r1` which is a set of
intervals on chromosome 1 from hg18 that I want sequence information
for, I could do it like this:

R> r1
IRanges of length 589
        start      end width
[1]   18016124 18016155    32
[2]   18020749 18020780    32
[3]   18024599 18024630    32
[4]   18024830 18024861    32
[5]   18051985 18052016    32
[6]   18064760 18064791    32
[7]   18088145 18088176    32
[8]   18088200 18088231    32
[9]   18110085 18110116    32
...

R> library(BSgenome.Hsapiens.UCSC.hg18)
R> chr1 <- unmasked(HSapiens$chr1)
R> myseqs <- Views(chr1, start=start(r1), end=end(r1))
R> myseqs
 Views on a 247249719-letter DNAString subject
subject: TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
views:
       start      end width
[1] 18016124 18016155    32 [GTTTTTTGTTTTGTTTTGTTTTTTGTTTTTTA]
[2] 18020749 18020780    32 [CGCCTAGTCTGGAGTGCAGTGGCACAATCTTG]
[3] 18024599 18024630    32 [TCTCACCTCTTCACCGAAGTGGTCTTCTGAAC]
[4] 18024830 18024861    32 [GAATTCTCCCTCGTTGCAGATCCTGTTTGAGT]
[5] 18051985 18052016    32 [TTATTTTTTTTGAGGAAGAGTCTTGCTCTGTC]
[6] 18064760 18064791    32 [GTGGGAGGATCGCTTGAGCCCTGGAGGTTGGG]
[7] 18088145 18088176    32 [ATCATGGTGGGAGATGAAAGGCACTTCTTACA]
[8] 18088200 18088231    32 [GAAGAAGCAAAAGCGGAAACCCCTGCTAAACC]
[9] 18110085 18110116    32 [GGATCATGAGTTCAAGAGTTCGAGACCAGCCT]
[10] 18118454 18118485    32 [TGTTGCCTAGGCTGGTCTCAAACTCCTGCCCT]
...

calling "as.character" on the myseqs object will give you the sequence
as a character vector.

Does that get you what you're after?
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



More information about the Bioconductor mailing list