[BioC] ensembl annotation coordinate did not match that from UCSC genome browser using ucscTableQuery

Steve Lianoglou mailinglist.honeypot at gmail.com
Thu Mar 4 17:07:28 CET 2010


Hi,

On Thu, Mar 4, 2010 at 10:41 AM, sabrina s <sabrina.shao at gmail.com> wrote:
> Hi, all
> I don't know if it is just by chance, but I was retrieving the sequence for
> ENSMUST00000027587
> <http://www.ensembl.org/Mus_musculus/Transcript/Exons?g=ENSMUSG00000026349;t=ENSMUST00000027587>
> using BSgenome. The coordinates I used were retrieved from UCSC with the
> following code:
>
>     library(rtracklayer)
>     session <- browserSession()
>     genome(session) <- "mm9"
>
>     q2 <- ucscTableQuery(session, "ensGene")
>     ensGene <- getTable(q2)
>
> the result is:
>     name      name2 chrom strand   txStart     txEnd exonCount
> 980 NM_028399 Ccnt2  chr1      + 129670740 129701414         9
>
> exonStarts (row 980):
> 129670740,129671677,129688181,129689934,129691831,129694417,129695966,129698182,129698738,
>
> exonEnds (row 980):
> 129670962,129671759,129688310,129689995,129691894,129694463,129696130,129698253,129701414,
>
>
> But in Ensembl, and even in the UCSC genome browser, the first exon
> starts at 129670741, so there is a 1 bp shift.

Look at the description of how the "coordinates" work as supplied by UCSC:

http://genome.ucsc.edu/FAQ/FAQtracks#tracks1

> Because of that, I can't get
> the right sequence that I need. Is there any way to correct that, or am I
> missing some steps? Thanks!

You can get what you need; you just have to know when you need to add
or subtract 1 from the start position.
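
For illustration, here is a minimal base-R sketch of that adjustment. It
assumes the convention described in the UCSC FAQ linked above: tables store
0-based starts and 1-based ends, while the browser display (and Ensembl)
shows 1-based starts. The helper name parse_coords is hypothetical, and the
coordinate strings are copied from the row quoted above.

```r
# exonStarts / exonEnds exactly as returned in the ensGene table row (row 980).
exon_starts_ucsc <- "129670740,129671677,129688181,129689934,129691831,129694417,129695966,129698182,129698738,"
exon_ends_ucsc   <- "129670962,129671759,129688310,129689995,129691894,129694463,129696130,129698253,129701414,"

# Hypothetical helper: turn a UCSC comma-separated string into an integer vector.
# strsplit() drops the empty piece after the trailing comma.
parse_coords <- function(x) as.integer(strsplit(x, ",", fixed = TRUE)[[1]])

starts0 <- parse_coords(exon_starts_ucsc)  # 0-based starts (assumed table convention)
ends    <- parse_coords(exon_ends_ucsc)    # ends need no adjustment

# Shift only the starts to get 1-based, closed intervals.
starts1 <- starts0 + 1L

exons <- data.frame(
  start = starts1,
  end   = ends,
  width = ends - starts0  # same as end - start + 1 in 1-based terms
)
print(exons)
```

With the starts shifted this way, the first exon begins at 129670741, which
matches what Ensembl and the browser show. The end positions stay as they
are, so only the start column needs the +1 before passing the ranges to a
1-based tool.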

Hope that helps,

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


