[BioC] RefSeq coordinates from biomaRt
Dave Tang
davetingpongtang at gmail.com
Mon Nov 25 13:39:12 CET 2013
On Mon, 25 Nov 2013 19:31:22 +0900, Sean Davis <sdavis2 at mail.nih.gov>
wrote:
> Hi, Dave.
>
> There may be multiple issues going on here, so you'll have to do some
> digging yourself when discrepancies arise like you see here. Working
> through your first example, keep in mind that neither Ensembl nor UCSC
> are the actual curators of the RefSeq transcripts. NCBI is the source of
> that annotation. So, if you go to NCBI gene and search for NM_033453 and
> then play a bit with the Genomic Sequence Viewer, you'll note that the
> Gene (protein NP_258412.1) is mapped with the coordinates given at UCSC
> while the mRNA is mapped with the coordinates given by Ensembl. Add to
> this complication that UCSC does its own mapping of the transcripts
> (even RefSeq) and you could even have a "unique" set of coordinates
> given by UCSC (i.e., not the same as NCBI or Ensembl).
Hi Sean,
Thank you for the prompt reply.
My aim is to have a set of transcript annotations as opposed to gene
annotations; I don't really mind whether they are RefSeqs or Ensembl
transcript models. But I keep running into the same problem: the
coordinates returned for either Ensembl or RefSeq transcripts are those of
the Ensembl gene that encompasses all the transcripts, i.e. the full gene
span rather than the individual transcript. Here's another example:
library("biomaRt")
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# ENST00000398344 is on chr22:24,313,554-24,316,773
getBM(attributes = c('chromosome_name',
                     'start_position',
                     'end_position',
                     'strand'),
      filters = 'ensembl_transcript_id',
      values = 'ENST00000398344',
      mart = ensembl)
  chromosome_name start_position end_position strand
1              22       24313554     24322660     -1
# ENST00000430101 is on chr22:24,315,293-24,316,648
getBM(attributes = c('chromosome_name',
                     'start_position',
                     'end_position',
                     'strand'),
      filters = 'ensembl_transcript_id',
      values = 'ENST00000430101',
      mart = ensembl)
  chromosome_name start_position end_position strand
1              22       24313554     24322660     -1
Is it possible to obtain the genomic coordinates of individual Ensembl
transcripts via biomaRt?
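One thing that may be worth trying (a sketch, not an answer confirmed in this thread): both queries above return the same start_position/end_position, which suggests those attributes describe the parent gene. The sketch below assumes the mart also exposes transcript-level attributes named transcript_start and transcript_end; the available names can be checked with listAttributes(ensembl). The helper name get_transcript_coords is hypothetical and only for illustration.

```r
# Assumed attribute names; verify them with listAttributes(ensembl).
# start_position/end_position appear to describe the gene, so the
# transcript-level counterparts are requested here instead.
transcript_attributes <- c('ensembl_transcript_id',
                           'chromosome_name',
                           'transcript_start',
                           'transcript_end',
                           'strand')

# Hypothetical helper: fetch coordinates for one or more Ensembl
# transcript IDs from an existing Mart object (e.g. from useMart()).
get_transcript_coords <- function(ids, mart) {
  biomaRt::getBM(attributes = transcript_attributes,
                 filters    = 'ensembl_transcript_id',
                 values     = ids,
                 mart       = mart)
}
```

With the ensembl object created earlier, a call such as get_transcript_coords(c('ENST00000398344', 'ENST00000430101'), ensembl) should return one row per transcript, which can then be compared against the browser coordinates noted in the comments above.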
sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C LC_TIME=English_Australia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.18.0
loaded via a namespace (and not attached):
[1] RCurl_1.95-4.1 tools_3.0.2 XML_3.98-1.1
Cheers,
--
Dave