[BioC] RefSeq coordinates from biomaRt
Dave Tang
davetingpongtang at gmail.com
Mon Nov 25 09:47:23 CET 2013
Hello,
I've been using biomaRt to fetch the genomic coordinates of RefSeq transcripts
(perhaps incorrectly?). The RefSeq coordinates it returns don't match the
coordinates shown in the UCSC Genome Browser (for example, NM_033453 at
chr20:3190006-3204516):
library("biomaRt")
ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
getBM(attributes = c('refseq_mrna', 'chromosome_name', 'start_position', 'end_position', 'strand'),
      filters = 'refseq_mrna', values = 'NM_033453', mart = ensembl)
  refseq_mrna chromosome_name start_position end_position strand
1   NM_033453              20        3189514      3204516      1
The coordinates seem to match this Ensembl transcript (ENST00000483354)
instead:
getBM(attributes = c('ensembl_transcript_id', 'chromosome_name', 'start_position', 'end_position', 'strand'),
      filters = 'ensembl_transcript_id', values = 'ENST00000483354', mart = ensembl)
  ensembl_transcript_id chromosome_name start_position end_position strand
1       ENST00000483354              20        3189514      3204516      1
Here's another RefSeq model, NM_181493, which UCSC places at
chr20:3190134-3204516; biomaRt again returns the same span:
getBM(attributes = c('refseq_mrna', 'chromosome_name', 'start_position', 'end_position', 'strand'),
      filters = 'refseq_mrna', values = 'NM_181493', mart = ensembl)
  refseq_mrna chromosome_name start_position end_position strand
1   NM_181493              20        3189514      3204516      1
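To make the mismatch concrete, here is a small base-R check that uses only the coordinates quoted in this post (chr20, treated as 1-based inclusive intervals). It tests whether the interval biomaRt returned fully contains each UCSC RefSeq interval and how far upstream its start lies. No biomaRt call is involved, so it runs offline.

```r
# Interval returned by getBM() for both RefSeq IDs (chr20).
biomart_span <- c(start = 3189514, end = 3204516)

# RefSeq intervals as shown in the UCSC Genome Browser (chr20).
ucsc_refseq <- data.frame(
  id    = c("NM_033453", "NM_181493"),
  start = c(3190006, 3190134),
  end   = c(3204516, 3204516),
  stringsAsFactors = FALSE
)

# TRUE where the biomaRt interval fully contains the UCSC RefSeq interval.
ucsc_refseq$contained <- biomart_span[["start"]] <= ucsc_refseq$start &
                         ucsc_refseq$end <= biomart_span[["end"]]

# Distance (bp) by which the biomaRt start lies upstream of each RefSeq start.
ucsc_refseq$start_offset <- ucsc_refseq$start - biomart_span[["start"]]

print(ucsc_refseq)
```

Both rows come out as contained, with identical end coordinates and the biomaRt start lying 492 bp and 620 bp upstream of the two RefSeq starts, so the discrepancy is entirely at the 5' end.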
So it seems the RefSeq IDs are mapped to the longest Ensembl transcript
model that covers the RefSeq model. I searched around the web and looked
at the different available marts but found nothing obvious. How should I
go about obtaining RefSeq coordinates using biomaRt? Or is biomaRt
Ensembl-centric?
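One thing worth checking, stated as an assumption rather than a confirmed diagnosis: in the Ensembl mart, `start_position` and `end_position` describe the span of the whole gene, whereas `transcript_start` and `transcript_end` describe each individual transcript. If so, all three queries above would return the gene span regardless of which transcript or RefSeq ID is used as the filter. The sketch below requests the transcript-level attributes instead; `fetch_tx_coords` and `tx_attributes` are hypothetical names introduced here, and the function only wraps `biomaRt::getBM()`.

```r
# Attributes to request. Assumption: 'transcript_start'/'transcript_end' are the
# transcript-level coordinates in the Ensembl mart, while 'start_position'/
# 'end_position' (used in the queries above) are gene-level.
tx_attributes <- c("refseq_mrna", "ensembl_transcript_id", "chromosome_name",
                   "transcript_start", "transcript_end", "strand")

# Hypothetical helper: look up the Ensembl transcripts cross-referenced to the
# given RefSeq mRNA IDs and return their transcript-level coordinates.
fetch_tx_coords <- function(refseq_ids, mart) {
  biomaRt::getBM(attributes = tx_attributes,
                 filters    = "refseq_mrna",
                 values     = refseq_ids,
                 mart       = mart)
}
```

With the `ensembl` mart object from the post, a call would look like `fetch_tx_coords(c("NM_033453", "NM_181493"), ensembl)`. Even at the transcript level, the Ensembl transcript linked to a RefSeq ID need not share exact boundaries with the RefSeq alignment UCSC displays, so the result may still differ from chr20:3190006-3204516.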
sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C                       LC_TIME=English_Australia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.16.0
loaded via a namespace (and not attached):
[1] RCurl_1.95-4.1 tools_3.0.2 XML_3.98-1.1
Cheers,
--
Dave