[BioC] biomaRt ensembl mmusculus does not contain all ensembl IDs (lincRNA, miRNA etc)?

Steffen Durinck durinck.steffen at gene.com
Mon Apr 18 22:41:51 CEST 2011


Hi Duke,

It looks like this is a BioMart server issue where the wrong type of
table join is made with the entrezgene table.
If you remove the entrezgene attribute, you'll get everything back:

> getBM(filters = "ensembl_transcript_id",
+       attributes = c("ensembl_transcript_id", "ensembl_gene_id",
+                      "external_transcript_id", "refseq_dna"),
+       values = ensTransIDs, mart = mart)
  ensembl_transcript_id    ensembl_gene_id external_transcript_id refseq_dna
1    ENSMUST00000000001 ENSMUSG00000000001              Gnai3-001  NM_010306
2    ENSMUST00000042585 ENSMUSG00000037982             Gm9725-201
3    ENSMUST00000083463 ENSMUSG00000065397             Mir155-201  NR_029565


We notified the BioMart team of this behavior a while ago, and they
intend to make a change in the next release.
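
To illustrate why the join type matters, here is a minimal sketch in base R
using toy data frames, not the actual BioMart schema. It assumes the
server-side join behaves like an inner join and that the entrezgene table
has a row only for Gnai3, which is what the output in this thread suggests.
The names transcripts, entrez, inner and left are made up for the example.

```r
# Toy tables built from the IDs in this thread (illustrative only).
transcripts <- data.frame(
  ensembl_transcript_id  = c("ENSMUST00000000001", "ENSMUST00000042585",
                             "ENSMUST00000083463"),
  external_transcript_id = c("Gnai3-001", "Gm9725-201", "Mir155-201"),
  stringsAsFactors = FALSE
)

# Assumption: only Gnai3 has an entrezgene entry.
entrez <- data.frame(
  ensembl_transcript_id = "ENSMUST00000000001",
  entrezgene            = 14679L,
  stringsAsFactors = FALSE
)

# Inner join: transcripts without an entrezgene row are dropped.
inner <- merge(transcripts, entrez, by = "ensembl_transcript_id")

# Left outer join: every transcript is kept; entrezgene is NA where missing.
left <- merge(transcripts, entrez, by = "ensembl_transcript_id", all.x = TRUE)

nrow(inner)
nrow(left)
```

With the left join, the lincRNA and miRNA rows survive with NA in the
entrezgene column instead of disappearing from the result.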

Cheers,
Steffen



On Mon, Apr 18, 2011 at 1:33 PM, Duke <duke.lists at gmx.com> wrote:
> Hi folks,
>
> Following the biomaRt usage instructions, I am trying to get information for
> our mmu data. The code I used is below:
>
> ----------
> library(biomaRt)
> mart <- useDataset("mmusculus_gene_ensembl", useMart("ensembl"))
> ensTransIDs <- c("ENSMUST00000000001",
>                  "ENSMUST00000083463", "ENSMUST00000042585")
> getBM(filters = "ensembl_transcript_id",
>       attributes = c("ensembl_transcript_id", "ensembl_gene_id",
>                      "external_transcript_id", "external_gene_id",
>                      "refseq_dna", "entrezgene"),
>       values = ensTransIDs, mart = mart)
> ----------
>
> This code runs fine for some transcript IDs, but for others (for
> example, lincRNAs or miRNAs) it returns no results. For example, running the
> code above for one gene, one lincRNA and one miRNA produced this result:
>
>  ensembl_transcript_id    ensembl_gene_id external_transcript_id
> 1    ENSMUST00000000001 ENSMUSG00000000001              Gnai3-001
>  external_gene_id refseq_dna entrezgene
> 1            Gnai3  NM_010306      14679
>
>
> => Only the Gnai3 gene is returned; the other two are not.
>
> Does anybody know what I am doing wrong here, or is it just that the Ensembl
> database does not contain data for all the available transcript IDs?
>
> For the record, here is my sessionInfo():
>
>> sessionInfo()
> R version 2.12.2 (2011-02-25)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] biomaRt_2.6.0
>
> loaded via a namespace (and not attached):
> [1] RCurl_1.4-3  XML_3.2-0    tools_2.12.2
>
> Thanks,
>
> D.