[BioC] biomaRt ensembl mmusculus does not contain all ensembl IDs (lincRNA, miRNA etc)?
Rhoda Kinsella
rhoda at ebi.ac.uk
Tue Apr 19 11:34:48 CEST 2011
Hi Steffen and Duke,
The issue here is that the entrezgene external references are
currently stored on translations. Because only one of the transcripts
you uploaded has a translation, it is the only one you get back when
you add the entrezgene attribute. In effect, the entrezgene attribute
is acting like a filter, which is not ideal. Unfortunately, we cannot
do anything about this at the moment, as the BioMart tool we use to
build the mart does not allow us to add the left join that would be
needed. We have informed the BioMart developers at the OICR about
this issue and hopefully it will be fixed in the new code.
On the plus side, the entrezgene IDs will be stored on genes for
release 63 (due approx end of June) so you should be able to use this
attribute in the expected way after the next release. I apologize for
any inconvenience that this has caused. If I can be of further
assistance, please let me know.
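
To illustrate why a missing left join makes the attribute behave like a filter, here is a minimal sketch in base R. The two data frames are toy stand-ins built from the IDs in this thread, not the real mart tables, and the names `transcripts`, `entrez`, `inner` and `left` are made up for the example.

```r
# Toy stand-ins for the two tables involved (NOT the real mart schema).
transcripts <- data.frame(
  ensembl_transcript_id  = c("ENSMUST00000000001",
                             "ENSMUST00000042585",
                             "ENSMUST00000083463"),
  external_transcript_id = c("Gnai3-001", "Gm9725-201", "Mir155-201"),
  stringsAsFactors = FALSE
)

# Only the protein-coding transcript has a translation, so it is the
# only one that carries an entrezgene cross-reference.
entrez <- data.frame(
  ensembl_transcript_id = "ENSMUST00000000001",
  entrezgene            = 14679L,
  stringsAsFactors = FALSE
)

# Inner join: transcripts with no entrezgene row are dropped, which is
# the filter-like behaviour described above.
inner <- merge(transcripts, entrez, by = "ensembl_transcript_id")

# Left join: every transcript is kept and a missing entrezgene is NA.
left <- merge(transcripts, entrez, by = "ensembl_transcript_id",
              all.x = TRUE)

print(inner)
print(left)
```

Until the mart itself changes, the same pattern might serve as a client-side workaround: request entrezgene in a separate getBM() call and merge that result onto the query without entrezgene using all.x = TRUE.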
Regards
Rhoda
On 18 Apr 2011, at 21:41, Steffen Durinck wrote:
> Hi Duke,
>
> It looks like this is a BioMart server issue where the wrong type of
> table join is made with the entrezgene table.
> If you remove the entrezgene attribute you'll get everything back:
>
> > getBM(filters="ensembl_transcript_id",
> +       attributes=c("ensembl_transcript_id","ensembl_gene_id",
> +                    "external_transcript_id","refseq_dna"),
> +       values=ensTransIDs, mart=mart)
>   ensembl_transcript_id    ensembl_gene_id external_transcript_id refseq_dna
> 1    ENSMUST00000000001 ENSMUSG00000000001              Gnai3-001  NM_010306
> 2    ENSMUST00000042585 ENSMUSG00000037982             Gm9725-201
> 3    ENSMUST00000083463 ENSMUSG00000065397             Mir155-201  NR_029565
>
>
> We notified the BioMart team of this behavior a while ago, and they
> said they would make a change in the next release.
>
> Cheers,
> Steffen
>
>
>
> On Mon, Apr 18, 2011 at 1:33 PM, Duke <duke.lists at gmx.com> wrote:
>> Hi folks,
>>
>> Following the biomaRt usage instructions, I am trying to get information
>> for our mmu data. The code I used is below:
>>
>> ----------
>> library(biomaRt)
>> mart<- useDataset("mmusculus_gene_ensembl", useMart("ensembl"))
>> ensTransIDs <- c("ENSMUST00000000001",
>> "ENSMUST00000083463","ENSMUST00000042585")
>> getBM(filters="ensembl_transcript_id",
>> attributes=c("ensembl_transcript_id","ensembl_gene_id",
>> "external_transcript_id", "external_gene_id", "refseq_dna",
>> "entrezgene"),
>> values=ensTransIDs,mart= mart)
>> ----------
>>
>> This code runs fine for some transcript IDs, but for others (for example,
>> lincRNAs or miRNAs) it returns nothing. For example, the code above, run
>> on one gene, one lincRNA and one miRNA, produced this result:
>>
>> ensembl_transcript_id ensembl_gene_id external_transcript_id
>> 1 ENSMUST00000000001 ENSMUSG00000000001 Gnai3-001
>> external_gene_id refseq_dna entrezgene
>> 1 Gnai3 NM_010306 14679
>>
>>
>> => only gene Gnai3 is detected, the other two are not.
>>
>> Does anybody know what I am doing wrong here, or is it just that the
>> Ensembl database does not contain all the available transcript_id data?
>>
>> For the record, here is my sessionInfo():
>>
>>> sessionInfo()
>> R version 2.12.2 (2011-02-25)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] biomaRt_2.6.0
>>
>> loaded via a namespace (and not attached):
>> [1] RCurl_1.4-3 XML_3.2-0 tools_2.12.2
>>
>> Thanks,
>>
>> D.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
Rhoda Kinsella Ph.D.
Ensembl Bioinformatician,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus,
Hinxton
Cambridge CB10 1SD,
UK.