[BioC] biomaRt ensembl mmusculus does not contain all ensembl IDs (lincRNA, miRNA etc)?

Rhoda Kinsella rhoda at ebi.ac.uk
Tue Apr 19 11:34:48 CEST 2011


Hi Steffen and Duke
The issue here is that the entrezgene external references are  
currently stored on translations and as only one of the transcripts  
you uploaded has a translation, this is the only one that you will get  
back when you add the entrezgene attribute. Basically the entrezgene  
attribute is acting like a filter, which is not ideal. Unfortunately  
we cannot do anything about this problem at the moment as the BioMart  
tool we use to build the mart does not allow the addition of a  
necessary left join. We have informed the BioMart developers at the  
OICR about this issue and hopefully it will be fixed in the new code.  
On the plus side, the entrezgene IDs will be stored on genes for  
release 63 (due approx end of June) so you should be able to use this  
attribute in the expected way after the next release. I apologize for  
any inconvenience that this has caused. If I can be of further  
assistance, please let me know.
Regards
Rhoda
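
To illustrate the join behaviour described above, here is a minimal sketch
in base R. It does not query BioMart. The two data frames are hand-built
stand-ins (an assumption) that use only the IDs quoted in this thread, and
merge() stands in for the server-side table join. The names transcripts,
entrez_xref, inner and left are hypothetical.

```r
# Transcript-level table: the three transcripts discussed in this thread.
transcripts <- data.frame(
  ensembl_transcript_id  = c("ENSMUST00000000001", "ENSMUST00000042585",
                             "ENSMUST00000083463"),
  external_transcript_id = c("Gnai3-001", "Gm9725-201", "Mir155-201"),
  stringsAsFactors = FALSE
)

# Cross-reference table: only the transcript with a translation has an
# entrezgene entry (assumed layout, for illustration only).
entrez_xref <- data.frame(
  ensembl_transcript_id = "ENSMUST00000000001",
  entrezgene            = 14679L,
  stringsAsFactors = FALSE
)

# Inner join (merge's default): rows without an entrezgene match are dropped,
# so the attribute behaves like a filter.
inner <- merge(transcripts, entrez_xref, by = "ensembl_transcript_id")

# Left join: every transcript is kept and missing entrezgene values become NA.
left <- merge(transcripts, entrez_xref, by = "ensembl_transcript_id",
              all.x = TRUE)

print(inner)
print(left)
```

Until the mart itself changes, the same merge(..., all.x = TRUE) step could
probably be applied on the client side to the results of two separate getBM()
calls, one without the entrezgene attribute and one with it.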


On 18 Apr 2011, at 21:41, Steffen Durinck wrote:

> Hi Duke,
>
> It looks like this is a BioMart server issue where the wrong type of
> table join is made with the entrezgene table.
> If you remove the entrezgene attribute you'll get everything back:
>
>> getBM(filters="ensembl_transcript_id",
>        attributes=c("ensembl_transcript_id","ensembl_gene_id",
>                     "external_transcript_id","refseq_dna"),
>        values=ensTransIDs, mart=mart)
>  ensembl_transcript_id    ensembl_gene_id external_transcript_id refseq_dna
> 1    ENSMUST00000000001 ENSMUSG00000000001              Gnai3-001  NM_010306
> 2    ENSMUST00000042585 ENSMUSG00000037982             Gm9725-201
> 3    ENSMUST00000083463 ENSMUSG00000065397             Mir155-201  NR_029565
>
>
> We notified the BioMart team of this behavior a while ago and they
> said they would make a change in the next release.
>
> Cheers,
> Steffen
>
>
>
> On Mon, Apr 18, 2011 at 1:33 PM, Duke <duke.lists at gmx.com> wrote:
>> Hi folks,
>>
>> Following the biomaRt usage instructions, I am trying to get information
>> for our mmu data. The code I used is below:
>>
>> ----------
>> library(biomaRt)
>> mart<- useDataset("mmusculus_gene_ensembl", useMart("ensembl"))
>> ensTransIDs <- c("ENSMUST00000000001",
>> "ENSMUST00000083463","ENSMUST00000042585")
>> getBM(filters="ensembl_transcript_id",
>> attributes=c("ensembl_transcript_id","ensembl_gene_id",
>> "external_transcript_id", "external_gene_id", "refseq_dna", "entrezgene"),
>> values=ensTransIDs,mart= mart)
>> ----------
>>
>> This code runs fine with some transcript_ids, but for some others (for
>> example, lincRNAs or miRNAs) it gives empty results. For example, the code
>> above, run on one gene, one lincRNA and one miRNA, produced this result:
>>
>>  ensembl_transcript_id    ensembl_gene_id external_transcript_id
>> 1    ENSMUST00000000001 ENSMUSG00000000001              Gnai3-001
>>  external_gene_id refseq_dna entrezgene
>> 1            Gnai3  NM_010306      14679
>>
>>
>> => only gene Gnai3 is detected, the other two are not.
>>
>> Does anybody know what I am doing wrong here, or is it just that the
>> Ensembl database does not contain all the available transcript_id data?
>>
>> For the record, here is my sessionInfo():
>>
>>> sessionInfo()
>> R version 2.12.2 (2011-02-25)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] biomaRt_2.6.0
>>
>> loaded via a namespace (and not attached):
>> [1] RCurl_1.4-3  XML_3.2-0    tools_2.12.2
>>
>> Thanks,
>>
>> D.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>

Rhoda Kinsella Ph.D.
Ensembl Bioinformatician,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus,
Hinxton
Cambridge CB10 1SD,
UK.


