[BioC] BiomaRt Ensembl RefSeq query error

Georg Otto georg.otto at imm.ox.ac.uk
Tue Jan 21 13:48:52 CET 2014


As an amendment to my previous post, here is the output of sessionInfo():

R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_2.18.0

loaded via a namespace (and not attached):
 [1] annotate_1.40.0      AnnotationDbi_1.24.0 Biobase_2.22.0      
 [4] BiocGenerics_0.8.0   compiler_3.0.1       DBI_0.2-7           
 [7] DESeq_1.14.0         genefilter_1.44.0    geneplotter_1.40.0  
[10] grid_3.0.1           IRanges_1.20.6       lattice_0.20-24     
[13] parallel_3.0.1       RColorBrewer_1.0-5   RCurl_1.95-4.1      
[16] RSQLite_0.11.4       splines_3.0.1        stats4_3.0.1        
[19] survival_2.37-4      tools_3.0.1          XML_3.98-1.1        
[22] xtable_1.7-1        



Georg Otto <georg.otto at imm.ox.ac.uk> writes:

> Dear Bioconductors,
>
> I am trying to query 14005 Ensembl gene IDs for their RefSeq annotations
> using this code (I can send the gene IDs upon request):
>
> ensembl <- useMart("ensembl", dataset = 'mmusculus_gene_ensembl')
>
> getBM(attributes = c("ensembl_gene_id", "refseq_mrna"),
>       filters = "ensembl_gene_id",
>       values = ensembl.ids,
>       mart = ensembl, uniqueRows = TRUE)
>
>
> If I query the full gene set, many RefSeq IDs come back missing (NA), for
> example for the gene ENSMUSG00000000567 (Sox9), whereas if I query a
> subset, say ensembl.ids[1:12000], all the RefSeq IDs are there. It does
> not seem to matter which subset I use, as long as it contains no more
> than about 12000 genes.
>
> Any idea what is going on?
>
> Best wishes,
>
> Georg
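
Since subsets of up to about 12000 IDs reportedly return complete results, one possible workaround is to split the ID vector into smaller chunks, query each chunk separately, and combine the results. The following is only a minimal sketch under that assumption, not a diagnosis of the underlying problem. The helper names chunk_ids and query_in_chunks and the chunk size of 5000 are hypothetical choices made for illustration; getBM and useMart are the biomaRt functions used in the quoted code.

```r
# Hypothetical helper: split a vector of IDs into a list of chunks,
# each holding at most `size` elements (order is preserved).
chunk_ids <- function(ids, size = 5000) {
  split(ids, ceiling(seq_along(ids) / size))
}

# Hypothetical helper: run one getBM() query per chunk and row-bind the
# resulting data frames. The chunk size of 5000 is an arbitrary value
# chosen to stay well below the ~12000 limit described in the post.
query_in_chunks <- function(ids, mart, size = 5000) {
  results <- lapply(chunk_ids(ids, size), function(chunk) {
    biomaRt::getBM(attributes = c("ensembl_gene_id", "refseq_mrna"),
                   filters = "ensembl_gene_id",
                   values = chunk,
                   mart = mart,
                   uniqueRows = TRUE)
  })
  # Drop rows duplicated across chunks, if any.
  unique(do.call(rbind, results))
}

# Usage (needs network access and the ensembl.ids vector from the post):
# library(biomaRt)
# ensembl <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
# refseq <- query_in_chunks(ensembl.ids, ensembl)
```

Comparing the rows for a known gene such as ENSMUSG00000000567 between the chunked result and the single large query would show whether the missing annotations depend on query size.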


