[BioC] remove NA from named character vector
    Iain Gallagher 
    iaingallagher at btopenworld.com
       
    Fri Jul 22 13:03:39 CEST 2011
    
    
  
Hi List
This is likely a trivial problem but it's annoying me. I am mapping from Bos taurus Ensembl IDs to gene symbols. I can do this with biomaRt, but using the org.Bt.eg.db package means I'm not tied to an internet connection.
A toy example:
library(org.Bt.eg.db)
ens <- c('ENSBTAG00000004218', 'ENSBTAG00000004270', 'ENSBTAG00000004578', 'ENSBTAG00000004608')
egs <- unlist(mget(ens, revmap(org.Bt.egENSEMBL), ifnotfound=NA))
egs
ENSBTAG00000004218 ENSBTAG00000004270 ENSBTAG00000004578 ENSBTAG00000004608 
          "617660"           "407106"                 NA        "100138951" 
# a named character vector with one NA
#now get symbols
syms <- unlist(mget(egs, org.Bt.egSYMBOL, ifnotfound=NA))
# throws an error - fair enough - need to drop the NA
which(egs == NA)
#gives named integer(0) - hmm
class(egs)
#gives [1] "character" - so I'm quite confused now.
NA %in% egs
#gives [1] TRUE
How do I identify which entries in 'egs' are NA so I can remove them? It's trivial here, but the dataset I'm working with has thousands of entries.
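For reference, a minimal sketch of one possible approach. It rebuilds 'egs' by hand as a stand-in (an assumption, so that it runs without org.Bt.eg.db), and the names missing_idx and egs_clean are made up for illustration. In R, any comparison with NA evaluates to NA rather than TRUE or FALSE, which is why which(egs == NA) comes back empty; is.na() tests for missing values directly.

```r
# Stand-in for the 'egs' vector above, so this runs without org.Bt.eg.db
egs <- c(ENSBTAG00000004218 = "617660",
         ENSBTAG00000004270 = "407106",
         ENSBTAG00000004578 = NA,
         ENSBTAG00000004608 = "100138951")

# Comparing with NA gives NA for every element, so which() finds no TRUE
egs == NA

# is.na() tests for missing values directly
missing_idx <- which(is.na(egs))   # named positions of the NA entries
egs_clean   <- egs[!is.na(egs)]    # drop the NA entries, keeping the names
```

With the real data, egs_clean could then be passed to mget(egs_clean, org.Bt.egSYMBOL, ifnotfound=NA) in place of egs.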
Thanks
iain
> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8    
 [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8   
 [7] LC_PAPER=en_GB.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C      
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
[1] org.Bt.eg.db_2.5.0   RSQLite_0.9-4        DBI_0.2-5           
[4] AnnotationDbi_1.14.1 Biobase_2.10.0      
    
    