[BioC] How to check if gene name is an alias or misspelt
Hervé Pagès
hpages at fhcrc.org
Sat Apr 11 02:03:36 CEST 2009

Hi Dan,
The org.XX.egALIAS2EG map (org.Hs.egALIAS2EG for human) combined with a fuzzy
matching function such as agrep() can help you do this:
   > library(org.Hs.eg.db)
   > get("S-HT3c2", org.Hs.egALIAS2EG)
   Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
     value for "S-HT3c2" not found
   > agrep("S-HT3c2", keys(org.Hs.egALIAS2EG), value=TRUE, max.distance=1)
   [1] "5-HT3c2"
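As a sketch of how the two steps above could be combined when checking a whole list of names, the function below first looks for an exact alias and only then falls back to agrep(). lookup_alias() is a hypothetical helper, not part of org.Hs.eg.db or base R, and it takes the alias keys as a plain character vector so it can be tried without the annotation package. Note that agrep() looks for approximate matches of the query inside each key, so a longer alias containing something close to the query would also be returned.

```r
# lookup_alias() is a hypothetical helper, not part of any Bioconductor package.
# query:        one gene name to check
# alias_keys:   character vector of known aliases, e.g. keys(org.Hs.egALIAS2EG)
# max_distance: passed to agrep(); values >= 1 are a maximum number of edits
lookup_alias <- function(query, alias_keys, max_distance = 1) {
  if (query %in% alias_keys) {
    return(query)  # exact alias: no fuzzy matching needed
  }
  # Fall back to approximate (fuzzy) matching of the query within each key
  agrep(query, alias_keys, value = TRUE, max.distance = max_distance)
}

toy_keys <- c("5-HT3c2", "TP53")   # illustrative subset, not the real key set
lookup_alias("TP53", toy_keys)     # exact hit
lookup_alias("S-HT3c2", toy_keys)  # fuzzy hit (one substitution)
```

With org.Hs.eg.db loaded, lookup_alias("S-HT3c2", keys(org.Hs.egALIAS2EG)) would play the role of the agrep() call in the transcript; the alias it returns can then be passed to get(..., org.Hs.egALIAS2EG) to obtain the Entrez Gene ID, and that ID looked up in the org.Hs.egSYMBOL map to get the official symbol.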
The 'max.distance' argument lets you control the maximum number of misspelt
letters (including inserted/deleted letters):
   > get("WUGSC:H-DJO747G182", org.Hs.egALIAS2EG)
   Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
     value for "WUGSC:H-DJO747G182" not found
   > agrep("WUGSC:H-DJO747G182", keys(org.Hs.egALIAS2EG), value=TRUE, max.distance=2)
   character(0)
   > agrep("WUGSC:H-DJO747G182", keys(org.Hs.egALIAS2EG), value=TRUE, max.distance=3)
   [1] "WUGSC:H_DJ0747G18.2"
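To see why the second search only succeeds at max.distance=3, the edit distance between each query and the alias it should have matched can be computed with adist() from the utils package, which ships with R. This is an illustrative check, not part of the original answer.

```r
# adist() returns a matrix of generalized Levenshtein distances
# (unit cost for each insertion, deletion and substitution by default).
d1 <- adist("S-HT3c2", "5-HT3c2")[1, 1]   # one substitution: S -> 5
d2 <- adist("WUGSC:H-DJO747G182",
            "WUGSC:H_DJ0747G18.2")[1, 1]  # '-' -> '_', 'O' -> '0', '.' inserted
c(d1, d2)
# [1] 1 3
```

Three edits are needed for the second name, which is consistent with the empty result at max.distance=2 and the hit at max.distance=3. For a list of names, one could raise max.distance step by step and stop at the first non-empty agrep() result, keeping in mind that larger values also return more spurious matches.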
Cheers,
H.
Daniel Brewer wrote:
> Hello,
> 
> I have a list of genes which are not official gene symbols.  Normally in
> this case I would search gene in entrez to see if it is an alias and
> then take the official symbol.  Is there a way to (semi) automate this
> within bioconductor?
> 
> If this fails I normally google it to see if it is likely to be a
> misspelling S instead of 5 etc.  Any suggestions for that?
> 
> Many thanks
> 
> Dan
> 
-- 
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319