[BioC] pathview puzzle

Luo Weijun luo_weijun at yahoo.com
Sat Aug 24 03:53:18 CEST 2013


Hi Oleg,
Thanks for the note. This is indeed a problem I hadn’t realized before! KEGG uses Entrez Gene IDs for all the other model organisms I’ve checked.
I am working on a generic fix (not only for E. coli but also for other species in a similar situation) and will incorporate it into the development version of pathview soon. I will keep you posted.
Thanks for pointing this out.
Weijun


--------------------------------------------
On Fri, 8/23/13, Oleg Moskvin <moskvin at wisc.edu> wrote:

 Subject: Re: [BioC] pathview puzzle

 Date: Friday, August 23, 2013, 12:19 PM

 Hi Weijun,

 Thank you for the response. 

 The problem seems to run deeper than that: it is connected to
 KEGG's special handling of one particular species, E. coli.


 I looked into the pathview() code and here is what I see: 

 1) gene.data is remapped internally via mol.sum() to Entrez
 Gene IDs;
 2) the remapped gene.data is passed to node.map(), which maps
 it onto KEGG nodes using node.data;
 3) the node.data used in (2) was originally extracted from
 the KEGG XML by node.info().

 This route assumes that the "name" attribute of KEGG XML
 entries with type="gene" has the "speciesID:ENTREZ" format.
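A minimal base-R sketch of that assumption, for illustration only (it is not pathview's node.info()): it reads KGML <entry> lines, splits each name attribute into an organism prefix and an ID, and flags IDs made up only of digits, as Entrez Gene IDs are. The helper parse_kegg_gene_names() is hypothetical; the regular expressions assume that the attributes of each <entry> start tag up to type="gene" sit on one line, as in the examples quoted in this thread, and that a name attribute may hold several space-separated IDs.

```r
# Hypothetical helper, not pathview's node.info(): extract gene IDs from
# KGML <entry> lines and check whether they look like Entrez Gene IDs.
parse_kegg_gene_names <- function(kgml_lines) {
  # Keep only <entry> start tags of type="gene"
  entries <- grep('<entry[^>]*type="gene"', kgml_lines, value = TRUE)
  # Pull out the value of the name attribute, e.g. "hsa:51343"
  names_attr <- sub('.*<entry[^>]*[[:space:]]name="([^"]*)".*', "\\1", entries)
  # A name attribute can list several space-separated IDs
  ids <- unlist(strsplit(names_attr, " ", fixed = TRUE))
  data.frame(
    org = sub(":.*", "", ids),       # organism prefix, e.g. "hsa"
    id = sub("^[^:]*:", "", ids),    # ID after the prefix, e.g. "51343"
    stringsAsFactors = FALSE
  )
}

# The two entries quoted in this thread (first line of each <entry> tag)
kgml <- c(
  '<entry id="2" name="hsa:51343" type="gene"',
  '<entry id="4" name="eco:b1513" type="gene"'
)

genes <- parse_kegg_gene_names(kgml)
# Entrez Gene IDs are purely numeric; b-numbers are not
genes$entrez_like <- grepl("^[0-9]+$", genes$id)
print(genes)
```

On these two entries, the hsa row passes the check and the eco row does not, which is exactly where an Entrez-keyed mapping step would break.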

 For E. coli, this does not hold! See the example XML entries
 for H. sapiens and E. coli in my message from yesterday
 (below).

 In fact, the "gene" records in the E. coli KEGG XML use
 b-numbers as IDs!

 So, for cases like this, where the XML that KEGG supplies is
 not consistent across species, one could introduce an
 "id.bypass" option to pathview() that takes gene.data as is
 (with user-supplied IDs that match the KEGG XML IDs, for
 example b-numbers) and passes it directly to step 2 (node
 matching).
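A rough base-R sketch of how the proposed bypass might behave. "id.bypass" is only a suggestion here, not an existing pathview() argument; map_to_nodes(), its convert argument (a stand-in for the mol.sum() remapping), and the toy node table and expression value are all hypothetical. With the bypass, user IDs are matched to KGML node IDs by exact string comparison; without it, a converter that can only produce Entrez Gene IDs leaves a b-number node unmatched.

```r
# Hypothetical sketch of the proposed "id.bypass" idea; map_to_nodes() is
# illustrative only and does not mirror pathview's node.map() interface.
map_to_nodes <- function(gene_data, nodes, id_bypass = FALSE, convert = NULL) {
  ids <- names(gene_data)
  if (!id_bypass) {
    # Default route: remap user IDs first (convert() stands in for mol.sum())
    stopifnot(is.function(convert))
    ids <- convert(ids)
  }
  # Exact string match of node gene IDs against the (possibly remapped) IDs
  hit <- match(nodes$gene, ids)
  nodes$value <- unname(gene_data[hit])
  nodes
}

# Toy node table built from the E. coli KGML entry quoted in this thread
nodes <- data.frame(node = "4", gene = "b1513", label = "lsrA",
                    stringsAsFactors = FALSE)
gene_data <- c(b1513 = 1.7)  # hypothetical expression value

# Bypass: b-numbers are matched directly to the KGML node ID
bypassed <- map_to_nodes(gene_data, nodes, id_bypass = TRUE)

# Default route with a hypothetical converter that cannot yield b-numbers
to_entrez <- function(x) rep(NA_character_, length(x))
remapped <- map_to_nodes(gene_data, nodes, convert = to_entrez)

bypassed$value  # 1.7
remapped$value  # NA
```

The design choice is simply to skip ID conversion when the user vouches that the IDs already match the KGML; in that case the user is responsible for supplying IDs in the exact form KEGG uses for that species.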

 Thanks!

 Oleg



 On 08/22/13, Luo Weijun wrote:
 > Hi Oleg,
 > You are right, the problem is due to ID type inconsistency.
 > You have to specify gene.idtype when calling the pathview
 > function if your gene ID type is not Entrez Gene. I'm fairly
 > sure b-numbers are not recognized. For your gene name example,
 > if by “gene name” you mean official gene symbols, you should
 > specify gene.idtype="SYMBOL" (lower case is fine):
 > eco2.out <- pathview(gene.data = T2.CEBF095.crt115.ASCH.DROP3.rel.gn,
 >                      pathway.id = "02010", gene.idtype = "SYMBOL",
 >                      out.suffix = "T2ACSH", species = "eco",
 >                      kegg.native = TRUE)


 On 08/22/13, Oleg Moskvin wrote:

 > 
 > <entry id="2" name="hsa:51343" type="gene"
 >     link="http://www.kegg.jp/dbget-bin/www_bget?hsa:51343">
 >   <graphics name="FZR1, CDC20C, CDH1, FZR, FZR2, HCDH, HCDH1"
 >     fgcolor="#000000" bgcolor="#BFFFBF"
 >     type="rectangle" x="919" y="536" width="46" height="17"/>
 > </entry>
 > 
 > 
 > <entry id="4" name="eco:b1513" type="gene"
 >     link="http://www.kegg.jp/dbget-bin/www_bget?eco:b1513">
 >   <graphics name="lsrA" fgcolor="#000000" bgcolor="#BFFFBF"
 >     type="rectangle" x="339" y="1882" width="46" height="17"/>
 > </entry>


