[BioC] hyperGTest, KEGG

Mon Apr 16 16:27:01 CEST 2007

Hi Seth,

Thanks, I found one way around my problem (for the non-devel version of
the Category package):

geneIds(hgOver)[geneIds(hgOver) %in% hgOver@catToGeneId[[i]]]

where i runs over the significant KEGG terms obtained from hyperGTest(params).
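
The same filtering can be applied to all categories at once. Below is a minimal sketch in base R, using small made-up stand-ins rather than a real hyperGTest result: 'selected' plays the role of geneIds(hgOver) and 'catToGene' the role of the category-to-gene-ID mapping. Both names and their contents are hypothetical, chosen only for illustration.

```r
# Hypothetical stand-ins (not objects from the Category package):
# 'selected' mimics geneIds(hgOver), 'catToGene' mimics the mapping from
# each KEGG category to the gene IDs in the universe.
selected <- c("YOR120W", "YFL026W", "YLR342W")
catToGene <- list(
  "00625" = c("YCR105W", "YCR107W", "YOR120W"),
  "04010" = c("YAL041W", "YFL026W", "YLR342W", "YPR165W")
)

# For each category, keep only the selected genes that belong to it.
selectedByCat <- lapply(catToGene, function(ids) selected[selected %in% ids])

# Number of selected genes per category (length of each list element).
counts <- sapply(selectedByCat, length)

print(selectedByCat)
print(counts)
```

With a real result object, the list of categories would first be restricted to the significant KEGG terms before applying the same lapply() call.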

Cheers,
Ivan

Quoting Seth Falcon <sfalcon at fhcrc.org>:

> Hi Ivan,
>
> ivan.borozan at utoronto.ca writes:
>> I've used the script below to calculate over-represented KEGG
>> categories; however, I cannot get to the gene IDs associated with each
>> of the over-represented KEGG terms/pathways.
>
> I've been working on making results easier to work with and also
> improving the documentation.  This is all happening in the devel arm
> (which will soon become the next release).
>
> With a (very) recent version of Category you can get help on all
> accessors for the result objects returned by hyperGTest:
>
>      help("HyperGResult-accessors")
>
>> My question: does catToGeneId() exist, and how do I get to the genes that
>> are associated with each of the above pathways?
>
> To get the category to universe of gene IDs mapping:
>
>     > geneIdUniverse(ans)[1:2]
>     $`00625`
>      [1] "YCR105W" "YCR107W" "YDL243C" "YDR368W" "YFL056C" "YHR104W" "YJR155W"
>      [8] "YKR009C" "YNL331C" "YOR120W"
>
>     $`04010`
>      [1] "YAL041W" "YBL016W" "YBL105C" "YBR083W" "YBR200W" "YCL027W" "YDL159W"
>      [8] "YDL235C" "YDR103W" "YDR461W" "YDR480W" "YER111C" "YER118C" "YFL026W"
>     [15] "YGL089C" "YGR032W" "YGR040W" "YGR088W" "YHL007C" "YHR005C" "YHR030C"
>     [22] "YHR084W" "YIL147C" "YJL095W" "YJL128C" "YJL157C" "YJR086W" "YKL062W"
>     [29] "YKL178C" "YKR095W" "YLR006C" "YLR113W" "YLR182W" "YLR229C" "YLR332W"
>     [36] "YLR342W" "YLR362W" "YML004C" "YMR037C" "YMR043W" "YNL053W" "YNL098C"
>     [43] "YNL145W" "YNL271C" "YNL283C" "YNR031C" "YOL105C" "YOR008C" "YOR212W"
>     [50] "YOR231W" "YPL049C" "YPL089C" "YPL140C" "YPL187W" "YPR165W"
>
> To get the category to _selected_ gene IDs mapping:
>
>     > geneIdsByCategory(ans)[1:2]
>     $`00625`
>     [1] "YOR120W"
>
>     $`04010`
>     [1] "YFL026W" "YLR342W"
>
> The number of selected genes in each category (just the length of each
> element of the above):
>
>     > geneCounts(ans)[1:2]
>     00625 04010
>         1     2
>
> NOTE: I used the YEAST annotation data package as an example.  It is
> non-typical in that it does not use Entrez Gene IDs as the base
> identifier.  For your example, you will get Entrez IDs and you can map
> those to SYMBOL if you want using the appropriate annotation data
> package.
>
> The above examples were done using:
>
> R 2.5.0 beta, Category 2.1.36
>
> sessionInfo()
> R version 2.5.0 beta (--)
> powerpc-apple-darwin8.9.0
>
> locale:
> C
>
> attached base packages:
> [1] "splines"   "tools"     "stats"     "graphics"  "grDevices" "datasets"
> [7] "utils"     "methods"   "base"
>
> other attached packages:
>         YEAST      Category AnnotationDbi       RSQLite           DBI
>     "1.15.13"      "2.1.36"      "0.0.58"       "0.5-4"       "0.2-1"
>        Matrix       lattice    genefilter      survival      annotate
>   "0.9975-11"      "0.15-3"     "1.13.12"        "2.31"      "1.13.7"
>            GO          KEGG         graph       Biobase
>     "1.15.13"     "1.15.13"     "1.13.10"     "1.13.48"
>
> Hope that helps.
>
>  + seth
>
> --
> Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
> http://bioconductor.org
>