[BioC] KEGG overrepresentation loses genes

Anne Kupczok anne.kupczok at univie.ac.at
Wed Apr 14 17:04:44 CEST 2010


Hello,
I ran into the following problem when using the KEGG annotation with 
hyperGTest: hyperGTest does not seem to consider all of my genes. In the 
example below, all three genes are annotated to the category "05020" 
(this is what mget(genes,envir=org.Hs.egPATH) says). In the summary of 
hyperGTest, however, the category contains only two of them: Count is 2, 
and geneIdsByCategory returns "1958" and "3553" but not "3303".
Is there an explanation for this behavior?
Thanks in advance!
Anne

 > library("Category")
Loading required package: AnnotationDbi
Loading required package: Biobase

Welcome to Bioconductor

  Vignettes contain introductory material. To view, type
  'openVignette()'. To cite Bioconductor, see
  'citation("Biobase")' and for packages 'citation(pkgname)'.

 > library("org.Hs.eg.db")
Loading required package: DBI
 > genes=c("1958","3553","3303")
 > GoHyp=new("KEGGHyperGParams",geneIds=genes,annotation="org.Hs.eg",pvalueCutoff=1,testDirection="over")
 > htest=hyperGTest(GoHyp)
 > s=summary(htest)
 > s[1,]
  KEGGID       Pvalue OddsRatio    ExpCount Count Size           Term
1  05020 3.810228e-06       Inf 0.003960844     2   35 Prion diseases
 >
 > p=mget(genes,envir=org.Hs.egPATH,ifnotfound=NA)
 > p
$`1958`
[1] "05020"

$`3553`
[1] "04010" "04060" "04210" "04620" "04640" "04940" "05010" "05020" "05332"

$`3303`
[1] "04010" "04144" "04612" "05020"

 > geneIdsByCategory(htest,"05020")
$`05020`
[1] "1958" "3553"

 > sessionInfo()
R version 2.10.0 (2009-10-26)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] KEGG.db_2.3.5       org.Hs.eg.db_2.3.6  RSQLite_0.7-3
[4] DBI_0.2-4           Category_2.12.0     AnnotationDbi_1.8.1
[7] Biobase_2.6.0

loaded via a namespace (and not attached):
 [1] annotate_1.24.0   genefilter_1.28.0 graph_1.24.1      GSEABase_1.8.0
 [5] RBGL_1.22.0       splines_2.10.0    survival_2.35-7   tools_2.10.0
 [9] XML_2.6-0         xtable_1.5-6
 >
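
The figures in the summary row for "05020" can be cross-checked with base R alone. The sketch below is not taken from the Category source code. It assumes that ExpCount is computed as (number of selected genes) * Size / (universe size) and that Pvalue is the upper tail of a hypergeometric distribution. The variable names are illustrative. Under those assumptions, the reported values are reproduced when only two selected genes enter the test, which is consistent with one of the three submitted genes being dropped before testing. The check does not say why the gene was dropped.

```r
# Values copied from the summary(htest) row for KEGG pathway 05020.
size      <- 35            # Size: universe genes annotated to 05020
count     <- 2             # Count: selected genes found in 05020
exp_count <- 0.003960844   # ExpCount as reported
pvalue    <- 3.810228e-06  # Pvalue as reported

# Assumption: ExpCount = n_selected * size / universe_size.
# Solve for the universe size with 2 and with 3 selected genes;
# only a (near) whole number is a plausible universe size.
n_candidates <- c(2, 3)
implied_universe <- n_candidates * size / exp_count
names(implied_universe) <- n_candidates
print(implied_universe)

# Assumption: the p-value is P(X >= count) for a hypergeometric draw
# of 2 genes from the implied universe.
universe <- round(implied_universe[["2"]])
p_check  <- phyper(count - 1, size, universe - size, 2, lower.tail = FALSE)
print(p_check)
```

With two selected genes the implied universe size comes out at about 17673 and the recomputed p-value agrees with the reported one to the printed precision. With three selected genes the implied universe size is not close to a whole number.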