[BioC] reasonable Illumina hyperG test

Fri Sep 5 07:18:00 CEST 2008

Hi,
I have been looking around at examples of the hyperGTest (in the 
GOstats, lumi, and other documentation) and feel like I have seen many 
slight variations on the methodology.
These variations are usually found in the way the non-specific filtering 
is performed. I haven't come across many examples of a hyperGTest for 
KEGG pathways and would like to ask whether my approach seems reasonable 
or whether I should make any changes.
Here is my code ("sig" is a vector of Entrez Gene IDs):

uni = exprs(lumi.N.P)

#Remove those without PATH annotation
#(allFeatures was not defined above; assumed here to be all probe IDs)
allFeatures = rownames(uni)
havePATH = sapply(mget(allFeatures, lumiHumanAllPATH),
function(x) !(length(x) == 1 && is.na(x)))
uni <- uni[names(which(havePATH)),]

#Remove those with little variation across samples
iqrCutoff = 0.5
uni.IQR = apply(uni, 1, IQR)
uni = uni[uni.IQR > iqrCutoff,]

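#Keep the probe with the largest IQR for each Entrez ID
uni = uni[findLargest(rownames(uni), uni.IQR[rownames(uni)], 
"lumiHumanAll"),]
#mget returns a list; universeGeneIds needs a plain vector of IDs
uni = unique(unlist(mget(rownames(uni), lumiHumanAllENTREZID)))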

params = new("KEGGHyperGParams", geneIds=sig, universeGeneIds = uni, 
annotation="lumiHumanAll", pvalueCutoff=0.05, testDirection="over")

hgOver = hyperGTest(params)
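
For context, the over-representation test for a single pathway is, as far as I understand it, a one-sided hypergeometric tail probability. Below is a minimal base-R sketch of that calculation. All counts are made up for illustration and the variable names are hypothetical; nothing here comes from the data above.

```r
# Hypothetical counts, for illustration only (not taken from real data)
universeSize <- 5000  # genes in the universe after filtering
pathwaySize  <- 80    # universe genes annotated to one KEGG pathway
selectedSize <- 300   # genes in "sig" that are also in the universe
overlap      <- 12    # selected genes annotated to that pathway

# P(X >= overlap): one-sided test for over-representation.
# phyper() gives P(X <= q) by default, so use q = overlap - 1
# together with lower.tail = FALSE.
pOver <- phyper(overlap - 1,
                pathwaySize,                 # "white balls" in the urn
                universeSize - pathwaySize,  # "black balls" in the urn
                selectedSize,                # number of balls drawn
                lower.tail = FALSE)
print(pOver)
```

The sketch shows why the universe matters: changing universeSize or pathwaySize through the non-specific filtering changes the resulting p-value directly.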

Does this code/approach seem reasonable? Should I correct for multiple 
testing after the hyperGTest?
Would it be fair to perform a test on gene ontologies in the same way 
(obviously after having changed the parameter class and specifying an 
ontology branch)?
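
To make the multiple-testing question concrete, a correction applied after the test might look like the sketch below. It uses base R's p.adjust with the Benjamini-Hochberg method, which is just one possible choice. The p-values and pathway labels are made up and stand in for what pvalues(hgOver) would return.

```r
# Hypothetical raw p-values, standing in for pvalues(hgOver);
# the names are placeholder pathway labels, not real results
rawP <- c(pathA = 0.0004, pathB = 0.003, pathC = 0.012,
          pathD = 0.03,   pathE = 0.2)

# Benjamini-Hochberg adjustment (controls the false discovery rate)
adjP <- p.adjust(rawP, method = "BH")

# Pathways that remain below the cutoff after adjustment
print(adjP[adjP < 0.05])
```

Note that the adjustment only accounts for the pathways actually tested, so it should be applied to the full vector of p-values rather than to the subset already passing pvalueCutoff.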

thanks,
Sebastien