[BioC] gsea (gene set enrichment analysis) for ranked lists

Luo Weijun luo_weijun at yahoo.com
Wed Apr 20 22:10:54 CEST 2011


Hi Asta,
I just came across your post. If I understand correctly, my gage package (with its supporting data package, gageData) can do the analysis and meets all your criteria. Here is an example run:

library(gage)
library(gageData)
data(gse16873)
#gene set data are available for other species; type ?kegg.gs for details
data(kegg.gs)

#generate some pre-ranked gene expression data as you may have
a= gse16873[,c(2,4)] - gse16873[,c(1,3)]
a=apply(a, 2, rank)

#test with direction: either up or down
kegg.rk <- gage(a, gsets = kegg.gs, ref = NULL, samp = NULL, rank.test = TRUE)
names(kegg.rk)
head(kegg.rk$greater)
#note that you don't need to pre-rank your genes to do rank.test with gage; the following line would give you the same results as kegg.rk above
kegg.rk.2 <- gage(gse16873, gsets = kegg.gs, ref = c(1,3), samp = c(2,4), rank.test = TRUE)

#test without direction, i.e. 2-way perturbations
kegg.rk.2d <- gage(gse16873, gsets = kegg.gs, ref = c(1,3), samp = c(2,4), rank.test = TRUE, same.dir = FALSE)
names(kegg.rk.2d)
head(kegg.rk.2d$greater)

gage provides many other options for gene set testing; check the package vignette or type ?gage for details.
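
To see what the pre-ranking step in the example does, here is a minimal sketch on toy data using base R only. The matrix expr, its gene names (g1 to g3), and its values are made up for illustration; the column layout (reference, sample, reference, sample) is assumed to mirror the one used for gse16873 above.

```r
# Toy expression matrix: 3 hypothetical genes, 2 paired experiments
# (columns alternate reference / sample, as assumed for the example above)
expr <- matrix(c(5, 7,   5, 6,
                 8, 6,   7, 4,
                 4, 4.5, 6, 8),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("g1", "g2", "g3"),
                               c("ref1", "samp1", "ref2", "samp2")))

# Paired differences: sample columns minus reference columns
a <- expr[, c(2, 4)] - expr[, c(1, 3)]

# Rank genes within each column (each experiment) separately
a.rk <- apply(a, 2, rank)
a.rk
```

Each column of a.rk then holds the ranks of the genes within one experiment, with the gene names kept as row names.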

If your pre-ranked gene data is a single vector, you need to wrap it into a single-column matrix (with a column name) using cbind. First, though, you need to update to the development version of the gage package (due to a small bug).
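
The cbind step can be tried on toy data first. This is a minimal sketch in base R only; the gene IDs and scores are made up, and the names scores, ranks and m are hypothetical.

```r
# Hypothetical per-gene scores, named by gene ID
scores <- c("10" = 2.5, "100" = -1.2, "1000" = 0.3)

# rank() keeps the names, so gene IDs stay attached to the ranks
ranks <- rank(scores)

# cbind() with a named argument gives a one-column matrix whose
# column is called "exp1" and whose row names are the gene IDs
m <- cbind(exp1 = ranks)
dim(m)
colnames(m)
rownames(m)
```

The row names matter here because the gene IDs have to stay attached to the values so they can be matched against the IDs in the gene sets.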

#generate pre-ranked data vector
a= gse16873[,c(2,4)] - gse16873[,c(1,3)]
a=apply(a, 1, mean)
a=rank(a)
kegg.p <- gage(cbind(exp1=a), gsets = kegg.gs, ref = NULL, samp = NULL)
names(kegg.p)
head(kegg.p$greater)

The gage call above on single-column data will NOT work with the current release version (2.2.x) of gage, because of a small bug that I have just fixed. You need to download and install the development version from http://bioconductor.org/packages/2.9/bioc/html/gage.html sometime tomorrow, once the daily check-build cycle is done. Hope this helps.
Weijun

##
Asta Laiho wrote:

Hi,

I have been using the Broad Institute's GSEA tool for gene set enrichment analysis of preranked lists. This allows me to perform statistical testing between the sample groups without coupling it directly to the enrichment analysis, so that the two steps stay modular. It also enables me to sort the genes according to my preferred logic and then analyze gene enrichment in a way that ignores the direction of the differential expression (up/down). The drawback of the Broad GSEA implementation is that all the annotations used are human based. I have been trying to find an alternative approach within R/Bioconductor but haven't been able to find one so far that would fully meet the following criteria:

- Allows one to test gene enrichment for preranked gene lists (works with ordered lists of gene symbols/identifiers rather than actual expression value matrices and thus is not tied to a particular way of testing gene expression between sample groups)
- Is available for a number of organisms and gene set annotations (at least GO and KEGG)
- Allows one to ignore the direction of the regulation and concentrate on generally differentially expressed genes

If someone is aware of a tool that would meet all these criteria, I would be very happy to know. Otherwise this can be regarded as a wish for such a method to be implemented in the R/Bioconductor environment.

Greetings,
Asta


