Saturday, October 22, 2011

Gene Set Enrichment Analysis (GSEA) Made Simple

Among the many applications of microarray technology, one of the most popular is the identification of genes that are differentially expressed between two conditions. A common statistical approach is to quantify the interest of each gene with a p-value, adjust these p-values for multiple comparisons, choose an appropriate cut-off, and create a list of candidate genes. This approach has been criticized for ignoring biological knowledge regarding how genes work together. Recently, a series of methods that do incorporate biological knowledge have been proposed. However, many of these methods seem overly complicated. Furthermore, the most popular method, Gene Set Enrichment Analysis (GSEA), is based on a statistical test known for its lack of sensitivity. In this paper we compare the performance of a simple alternative with that of GSEA. We find that this simple solution clearly outperforms GSEA. We demonstrate this with eight different microarray datasets.

There are currently two major types of procedure for incorporating biological knowledge into
differential expression analysis. We will refer to these as the over-representation and the aggregate
score approaches.

Over-representation analysis can be summarized as follows: First, form a list of candidate
genes using the marginal approach. Then, for each gene set, create a two-by-two table
comparing the number of candidate genes that are members of the category to those that are
not members. The significance of over-representation can be assessed, for example, using the
hypergeometric distribution or its binomial approximation.
A limitation of the over-representation approach is that it ignores all the genes that did not
make the list of candidate genes.
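
The over-representation test described above can be sketched in a few lines of Python. This is only an illustration, not code from the paper: the function names and the example counts are made up, and it uses the one-sided hypergeometric tail (the probability of seeing at least the observed overlap by chance).

```python
from math import comb


def two_by_two(n_genes, n_candidates, set_size, overlap):
    """Two-by-two table: rows = candidate / not candidate,
    columns = in gene set / not in gene set."""
    return [
        [overlap, n_candidates - overlap],
        [set_size - overlap, n_genes - n_candidates - set_size + overlap],
    ]


def overrepresentation_pvalue(n_genes, n_candidates, set_size, overlap):
    """P(X >= overlap), where X is the number of gene-set members among
    n_candidates genes drawn without replacement from n_genes genes,
    set_size of which belong to the gene set (hypergeometric tail)."""
    total = comb(n_genes, n_candidates)
    upper = min(set_size, n_candidates)
    tail = sum(
        comb(set_size, k) * comb(n_genes - set_size, n_candidates - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 genes on the array, 3 candidates,
# a gene set of 4 genes, 2 of which are candidates.
table = two_by_two(10, 3, 4, 2)
p = overrepresentation_pvalue(10, 3, 4, 2)
```

Note that only the four counts enter the calculation: how strongly the non-candidate genes were expressed plays no role, which is exactly the limitation mentioned above.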

The aggregate score approach (e.g., GSEA) does not have this limitation. The basic idea
is to assign a score to each gene set based on the gene-specific scores of all genes in that set.

In this paper we compare GSEA to the one-sample z-test and the χ²-test.
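
To make the two simple tests concrete, here is a minimal Python sketch. The excerpt above does not spell out the formulas, so this is one standard formulation under stated assumptions, not necessarily the paper's exact definitions: each gene in the set has a score (for example a t-statistic) that is roughly standard normal and independent under the null hypothesis. The function names and the example scores are hypothetical.

```python
from math import erfc, sqrt


def z_score(scores):
    """One-sample z statistic for a shift in the mean of the gene scores:
    sqrt(n) * mean. Approximately N(0, 1) under the assumed null."""
    n = len(scores)
    return sqrt(n) * (sum(scores) / n)


def standardized_chi2(scores):
    """Scale statistic: sum of squared deviations from the set mean,
    centered by its null mean (n - 1) and scaled by its null standard
    deviation sqrt(2 * (n - 1)). Large values suggest that the scores
    are more spread out than expected under the assumed null."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((t - mean) ** 2 for t in scores)
    return (ss - (n - 1)) / sqrt(2 * (n - 1))


def two_sided_normal_pvalue(z):
    """Two-sided p-value for a standard normal statistic."""
    return erfc(abs(z) / sqrt(2))


# Hypothetical gene-level scores for one gene set.
scores = [1.0, 2.0, 3.0, 2.0]
z = z_score(scores)
c = standardized_chi2(scores)
p = two_sided_normal_pvalue(z)
```

The z statistic picks up gene sets whose scores are shifted in one direction, while the χ² statistic is meant for sets in which some genes go up and others go down, so the mean stays near zero but the spread increases. Unlike the over-representation test, both use the score of every gene in the set.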

http://www.bepress.com/jhubiostat/paper185/

that 7 or so genes is
sufficient to uniquely determine a gene set, -- Jesse
