BMC Bioinformatics. 2009 Feb 3;10:47.
doi: 10.1186/1471-2105-10-47.

A general modular framework for gene set enrichment analysis

Marit Ackermann et al. BMC Bioinformatics.

Abstract

Background: Analysis of microarray and other high-throughput data on the basis of gene sets, rather than individual genes, is becoming more important in genomic studies. Correspondingly, a large number of statistical approaches for detecting gene set enrichment have been proposed, but both the interrelations and the relative performance of the various methods are still very much unclear.

Results: We conduct an extensive survey of statistical approaches for gene set analysis and identify a common modular structure underlying most published methods. Based on this finding we propose a general framework for detecting gene set enrichment. This framework provides a meta-theory of gene set analysis that not only helps to gain a better understanding of the relative merits of each embedded approach but also facilitates a principled comparison and offers insights into the relative interplay of the methods.
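The modular idea can be illustrated with a minimal sketch. The abstract does not list the modules, so the stages below (gene-level statistic, transformation, gene set statistic, significance assessment) are an assumption inferred from the figure captions. An ordinary Welch t-statistic stands in for the moderated t-statistic, sample-label permutation is one assumed choice of significance assessment, and all function names are hypothetical.

```python
import numpy as np

def gene_level_stat(expr, labels):
    """Welch t-statistic per gene; expr is genes x samples, labels is 0/1.
    (Stand-in for a moderated t-statistic, which needs variance shrinkage.)"""
    a, b = expr[:, labels == 0], expr[:, labels == 1]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return (b.mean(axis=1) - a.mean(axis=1)) / se

def transform(stats):
    """Transformation module: squaring ignores the direction of change."""
    return stats ** 2

def gene_set_stat(values, members):
    """Gene set statistic module: mean of transformed values in the set."""
    return values[members].mean()

def enrichment_pvalue(expr, labels, members, n_perm=500, seed=0):
    """Significance module: permute sample labels to build a null distribution."""
    rng = np.random.default_rng(seed)
    observed = gene_set_stat(transform(gene_level_stat(expr, labels)), members)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(labels)
        null = gene_set_stat(transform(gene_level_stat(expr, perm)), members)
        exceed += int(null >= observed)
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Because each stage is a separate function, swapping in another gene-level or gene set statistic changes one module and leaves the rest of the pipeline untouched, which is what makes a systematic comparison of many variants feasible. Here `members` is an array of row indices for the genes in the set.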

Conclusion: We use this framework to conduct a computer simulation comparing 261 different variants of gene set enrichment procedures and to analyze two experimental data sets. Based on the results we offer recommendations for best practices regarding the choice of effective procedures for gene set enrichment analysis.

Figures

Figure 1. Schematic overview of the modular structure underlying procedures for gene set enrichment analysis.

Figure 2. Analysis of p53 data set. Left: Number of significant gene sets in dependence of p-value cutoff and choice of gene set statistic. On the gene level, the squared moderated t-statistic was employed. Right: Bar plot for p-value cutoff 0.01.

Figure 3. Distribution of correlation across the 290 gene sets investigated for the p53 data. Top: histogram of averaged pairwise correlations. Bottom: histogram of averaged absolute values.

Figure 4. Analysis of Hedenfalk data set. Left: Number of significant gene sets in dependence of p-value cutoff and choice of gene set statistic. On the gene level, the squared moderated t-statistic was employed. Right: Bar plot for p-value cutoff 0.01.
