Nucleic Acids Res. 2010 Jun;38(11):3523-32.
doi: 10.1093/nar/gkq045. Epub 2010 Feb 19.

GOing Bayesian: model-based gene set analysis of genome-scale data

Sebastian Bauer et al. Nucleic Acids Res. 2010 Jun.

Abstract

The interpretation of data-driven experiments in genomics often involves a search for biological categories that are enriched for the responder genes identified by the experiments. However, knowledge bases such as the Gene Ontology (GO) contain hundreds or thousands of categories with very high overlap between categories. Thus, enrichment analysis performed on one category at a time frequently returns large numbers of correlated categories, leaving the choice of the most relevant ones to the user's interpretation. Here we present model-based gene set analysis (MGSA), which analyzes all categories at once by embedding them in a Bayesian network in which gene response is modeled as a function of the activation of biological categories. Probabilistic inference is used to identify the active categories. The Bayesian modeling approach naturally takes category overlap into account and avoids the multiple-testing corrections required in single-category enrichment analysis. On simulated data, MGSA identifies active categories with up to 95% precision at a recall of 20% for moderate settings of noise, a 10-fold precision improvement over single-category statistical enrichment analysis. Application to a gene expression data set in yeast demonstrates that the method provides high-level, summarized views of core biological processes and correctly eliminates confounding associations.
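The model-based idea can be illustrated with a small Metropolis–Hastings sampler over term states. This is a minimal sketch, not the authors' implementation: it assumes fixed, known noise parameters (our own names `alpha` for the false positive rate, `beta` for the false negative rate and `p` for the prior probability that a term is active), proposes only single-term on/off toggles, and estimates each term's posterior as the fraction of post-burn-in samples in which it is on. The toy annotations are hypothetical.

```python
import math
import random


def log_likelihood(active, annotations, observed, alpha, beta):
    """log P(observed gene states | set of active terms)."""
    # A gene's hidden state is on if at least one active term annotates it.
    hidden_on = set()
    for term in active:
        hidden_on |= annotations[term]
    ll = 0.0
    for gene, is_on in observed.items():
        if gene in hidden_on:
            # Hidden on: observed off (false negative) with prob beta.
            ll += math.log(1.0 - beta) if is_on else math.log(beta)
        else:
            # Hidden off: observed on (false positive) with prob alpha.
            ll += math.log(alpha) if is_on else math.log(1.0 - alpha)
    return ll


def mgsa_posteriors(annotations, observed, alpha=0.1, beta=0.2, p=0.1,
                    n_iter=20000, burn_in=2000, seed=0):
    """Estimate P(term active | data) by Metropolis-Hastings over term states."""
    rng = random.Random(seed)
    terms = sorted(annotations)

    def log_posterior(state):
        k = len(state)
        log_prior = k * math.log(p) + (len(terms) - k) * math.log(1.0 - p)
        return log_prior + log_likelihood(state, annotations, observed, alpha, beta)

    active = set()
    current = log_posterior(active)
    counts = dict.fromkeys(terms, 0)
    for it in range(n_iter):
        # Symmetric proposal: flip one uniformly chosen term on or off.
        proposal = active ^ {rng.choice(terms)}
        candidate = log_posterior(proposal)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            active, current = proposal, candidate
        if it >= burn_in:
            for term in active:
                counts[term] += 1
    n_kept = n_iter - burn_in
    return {term: counts[term] / n_kept for term in terms}


# Hypothetical toy data: term A explains the responder genes g1..g4.
annotations = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g3", "g4", "g5", "g6"},
    "C": {"g7", "g8"},
}
observed = {f"g{i}": i <= 4 for i in range(1, 9)}
posteriors = mgsa_posteriors(annotations, observed)
print(posteriors)
```

With these toy inputs, term A should receive a posterior near 1 while the overlapping term B stays low: once A is on, also switching B on would predict that g5 and g6 respond, which they do not. Fixing the noise parameters is a simplification made for brevity.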

Figures

Figure 1.
A Bayesian network to model gene response with gene categories. Gene categories, or terms (ellipses), can be either on or off. Terms that are on activate the hidden state (rectangles) of all genes annotated to them; all other genes remain off. The observed states (diamonds) of the genes are noisy observations of their true hidden states. The parameters of the model (light gray nodes) are the prior probability that each term is active, the false positive rate and the false negative rate.
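The generative process described in this caption can be written out as forward sampling from the network, which is also one way to produce simulated benchmark data. This is a sketch under our own naming: `annotations` maps each term to the set of genes annotated to it, and `p`, `alpha` and `beta` stand for the term prior, false positive rate and false negative rate; none of these names come from the original figure.

```python
import random


def simulate_mgsa_data(annotations, genes, p, alpha, beta, seed=None):
    """Forward-sample term states, hidden gene states and noisy observations."""
    rng = random.Random(seed)
    # Term layer: each term is independently on with prior probability p.
    active = {term for term in sorted(annotations) if rng.random() < p}
    # Hidden layer: a gene is on if any active term annotates it, otherwise off.
    hidden = {gene: any(gene in annotations[t] for t in active) for gene in genes}
    # Observed layer: noisy copies of the hidden states.
    observed = {}
    for gene in genes:
        if hidden[gene]:
            observed[gene] = rng.random() >= beta  # off with prob beta (false negative)
        else:
            observed[gene] = rng.random() < alpha  # on with prob alpha (false positive)
    return active, hidden, observed


# Hypothetical terms and genes for illustration.
annotations = {"T1": {"g1", "g2"}, "T2": {"g2", "g3"}, "T3": {"g4"}}
genes = ["g1", "g2", "g3", "g4", "g5"]
active, hidden, observed = simulate_mgsa_data(annotations, genes, p=0.3,
                                              alpha=0.05, beta=0.1, seed=42)
print(sorted(active), observed)
```

Setting `alpha` and `beta` to zero makes the observed layer an exact copy of the hidden layer, which is a quick sanity check on the wiring.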
Figure 2.
Benchmarking on simulated data sets. Performance of the TfT, PCU, TopW, GenGO[…], MGSA[…] and MGSA algorithms on simulated data sets with different settings of the false positive and false negative rates. In each row, the left panel shows precision at a recall of 0.2 (A, B), the middle panel precision as a function of recall (C, D) and the right panel the ROC curve (E, F).
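Panels A and B summarize each method by its precision at a recall of 0.2. One common way to read that number off a ranked list of terms is sketched below; the exact convention used in the benchmark (for example, any interpolation between ranks) is not given here, so this convention is an assumption. Term names are hypothetical.

```python
def precision_at_recall(ranked_terms, true_terms, target_recall=0.2):
    """Precision at the first rank where recall reaches target_recall."""
    true_terms = set(true_terms)
    if not true_terms:
        raise ValueError("need at least one truly active term")
    true_positives = 0
    for rank, term in enumerate(ranked_terms, start=1):
        if term in true_terms:
            true_positives += 1
        if true_positives / len(true_terms) >= target_recall:
            return true_positives / rank
    return 0.0  # the ranking never reaches the target recall


ranked = ["x", "A", "y", "B", "C"]  # terms sorted by decreasing score
truth = {"A", "B", "C", "D", "E"}    # truly active terms in the simulation
print(precision_at_recall(ranked, truth))  # prints 0.5
```

Returning 0.0 when the target recall is never reached penalizes methods that report too few terms, rather than silently skipping them.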
Figure 3.
Application to a respiratory versus fermentative growth expression data set in yeast. (A) Ranked list of the 192 overrepresented terms found using a term-for-term Fisher's test with Benjamini–Hochberg correction for multiple testing. Many of the top terms are redundant and relate to similar functions. The term cell death (highlighted in blue) is a spurious association (see text). (B) Ranked list of the top 10 terms identified by a single run of MGSA (the six with a posterior >0.5 in green). (C) Error bars (95% confidence intervals) obtained with 20 runs of MGSA. Each of the seven terms was identified with a posterior >0.5 in at least one of the 20 runs.
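For comparison, the term-for-term baseline in panel A tests each term separately and then corrects for multiple testing. A minimal sketch follows, assuming the usual one-sided Fisher's exact test for overrepresentation (the hypergeometric upper tail) and the standard Benjamini–Hochberg step-up adjustment; the function names and the counts in the usage lines are made up for illustration.

```python
from math import comb


def fisher_overrep_p(n_pop, n_study, n_term_pop, n_term_study):
    """One-sided Fisher's exact test: P(X >= n_term_study), X ~ Hypergeometric.

    n_pop: genes in the population; n_study: responder (study) genes;
    n_term_pop: population genes annotated to the term;
    n_term_study: study genes annotated to the term.
    """
    upper = min(n_study, n_term_pop)
    tail = sum(comb(n_term_pop, k) * comb(n_pop - n_term_pop, n_study - k)
               for k in range(n_term_study, upper + 1))
    return tail / comb(n_pop, n_study)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (annotated in population, annotated among responders)
terms = {"term1": (30, 12), "term2": (200, 15), "term3": (50, 3)}
raw = {t: fisher_overrep_p(1000, 100, a, b) for t, (a, b) in terms.items()}
adj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
print(adj)
```

Because each term is tested on its own, overlapping terms such as a parent and its child can both reach significance, which is the redundancy visible in panel A.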
