Gene Ontology analysis in multiple gene clusters under multiple hypothesis testing framework
- PMID: 17913480
- DOI: 10.1016/j.artmed.2007.08.002
Abstract
Objective: Gene Ontology (GO) has become a routine resource for functional analysis of gene lists. Although a number of tools exist for identifying enriched GO terms in one or two gene lists, two technical challenges remain: first, how to handle multiple hypothesis testing when the tests are heavily correlated; second, how to identify GO terms that are enriched in one gene cluster relative to multiple other gene clusters. We provide a statistical procedure that treats these problems rigorously, and we offer a software tool for applying GO to the analysis of gene clusters.
Methods: We previously introduced a statistical procedure that handles hypothesis testing in a two-group comparison scenario. In this paper we extend that two-group procedure into a general procedure for analyzing any number of gene lists or clusters. The new procedure identifies GO terms enriched in any gene cluster while controlling for multiple hypothesis testing. It is implemented in a user-friendly analysis tool, GoSurfer. The current version of GoSurfer takes one or several gene lists as input and identifies the GO terms that are enriched in any of the input gene lists. GoSurfer estimates a conservative false discovery rate (FDR) for every GO term. The FDR estimation procedure in GoSurfer has two advantages: it does not rely on an independence assumption, and it does not assume that all the hypotheses are null hypotheses (the complete null). Thus GoSurfer's FDR estimates are mildly conservative rather than overly conservative.
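The abstract does not spell out the GoSurfer procedure, so the following is only a minimal sketch of the general idea: test every (cluster, GO term) pair for enrichment, then estimate an FDR by permuting gene-to-cluster labels, which leaves the correlation between GO terms intact and so needs no independence assumption. The function names, the one-sided hypergeometric test, and the toy data are all assumptions made for illustration. Note also that this plain permutation estimate treats every hypothesis as null, so it does not reproduce the less conservative correction the abstract attributes to GoSurfer.

```python
import random
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment_pvalues(clusters, annotations, universe):
    """One-sided enrichment p-value for every (cluster, term) pair."""
    N = len(universe)
    pvals = {}
    for cname, genes in clusters.items():
        genes = set(genes)
        for term, annotated in annotations.items():
            annotated = set(annotated) & universe
            k = len(genes & annotated)
            pvals[(cname, term)] = hypergeom_sf(k, N, len(annotated), len(genes))
    return pvals


def permutation_fdr(clusters, annotations, universe, threshold,
                    n_perm=200, seed=0):
    """Estimate FDR at a p-value threshold by shuffling cluster membership.

    Assumes disjoint clusters. Cluster sizes and the gene-to-term
    annotations are kept fixed, so dependence between terms is preserved.
    """
    rng = random.Random(seed)
    observed = enrichment_pvalues(clusters, annotations, universe)
    n_obs = sum(p <= threshold for p in observed.values())
    if n_obs == 0:
        return 0.0
    names = list(clusters)
    sizes = [len(clusters[name]) for name in names]
    pool = sorted(universe)
    null_total = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        shuffled, start = {}, 0
        for name, size in zip(names, sizes):
            shuffled[name] = pool[start:start + size]
            start += size
        null = enrichment_pvalues(shuffled, annotations, universe)
        null_total += sum(p <= threshold for p in null.values())
    # Expected number of null calls per permutation over observed calls.
    return min(1.0, null_total / n_perm / n_obs)


# Hypothetical toy data: 20 genes, two clusters, two GO-like terms.
universe = {f"g{i}" for i in range(20)}
clusters = {
    "up": [f"g{i}" for i in range(5)],
    "down": [f"g{i}" for i in range(5, 10)],
}
annotations = {
    "term_A": [f"g{i}" for i in range(5)],      # all in the "up" cluster
    "term_B": [f"g{i}" for i in range(0, 20, 4)],
}

pvals = enrichment_pvalues(clusters, annotations, universe)
fdr = permutation_fdr(clusters, annotations, universe, threshold=0.01)
print(pvals[("up", "term_A")], fdr)
```

Permuting cluster labels rather than applying a correction such as Bonferroni is the design choice that matters here: GO terms share genes through the ontology hierarchy, so the tests are dependent, and the permutation null inherits that dependence automatically.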
Results: We implemented the new procedure for GO analysis of multiple gene clusters in the GoSurfer software. We provide three examples of using GoSurfer to analyze time course gene expression data sets on the differentiation of embryonic stem cells. In the multiple-cluster example, we first used a typical clustering algorithm to identify five gene clusters, representing up-regulation, down-regulation and other patterns in the differentiation time course. Taking all five gene clusters as input, GoSurfer reports "cell adhesion" and "muscle contraction" as significant GO terms for the up-regulated cluster and "amino acids metabolism" as a significant GO term for the down-regulated cluster. It also reports a number of GO terms related to RNA processing and RNA transport as significant for a cluster that is up-regulated at both early and late time points. This may suggest that genes for RNA processing and genes for RNA transport are coregulated during the differentiation of embryonic stem cells.
Conclusion: The GoSurfer software analyzes multiple gene clusters and identifies GO terms that are enriched in any gene cluster. GoSurfer is available at: www.gosurfer.org.