Seeing the forest for the trees: using the Gene Ontology to restructure hierarchical clustering
- PMID: 19497934
- PMCID: PMC2705235
- DOI: 10.1093/bioinformatics/btp327
Abstract
Motivation: There is growing interest in improving the cluster analysis of expression data by incorporating prior knowledge, such as the Gene Ontology (GO) annotations of genes, so that the clusters subjected to subsequent scrutiny are more biologically relevant. The structure of the GO is another source of background knowledge, and it can be exploited through semantic similarity.
Results: We propose here a novel algorithm that integrates semantic similarities (derived from the ontology structure) into the procedure of deriving clusters from the dendrogram constructed during expression-based hierarchical clustering. Our approach can handle the multiple annotations, from different levels of the GO hierarchy, which most genes have. Moreover, it treats annotated and unannotated genes in a uniform manner. Consequently, the clusters obtained by our algorithm are characterized by significantly enriched annotations. In both cross-validation tests and when using an external index such as protein-protein interactions, our algorithm performs better than previous approaches. When applied to human cancer expression data, our algorithm identifies, among others, clusters of genes related to immune response and glucose metabolism. These clusters are also supported by protein-protein interaction data.
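The abstract does not specify the semantic similarity measure or the rule used to derive clusters from the dendrogram, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a toy ontology with hypothetical term names, a Jaccard overlap of ancestor sets as a stand-in similarity measure, a best-match rule for genes with multiple annotations, and a hypothetical top-down cut: a subtree is kept as one cluster when the mean pairwise similarity of its annotated genes reaches a threshold. Unannotated genes contribute no evidence and simply stay with their expression-based neighbors, which is one possible reading of "uniform" treatment.

```python
from itertools import combinations

# Toy ontology (hypothetical terms): term -> set of direct parents.
PARENTS = {
    "root": set(),
    "immune": {"root"},
    "metabolism": {"root"},
    "t_cell": {"immune"},
    "b_cell": {"immune"},
    "glycolysis": {"metabolism"},
}

def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def term_sim(a, b):
    """Jaccard overlap of ancestor sets (assumed stand-in measure)."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)

def gene_sim(terms_a, terms_b):
    """Best-matching term pair, so multiple annotations are handled."""
    return max(term_sim(a, b) for a in terms_a for b in terms_b)

def leaves(node):
    """Flatten a dendrogram given as nested (left, right) tuples."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)

def coherence(genes, annotations):
    """Mean pairwise similarity of annotated genes; None without a pair."""
    annotated = [g for g in genes if annotations.get(g)]
    pairs = list(combinations(annotated, 2))
    if not pairs:
        return None
    total = sum(gene_sim(annotations[a], annotations[b]) for a, b in pairs)
    return total / len(pairs)

def cut(node, annotations, threshold):
    """Descend from the root; stop at subtrees that are coherent enough."""
    genes = leaves(node)
    if isinstance(node, str):
        return [genes]
    score = coherence(genes, annotations)
    # Assumption: with no annotation evidence, trust the expression tree.
    if score is None or score >= threshold:
        return [genes]
    left, right = node
    return cut(left, annotations, threshold) + cut(right, annotations, threshold)

# Hypothetical expression dendrogram and annotations; g3 is unannotated.
DENDROGRAM = ((("g1", "g2"), "g3"), ("g4", "g5"))
ANNOTATIONS = {
    "g1": {"t_cell"},
    "g2": {"b_cell", "immune"},
    "g3": set(),
    "g4": {"glycolysis"},
    "g5": {"metabolism"},
}

clusters = cut(DENDROGRAM, ANNOTATIONS, threshold=0.5)
print(clusters)
```

With this toy input and a threshold of 0.5, the root is split because immune and metabolic genes share only the root term, while each of the two subtrees is kept whole; the unannotated gene g3 ends up in the immune-related cluster it joined on expression alone. In practice the dendrogram would come from an expression-based hierarchical clustering and the ontology from the GO itself.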