K-ary clustering with optimal leaf ordering for gene expression data
- PMID: 12801867
- DOI: 10.1093/bioinformatics/btg030
Abstract
Motivation: A major challenge in gene expression analysis is effective data organization and visualization. One of the most popular tools for this task is hierarchical clustering. Hierarchical clustering allows a user to view relationships at scales ranging from single genes to large sets of genes, while at the same time providing a global view of the expression data. However, hierarchical clustering is very sensitive to noise, it usually lacks a method for identifying distinct clusters, and it produces a large number of possible leaf orderings of the hierarchical clustering tree. In this paper we propose a new hierarchical clustering algorithm which reduces susceptibility to noise, permits up to k siblings to be directly related, and provides a single optimal order for the resulting tree.
Results: We present an algorithm that efficiently constructs a k-ary tree, where each node can have up to k children, and then optimally orders the leaves of that tree. By combining k clusters at each step our algorithm becomes more robust against noise and missing values. By optimally ordering the leaves of the resulting tree we maintain the pairwise relationships that appear in the original method, without sacrificing the robustness. Our k-ary construction algorithm runs in O(n^3) regardless of k and our ordering algorithm runs in O(4^k n^3). We present several examples that show that our k-ary clustering algorithm achieves results that are superior to the binary tree results in both global presentation and cluster identification.
Availability: We have implemented the above algorithms in C++ on the Linux operating system.
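To make the leaf-ordering idea in the Results concrete, here is a minimal Python sketch. It is not the authors' C++ implementation and does not achieve the O(4^k n^3) bound; it is a simpler dynamic program that tries every permutation of a node's children. The nested-tuple tree representation, the similarity matrix `sim`, and the function names `join`, `layout_table`, and `optimal_order` are all assumptions made for illustration. The objective assumed here is the sum of similarities between adjacent leaves in the final order.

```python
from itertools import permutations


def join(left, right, sim):
    """Concatenate two laid-out blocks, keeping the best score per boundary pair.

    Each table maps (first_leaf, last_leaf) -> (score, leaf_order).
    """
    out = {}
    for (l1, r1), (s1, o1) in left.items():
        for (l2, r2), (s2, o2) in right.items():
            # Joining adds the similarity of the two leaves that become adjacent.
            score = s1 + s2 + sim[r1][l2]
            key = (l1, r2)
            if key not in out or score > out[key][0]:
                out[key] = (score, o1 + o2)
    return out


def layout_table(tree, sim):
    """Best layout of `tree` for every possible (first_leaf, last_leaf) pair.

    A tree is either a leaf index (int) or a tuple of subtrees (assumed format).
    """
    if not isinstance(tree, tuple):
        return {(tree, tree): (0.0, [tree])}
    child_tables = [layout_table(child, sim) for child in tree]
    best = {}
    # Brute force over child orderings: k! permutations for a node with k children.
    for perm in permutations(child_tables):
        acc = perm[0]
        for nxt in perm[1:]:
            acc = join(acc, nxt, sim)
        for key, val in acc.items():
            if key not in best or val[0] > best[key][0]:
                best[key] = val
    return best


def optimal_order(tree, sim):
    """Return (score, leaf_order) maximizing the sum of adjacent-leaf similarities."""
    return max(layout_table(tree, sim).values(), key=lambda v: v[0])


# Toy data: five "genes" on a line; similarity is negative distance.
sim = [[-abs(i - j) for j in range(5)] for i in range(5)]
# A 3-ary root whose children are two pairs and one single leaf.
tree = ((0, 1), 4, (2, 3))
score, order = optimal_order(tree, sim)
```

Because a block's interaction with its neighbours depends only on its first and last leaf, keeping one best layout per boundary pair is enough. The permutation loop is what makes this sketch exponential in k at each node.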