Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach
- PMID: 17483500
- DOI: 10.1093/bioinformatics/btm158
Abstract
Motivation: Biologists often employ clustering techniques in the exploratory phase of microarray data analysis to discover relevant biological groupings. Given the availability of numerous clustering algorithms in the machine-learning literature, a user might want to select the one that performs best for his/her data set or application. Various validation measures have been proposed over the years to judge the quality of the clusters produced by a given clustering algorithm, including their biological relevance. Unfortunately, a given clustering algorithm can perform poorly under one validation measure while outperforming many other algorithms under another. A manual synthesis of results from multiple validation measures is nearly impossible in practice, especially when a large number of clustering algorithms are to be compared using several measures. An automated and objective way of reconciling the rankings is needed.
Results: Using a Monte Carlo cross-entropy algorithm, we combine the ranks of a set of candidate clustering algorithms via a weighted aggregation that optimizes a distance criterion. The proposed weighted rank aggregation allows for a far more objective and automated assessment of clustering results than simple visual inspection. We illustrate the procedure using one simulated and three real gene expression data sets from various platforms, ranking a total of 11 clustering algorithms through a combined examination of 10 different validation measures. Aggregate rankings were obtained both for a given number of clusters k and for an entire range of k.
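The abstract does not spell out the distance criterion or the sampling scheme, so the following is only a minimal sketch of what a cross-entropy (CE) rank aggregation can look like, under stated assumptions. We assume the criterion is a weighted sum of Spearman footrule distances between a candidate ordering and each validation measure's ranking, with one user-supplied weight per measure; the paper's actual distance and weighting may differ. The CE loop keeps a probability matrix `prob[item][position]`, samples candidate orderings from it, keeps the best fraction `rho` (the "elite"), and moves the matrix toward the elite's item-position frequencies. All function names, parameter names (`n_samples`, `rho`, `smoothing`, `patience`) and the algorithm labels in the usage example are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def footrule(candidate: Sequence[str], ranking: Sequence[str]) -> int:
    """Spearman footrule: sum of absolute position differences (assumed distance)."""
    pos = {item: i for i, item in enumerate(ranking)}
    return sum(abs(i - pos[item]) for i, item in enumerate(candidate))


def objective(candidate, rankings, weights) -> float:
    """Weighted sum of distances from a candidate ordering to every input ranking."""
    return sum(w * footrule(candidate, r) for r, w in zip(rankings, weights))


def sample_order(n: int, prob, rng: random.Random) -> List[int]:
    """Draw a permutation position by position, renormalising over unused items."""
    remaining = list(range(n))
    order = []
    for j in range(n):
        ws = [prob[i][j] for i in remaining]
        if sum(ws) <= 0:  # all remaining items have zero mass here: pick uniformly
            idx = rng.choice(remaining)
        else:
            idx = rng.choices(remaining, weights=ws, k=1)[0]
        remaining.remove(idx)
        order.append(idx)
    return order


def ce_rank_aggregation(rankings: List[List[str]], weights: Optional[List[float]] = None,
                        n_samples: int = 200, rho: float = 0.1, smoothing: float = 0.7,
                        max_iter: int = 100, patience: int = 10,
                        seed: int = 0) -> Tuple[List[str], float]:
    """Hypothetical CE search for the ordering minimising the weighted distance."""
    items = sorted(rankings[0])
    n = len(items)
    weights = weights or [1.0] * len(rankings)
    rng = random.Random(seed)
    prob = [[1.0 / n] * n for _ in range(n)]  # uniform start: prob[item][position]
    n_elite = max(1, int(rho * n_samples))
    best, best_score, stale = None, float("inf"), 0
    for _ in range(max_iter):
        samples = [sample_order(n, prob, rng) for _ in range(n_samples)]
        scored = sorted(
            ((objective([items[i] for i in s], rankings, weights), s) for s in samples),
            key=lambda t: t[0],
        )
        elite = [s for _, s in scored[:n_elite]]
        if scored[0][0] < best_score:
            best_score, best, stale = scored[0][0], [items[i] for i in scored[0][1]], 0
        else:
            stale += 1
            if stale >= patience:  # stop once the best value stops improving
                break
        # Smoothed update toward the elite's item-at-position frequencies.
        for i in range(n):
            for j in range(n):
                freq = sum(1 for s in elite if s[j] == i) / n_elite
                prob[i][j] = smoothing * freq + (1 - smoothing) * prob[i][j]
    return best, best_score


if __name__ == "__main__":
    # Hypothetical rankings of four algorithms under three validation measures.
    rankings = [
        ["kmeans", "pam", "hclust", "som"],
        ["pam", "kmeans", "hclust", "som"],
        ["kmeans", "hclust", "pam", "som"],
    ]
    print(ce_rank_aggregation(rankings))
```

Sampling orderings from a position-wise probability matrix keeps every candidate a valid permutation, and the smoothing factor slows convergence so the search does not lock onto an early, possibly suboptimal, elite. For a handful of algorithms an exhaustive search over permutations would suffice; a stochastic search of this kind only pays off as the number of algorithms grows.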
Availability: R code for all validation measures and rank aggregation is available from the authors upon request.
Supplementary information: Supplementary information is available at http://www.somnathdatta.org/Supp/RankCluster/supp.htm.
Similar articles
- Detecting clusters of different geometrical shapes in microarray gene expression data. Bioinformatics. 2005 May 1;21(9):1927-34. doi: 10.1093/bioinformatics/bti251. Epub 2005 Jan 12. PMID: 15647300
- Graph-based consensus clustering for class discovery from gene expression data. Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14. PMID: 17872912
- A new algorithm for comparing and visualizing relationships between hierarchical and flat gene expression data clusterings. Bioinformatics. 2005 Nov 1;21(21):3993-9. doi: 10.1093/bioinformatics/bti644. Epub 2005 Sep 1. PMID: 16141251
- Classification based upon gene expression data: bias and precision of error rates. Bioinformatics. 2007 Jun 1;23(11):1363-70. doi: 10.1093/bioinformatics/btm117. Epub 2007 Mar 28. PMID: 17392326. Review.
- How does gene expression clustering work? Nat Biotechnol. 2005 Dec;23(12):1499-501. doi: 10.1038/nbt1205-1499. PMID: 16333293. Review.
Cited by
- Predicting clinical outcomes in neuroblastoma with genomic data integration. Biol Direct. 2018 Sep 27;13(1):20. doi: 10.1186/s13062-018-0223-8. PMID: 30621745. Free PMC article.
- Selection and validation of reference genes for quantitative real-time PCR of Quercus mongolica Fisch. ex Ledeb under abiotic stresses. PLoS One. 2022 Apr 28;17(4):e0267126. doi: 10.1371/journal.pone.0267126. eCollection 2022. PMID: 35482686. Free PMC article.
- Identification of endogenous normalizing genes for expression studies in inguinal ring tissue for scrotal hernias in pigs. PLoS One. 2018 Sep 20;13(9):e0204348. doi: 10.1371/journal.pone.0204348. eCollection 2018. PMID: 30235332. Free PMC article.
- Cluster categorization of urban roads to optimize their noise monitoring. Environ Monit Assess. 2016 Jan;188(1):26. doi: 10.1007/s10661-015-4994-4. Epub 2015 Dec 12. PMID: 26661962. Free PMC article.
- Determination of Temporal Order among the Components of an Oscillatory System. PLoS One. 2015 Jul 7;10(7):e0124842. doi: 10.1371/journal.pone.0124842. eCollection 2015. PMID: 26151635. Free PMC article.