Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach

Vasyl Pihur¹, Susmita Datta, Somnath Datta

Affiliations

PMID: 17483500
DOI: 10.1093/bioinformatics/btm158

Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach

Vasyl Pihur et al. Bioinformatics. 2007.

. 2007 Jul 1;23(13):1607-15.

doi: 10.1093/bioinformatics/btm158. Epub 2007 May 5.

Authors

Vasyl Pihur¹, Susmita Datta, Somnath Datta

Affiliation

¹ Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202, USA.

PMID: 17483500
DOI: 10.1093/bioinformatics/btm158

Abstract

Motivation: Biologists often employ clustering techniques in the explorative phase of microarray data analysis to discover relevant biological groupings. Given the availability of numerous clustering algorithms in the machine-learning literature, an user might want to select one that performs the best for his/her data set or application. While various validation measures have been proposed over the years to judge the quality of clusters produced by a given clustering algorithm including their biological relevance, unfortunately, a given clustering algorithm can perform poorly under one validation measure while outperforming many other algorithms under another validation measure. A manual synthesis of results from multiple validation measures is nearly impossible in practice, especially, when a large number of clustering algorithms are to be compared using several measures. An automated and objective way of reconciling the rankings is needed.

Results: Using a Monte Carlo cross-entropy algorithm, we successfully combine the ranks of a set of clustering algorithms under consideration via a weighted aggregation that optimizes a distance criterion. The proposed weighted rank aggregation allows for a far more objective and automated assessment of clustering results than a simple visual inspection. We illustrate our procedure using one simulated as well as three real gene expression data sets from various platforms where we rank a total of eleven clustering algorithms using a combined examination of 10 different validation measures. The aggregate rankings were found for a given number of clusters k and also for an entire range of k.

Availability: R code for all validation measures and rank aggregation is available from the authors upon request.

Supplementary information: Supplementary information are available at http://www.somnathdatta.org/Supp/RankCluster/supp.htm.

PubMed Disclaimer

Publication types

Actions
Actions
Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions

LinkOut - more resources

Full Text Sources
- Ovid Technologies, Inc.
- Silverchair Information Systems

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach

Affiliation

Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach

Authors

Affiliation

Abstract

Publication types

MeSH terms

LinkOut - more resources

Full Text Sources