Mixture Model Tests Of Hierarchical Clustering Algorithms: The Problem Of Classifying Everybody
- PMID: 26821856
- DOI: 10.1207/s15327906mbr1403_6
Abstract
Due to the effects of outliers, mixture model tests that require all objects to be classified can severely underestimate the accuracy of hierarchical clustering algorithms. More valid and relevant comparisons between algorithms can be made by calculating accuracy at several levels in the hierarchical tree and considering accuracy as a function of the coverage of the classification. Using this procedure, several algorithms were compared on their ability to resolve ten multivariate normal mixtures. All of the algorithms were significantly more accurate than a random linkage algorithm, and accuracy was inversely related to coverage. Algorithms using correlation as the similarity measure were significantly more accurate than those using Euclidean distance (p < .001). A subset of high-accuracy algorithms, including single, average, and centroid linkage using correlation, and Ward's minimum variance technique, was identified.
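The idea of scoring accuracy at several tree levels as a function of coverage can be illustrated with a minimal Python sketch. It is not the paper's procedure: the abstract does not define accuracy or coverage operationally, so the definitions here are assumptions. Coverage is taken as the share of objects lying in clusters of at least `min_size` members, and accuracy as the share of those objects that carry their cluster's majority label. The function names, the toy data, and the use of single linkage on Euclidean distance are likewise illustrative choices.

```python
import math
from collections import Counter


def single_linkage_levels(points):
    """Yield the partition (clusters of point indices) after each merge."""
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):
        # Single linkage: distance between the closest pair across two clusters.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda pair: gap(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        yield [list(c) for c in clusters]


def coverage_and_accuracy(partition, labels, min_size=2):
    """Assumed definitions (not taken from the paper):
    coverage = share of objects in clusters with at least min_size members;
    accuracy = share of those objects matching their cluster's majority label.
    """
    classified = [c for c in partition if len(c) >= min_size]
    n_classified = sum(len(c) for c in classified)
    if n_classified == 0:
        return 0.0, None
    n_correct = sum(
        Counter(labels[i] for i in c).most_common(1)[0][1] for c in classified
    )
    return n_classified / len(labels), n_correct / n_classified


# Hypothetical toy data: two tight groups plus one outlier drawn from group B.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 30)]
labels = ["A", "A", "A", "B", "B", "B", "B"]

levels = list(single_linkage_levels(points))
for partition in levels:
    coverage, accuracy = coverage_and_accuracy(partition, labels)
    print(len(partition), "clusters:", "coverage", coverage, "accuracy", accuracy)

# Forcing every object into one of two clusters, outlier included.
forced = coverage_and_accuracy(levels[4], labels, min_size=1)
print("two clusters, everybody classified:", forced)
```

In this toy case the three-cluster level leaves the outlier unclassified and recovers both groups, whereas the two-cluster cut that classifies everybody keeps the outlier as its own cluster and merges the two real groups, which lowers the measured accuracy.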