Clustering Acoustic Segments Using Multi-Stage Agglomerative Hierarchical Clustering
- PMID: 26517376
- PMCID: PMC4627777
- DOI: 10.1371/journal.pone.0141756
Abstract
Agglomerative hierarchical clustering (AHC) becomes infeasible when applied to large datasets because of its O(N²) storage requirements. We present a multi-stage agglomerative hierarchical clustering (MAHC) approach aimed at large datasets of speech segments. The algorithm is based on an iterative divide-and-conquer strategy. The data is first split into independent subsets, each of which is clustered separately. This reduces the storage required by sequential implementations and allows concurrent computation on parallel computing hardware. The resultant clusters are merged and subsequently re-divided into subsets, which are passed to the following iteration. We show that MAHC can match and even surpass the performance of the exact AHC implementation when applied to datasets of speech segments.
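The split, cluster, merge and re-divide loop described above can be illustrated with a small Python sketch. This is a simplified illustration under our own assumptions, not the authors' MAHC implementation: items are plain feature vectors compared with Euclidean distance (the paper clusters speech segments), each subset is clustered with a naive average-linkage AHC, each merged cluster is carried to the next iteration as a single weighted centroid, and the names `ahc`, `mahc_sketch`, `max_subset` and the rule that each subset keeps roughly half its items are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
# A representative: (centroid, indices of the original items it stands for).
Rep = Tuple[Vector, List[int]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ahc(points: List[Vector], n_clusters: int) -> List[List[int]]:
    """Naive exact average-linkage AHC on one subset.

    Stores all O(n^2) pairwise distances, which is why it is only ever
    called on subsets of bounded size. Returns groups of indices into points.
    """
    n = len(points)
    dist = [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters


def weighted_centroid(points: List[Vector], weights: List[int]) -> Vector:
    total = sum(weights)
    return tuple(
        sum(p[d] * w for p, w in zip(points, weights)) / total
        for d in range(len(points[0]))
    )


def mahc_sketch(data: List[Vector], n_clusters: int, max_subset: int,
                seed: int = 0) -> List[int]:
    """Hypothetical multi-stage AHC sketch; returns one label per item."""
    if not 1 <= n_clusters < max_subset:
        raise ValueError("need 1 <= n_clusters < max_subset")
    rng = random.Random(seed)
    reps: List[Rep] = [(tuple(x), [i]) for i, x in enumerate(data)]
    # Iterate while the representatives do not fit into a single subset.
    while len(reps) > max_subset:
        # Divide: shuffle and split into independent subsets.
        rng.shuffle(reps)
        subsets = [reps[s:s + max_subset] for s in range(0, len(reps), max_subset)]
        merged: List[Rep] = []
        # Each subset is clustered on its own, so this loop could run in parallel.
        for subset in subsets:
            k = min(len(subset), max(n_clusters, len(subset) // 2))
            for group in ahc([c for c, _ in subset], k):
                cents = [subset[i][0] for i in group]
                weights = [len(subset[i][1]) for i in group]
                members = [m for i in group for m in subset[i][1]]
                # Merge: each cluster becomes one representative for the next stage.
                merged.append((weighted_centroid(cents, weights), members))
        reps = merged
    # Final stage: exact AHC on the remaining representatives.
    labels = [0] * len(data)
    for label, group in enumerate(ahc([c for c, _ in reps], n_clusters)):
        for i in group:
            for m in reps[i][1]:
                labels[m] = label
    return labels
```

For example, `mahc_sketch([(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (10.1, 10.1)], n_clusters=2, max_subset=3)` assigns one label to the first four points and the other label to the last four. Because no call to `ahc` ever sees more than `max_subset` items, peak storage per subset is O(max_subset²) instead of O(N²). As a rough illustration, a condensed matrix of 8-byte distances for N = 100,000 items holds N(N-1)/2 ≈ 5×10⁹ entries (about 40 GB), whereas a 10,000-item subset needs about 400 MB.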