Comprehensive analysis of clustering algorithms: exploring limitations and innovative solutions
- PMID: 39314716
- PMCID: PMC11419652
- DOI: 10.7717/peerj-cs.2286
Abstract
This survey rigorously explores contemporary clustering algorithms within the machine learning paradigm, focusing on five primary methodologies: centroid-based, hierarchical, density-based, distribution-based, and graph-based clustering. Through the lens of recent innovations such as deep embedded clustering and spectral clustering, we analyze their strengths, limitations, and breadth of application domains, ranging from bioinformatics to social network analysis. Notably, the survey introduces novel contributions by integrating clustering techniques with dimensionality reduction and proposing advanced ensemble methods to enhance stability and accuracy across varied data structures. This work synthesizes the latest advancements and offers new perspectives on overcoming traditional challenges such as scalability and noise sensitivity, thus providing a comprehensive roadmap for future research and practical applications in data-intensive environments.
Keywords: Centroid-based clustering; Clustering algorithms; Clustering challenges and solutions; Density-based clustering; Distribution-based clustering; Hierarchical clustering; Scalability and efficiency; Unsupervised learning.
© 2024 Wani.
Conflict of interest statement
The authors declare that they have no competing interests.