On the Accuracy and Parallelism of GPGPU-Powered Incremental Clustering Algorithms
- PMID: 29123546
- PMCID: PMC5662818
- DOI: 10.1155/2017/2519782
Abstract
Incremental clustering algorithms play a vital role in applications such as massive data analysis and real-time data processing. Typical application scenarios of incremental clustering place high demands on the computing power of the hardware platform, and parallel computing is a common way to meet this demand. The general-purpose graphics processing unit (GPGPU) is a promising parallel computing device. Nevertheless, incremental clustering algorithms face a dilemma between clustering accuracy and parallelism when they are powered by GPGPUs. We formally analyzed the cause of this dilemma. First, we formalized concepts relevant to incremental clustering, such as evolving granularity. Second, we formally proved two theorems. The first theorem establishes the relation between clustering accuracy and evolving granularity, and gives upper and lower bounds on different-to-same mis-affiliation; fewer occurrences of such mis-affiliation mean higher accuracy. The second theorem reveals the relation between parallelism and evolving granularity, where parallelism is measured by work-depth; a smaller work-depth means superior parallelism. From the proofs, we conclude that the accuracy of an incremental clustering algorithm is negatively related to evolving granularity, while its parallelism is positively related to evolving granularity. These contradictory relations cause the dilemma. Finally, we validated the relations through a demo algorithm. Experimental results verified the theoretical conclusions.
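The trade-off described in the abstract can be illustrated with a small sketch. This is not the paper's demo algorithm, which the abstract does not specify. It is a toy leader-style clustering on 1-D points, written under several assumptions: "evolving granularity" is taken to mean the number of points assigned against a frozen snapshot of the cluster centers before the clusters are updated; all unmatched points of one batch are placed in a single new cluster; the number of sequential batch steps stands in for work-depth; and the pair count in `different_to_same_pairs` is only a proxy for different-to-same mis-affiliation. All function and parameter names are hypothetical.

```python
from itertools import combinations


def nearest_within(x, centers, threshold):
    """Return the index of the closest center within threshold, or None."""
    best, best_d = None, float("inf")
    for cid, c in enumerate(centers):
        d = abs(x - c)
        if d <= threshold and d < best_d:
            best, best_d = cid, d
    return best


def incremental_cluster(points, granularity, threshold):
    """Toy batch-incremental clustering; returns (labels, sequential_steps)."""
    centers, labels, steps = [], [], 0
    for start in range(0, len(points), granularity):
        batch = points[start:start + granularity]
        # Data-parallel part: every point only reads the frozen centers,
        # so on a GPGPU each point could be handled by its own thread.
        matches = [nearest_within(x, centers, threshold) for x in batch]
        # Evolution step (toy assumption): all unmatched points of this
        # batch share ONE new cluster, led by the first unmatched point.
        unmatched = [x for x, m in zip(batch, matches) if m is None]
        new_id = len(centers)
        if unmatched:
            centers.append(unmatched[0])
        labels.extend(new_id if m is None else m for m in matches)
        steps += 1  # one sequential step per batch
    return labels, steps


def different_to_same_pairs(predicted, truth):
    """Count point pairs that share a predicted cluster but not a true one."""
    return sum(
        1
        for i, j in combinations(range(len(truth)), 2)
        if predicted[i] == predicted[j] and truth[i] != truth[j]
    )


if __name__ == "__main__":
    points = [0.0, 0.1, 10.0, 10.1, 20.0, 20.1]
    truth = [0, 0, 1, 1, 2, 2]
    for g in (1, 3):
        labels, steps = incremental_cluster(points, g, threshold=1.0)
        print(g, steps, different_to_same_pairs(labels, truth))
```

On this toy input, a granularity of 1 needs six sequential steps and produces no mis-affiliated pairs, while a granularity of 3 needs only two steps but merges points from different true clusters. That mirrors the direction of the two relations stated above, within the limits of the assumptions listed.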