Human-supervised clustering of multidimensional data using crowdsourcing
- PMID: 35620007
- PMCID: PMC9128850
- DOI: 10.1098/rsos.211189
Abstract
Clustering is a central task in many data analysis applications. However, there is no universally accepted metric for deciding whether clusters are present. Ultimately, we have to resort to a consensus between experts. The problem is amplified with high-dimensional datasets, where classical distances become uninformative and the ability of humans to fully apprehend the distribution of the data is challenged. In this paper, we design a mobile human-computing game as a tool to query human perception for the multidimensional data clustering problem. We propose two clustering algorithms that partially or entirely rely on aggregated human answers and report the results of two experiments conducted on synthetic and real-world datasets. We show that our methods perform on par with, or better than, the most popular automated clustering algorithms. Our results suggest that hybrid systems leveraging annotations of partial datasets collected through crowdsourcing platforms can be an efficient strategy to capture the collective wisdom for solving abstract computational problems.
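The abstract does not specify how the human answers are aggregated, so the following is only a generic illustration of the idea, not the authors' algorithms. It assumes each player sees a subset of the points and returns a grouping of them; the function names `aggregate_votes` and `consensus_clusters`, the input format, and the majority threshold are all hypothetical. Pairs of points that most players placed together are linked, and the connected components of those links form the consensus clusters.

```python
from collections import defaultdict
from itertools import combinations


def aggregate_votes(groupings):
    """Count, per pair of points, how often players grouped them together
    and how often both points were shown to the same player.

    `groupings` is a list of player answers (assumed format): each answer
    is a list of groups, and each group is a list of point ids.
    """
    together = defaultdict(int)
    seen = defaultdict(int)
    for groups in groupings:
        shown = sorted(p for g in groups for p in g)
        for pair in combinations(shown, 2):
            seen[pair] += 1
        for g in groups:
            for pair in combinations(sorted(g), 2):
                together[pair] += 1
    return together, seen


def consensus_clusters(points, groupings, threshold=0.5):
    """Link pairs grouped together in more than `threshold` of the answers
    that showed both points, then return the connected components."""
    together, seen = aggregate_votes(groupings)
    parent = {p: p for p in points}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), n in seen.items():
        if together[(a, b)] / n > threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for p in points:
        clusters[find(p)].add(p)
    return sorted(sorted(c) for c in clusters.values())


# Three hypothetical player answers over five points.
answers = [
    [[0, 1, 2], [3, 4]],
    [[0, 1], [2, 3, 4]],
    [[0, 1, 2], [3, 4]],
]
print(consensus_clusters(range(5), answers))
```

Because agreement is normalised by how often a pair was actually shown, the sketch tolerates players who annotate only part of the dataset; points that no player linked to anything remain singleton clusters.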
Keywords: crowdsourcing; data clustering; games; human-computing.
© 2022 The Authors.