Boosting wisdom of the crowd for medical image annotation using training performance and task features
- PMID: 38763994
- PMCID: PMC11102897
- DOI: 10.1186/s41235-024-00558-6
Abstract
A crucial bottleneck in medical artificial intelligence (AI) is the availability of high-quality labeled medical datasets. In this paper, we test a large variety of wisdom of the crowd algorithms for labeling medical images that were initially classified by individuals recruited through an app-based platform. Individuals classified skin lesions from the International Skin Lesion Challenge 2018 into 7 different categories. The recruited individuals varied widely in geographical location, experience, training, and performance. We tested several wisdom of the crowd algorithms of varying complexity, from a simple unweighted average to more complex Bayesian models that account for individual patterns of errors. Using a switchboard analysis, we observe that the best-performing algorithms rely on selecting top performers, weighting decisions by training accuracy, and taking into account the task environment. These algorithms far exceed expert performance. We conclude by discussing the implications of these approaches for the development of medical AI.
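As a rough illustration of the simpler end of the range described above, the following Python sketch combines three of the ingredients named in the abstract: an unweighted vote, weighting by training accuracy, and selecting top performers. It is not the authors' implementation. The function name `aggregate_votes`, its parameters, the alphabetical tie-break, and the example labels and accuracies are all assumptions made for illustration, and the Bayesian models and task-environment features are not covered.

```python
from collections import defaultdict


def aggregate_votes(votes, training_accuracy=None, top_k=None):
    """Combine one image's labels from several annotators (hypothetical helper).

    votes: dict mapping annotator id -> chosen label
    training_accuracy: optional dict mapping annotator id -> accuracy in [0, 1]
    top_k: optionally keep only the k annotators with the highest training accuracy
    """
    annotators = list(votes)

    # Selecting top performers: keep the k most accurate annotators.
    if top_k is not None and training_accuracy is not None:
        annotators = sorted(
            annotators, key=lambda a: training_accuracy[a], reverse=True
        )[:top_k]

    # Each vote counts 1.0 (unweighted) or the annotator's training accuracy.
    scores = defaultdict(float)
    for a in annotators:
        weight = training_accuracy[a] if training_accuracy is not None else 1.0
        scores[votes[a]] += weight

    # Ties are broken alphabetically (an arbitrary choice for this sketch).
    return max(sorted(scores), key=scores.get)


# Made-up example data: three annotators label one lesion image.
votes = {"a": "nevus", "b": "nevus", "c": "melanoma"}
accuracy = {"a": 0.4, "b": 0.4, "c": 0.9}

unweighted = aggregate_votes(votes)
weighted = aggregate_votes(votes, training_accuracy=accuracy)
top_one = aggregate_votes(votes, training_accuracy=accuracy, top_k=1)
```

In this toy case the unweighted vote follows the two-person majority, while the accuracy-weighted and top-performer variants follow the single annotator with the strongest training record.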
© 2024. The Author(s).
Conflict of interest statement
Erik Duhaime is the CEO of, and a stakeholder in, Centaur Labs. Eeshan Hasan and Jennifer Trueblood do not hold any stakes in Centaur Labs and have no competing interests.