Evaluating multimodal AI in medical diagnostics
- PMID: 39112822
- PMCID: PMC11306783
- DOI: 10.1038/s41746-024-01208-3
Abstract
This study evaluates the accuracy and responsiveness of multimodal AI models in answering NEJM Image Challenge questions and compares them with human collective intelligence, underscoring both AI's potential and its current limitations in clinical diagnostics. Anthropic's Claude 3 family achieved the highest accuracy among the evaluated AI models, surpassing average human accuracy, while collective human decision-making outperformed all AI models. GPT-4 Vision Preview was selective, responding more often to easier questions, to questions with smaller images, and to longer questions.
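The abstract separates average human accuracy from collective human decision-making, and the two can differ widely on the same set of votes. The sketch below is only an illustration of that distinction: it assumes the collective answer is the plurality vote per question, which the abstract does not state, and the function names, votes, and answer key are all hypothetical rather than taken from the study.

```python
from collections import Counter


def mean_individual_accuracy(votes_per_question, answer_key):
    """Fraction of all individual votes that match the correct answer."""
    correct = 0
    total = 0
    for votes, answer in zip(votes_per_question, answer_key):
        correct += sum(1 for v in votes if v == answer)
        total += len(votes)
    return correct / total


def collective_accuracy(votes_per_question, answer_key):
    """Fraction of questions whose most common vote is correct.

    Assumption: the collective answer is the plurality choice.
    """
    hits = 0
    for votes, answer in zip(votes_per_question, answer_key):
        top_choice, _ = Counter(votes).most_common(1)[0]
        hits += int(top_choice == answer)
    return hits / len(answer_key)


# Hypothetical toy data: four questions, five respondents each.
votes = [
    ["A", "A", "B", "C", "A"],
    ["B", "C", "B", "D", "A"],
    ["C", "D", "D", "A", "D"],
    ["A", "B", "B", "B", "C"],
]
key = ["A", "B", "C", "B"]

print(mean_individual_accuracy(votes, key))  # 9 of 20 votes are correct
print(collective_accuracy(votes, key))       # 3 of 4 plurality answers are correct
```

In this toy set fewer than half of the individual votes are correct, yet the plurality answer is right on three of four questions. That gap is how a model can beat the average respondent while still trailing the crowd.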
© 2024. The Author(s).
Conflict of interest statement
The authors declare no competing interests.