This is a preprint.
Hidden Flaws Behind Expert-Level Accuracy of Multimodal GPT-4 Vision in Medicine
- PMID: 38410646
- PMCID: PMC10896362
Update in
- Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine. NPJ Digit Med. 2024 Jul 23;7(1):190. doi: 10.1038/s41746-024-01185-7. PMID: 39043988.
Abstract
Recent studies indicate that Generative Pre-trained Transformer 4 with Vision (GPT-4V) outperforms human physicians in medical challenge tasks. However, these evaluations focused primarily on multiple-choice accuracy alone. Our study extends the current scope by comprehensively analyzing GPT-4V's rationales in image comprehension, recall of medical knowledge, and step-by-step multimodal reasoning when solving New England Journal of Medicine (NEJM) Image Challenges, an imaging quiz designed to test the knowledge and diagnostic capabilities of medical professionals. Evaluation results confirmed that GPT-4V performs comparably to human physicians in multiple-choice accuracy (81.6% vs. 77.8%). GPT-4V also performs well in cases that physicians answer incorrectly, with over 78% accuracy. However, we discovered that GPT-4V frequently presents flawed rationales in cases where it makes the correct final choice (35.5%), most prominently in image comprehension (27.2%). Despite GPT-4V's high accuracy on multiple-choice questions, our findings emphasize the need for further in-depth evaluation of its rationales before integrating such multimodal AI models into clinical workflows.
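The distinction the abstract draws between final-answer accuracy and rationale quality can be illustrated with a minimal sketch. It assumes each case has been graded by hand on the three rationale dimensions named above; the `CaseResult` type, its field names, the `summarize` helper, and the toy records are all hypothetical and are not the authors' data or evaluation code.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Hypothetical per-case grading record (not the authors' schema)."""
    answer_correct: bool   # final multiple-choice answer matches the key
    image_ok: bool         # image comprehension rationale judged sound
    knowledge_ok: bool     # medical knowledge recall judged sound
    reasoning_ok: bool     # step-by-step reasoning judged sound

    @property
    def rationale_flawed(self) -> bool:
        # A rationale counts as flawed if any one dimension is flawed.
        return not (self.image_ok and self.knowledge_ok and self.reasoning_ok)


def summarize(results: list[CaseResult]) -> dict[str, float]:
    """Report accuracy plus flaw rates among correctly answered cases."""
    correct = [r for r in results if r.answer_correct]
    n_all, n_correct = len(results), len(correct)
    return {
        "accuracy": n_correct / n_all if n_all else 0.0,
        "flawed_given_correct": (
            sum(r.rationale_flawed for r in correct) / n_correct
            if n_correct else 0.0
        ),
        "image_flawed_given_correct": (
            sum(not r.image_ok for r in correct) / n_correct
            if n_correct else 0.0
        ),
    }


# Toy records for illustration only; these are NOT results from the study.
toy_results = [
    CaseResult(True, True, True, True),
    CaseResult(True, False, True, True),
    CaseResult(True, True, True, False),
    CaseResult(False, False, True, False),
]
summary = summarize(toy_results)
```

Reporting the flaw rates conditional on a correct answer is what separates the two views: a model can score well on `accuracy` while `flawed_given_correct` remains substantial.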
Conflict of interest statement
The authors declare no competing non-financial interests but the following competing financial interests: R.S. receives royalties for patents or software licenses from iCAD, Philips, ScanMed, PingAn, Translation Holdings, and MGB. R.S. received research support from PingAn.