Measuring the Impact of AI in the Diagnosis of Hospitalized Patients: A Randomized Clinical Vignette Survey Study
- PMID: 38112814
- PMCID: PMC10731487
- DOI: 10.1001/jama.2023.22295
Abstract
Importance: Artificial intelligence (AI) could support clinicians when diagnosing hospitalized patients; however, systematic bias in AI models could worsen clinician diagnostic accuracy. Recent regulatory guidance has called for AI models to include explanations to mitigate errors made by models, but the effectiveness of this strategy has not been established.
Objectives: To evaluate the impact of systematically biased AI on clinician diagnostic accuracy and to determine if image-based AI model explanations can mitigate model errors.
Design, setting, and participants: Randomized clinical vignette survey study administered between April 2022 and January 2023 across 13 US states involving hospitalist physicians, nurse practitioners, and physician assistants.
Interventions: Clinicians were shown 9 clinical vignettes of patients hospitalized with acute respiratory failure, including their presenting symptoms, physical examination, laboratory results, and chest radiographs. Clinicians were then asked to determine the likelihood of pneumonia, heart failure, or chronic obstructive pulmonary disease as the underlying cause(s) of each patient's acute respiratory failure. To establish baseline diagnostic accuracy, clinicians were shown 2 vignettes without AI model input. Clinicians were then randomized to see 6 vignettes with AI model input with or without AI model explanations. Among these 6 vignettes, 3 vignettes included standard-model predictions, and 3 vignettes included systematically biased model predictions.
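To make the arm structure concrete, the short Python sketch below restates the design described above (2 baseline vignettes without AI input, then 3 standard-model and 3 systematically biased vignettes, with explanations shown only to one randomized arm). This is an illustrative sketch only; the function and field names are assumptions and do not reflect the study's actual instrumentation.

import random

def allocate_vignettes(rng: random.Random) -> dict:
    """Illustrative allocation for one participant: 2 baseline vignettes
    without AI input, then 6 vignettes with AI input (3 standard-model,
    3 systematically biased). The participant is randomized to see AI
    predictions with or without explanations. Names are hypothetical."""
    arm = rng.choice(["predictions_only", "predictions_with_explanations"])
    vignettes = (
        [{"ai_input": False, "model": None}] * 2
        + [{"ai_input": True, "model": "standard"}] * 3
        + [{"ai_input": True, "model": "biased"}] * 3
    )
    return {"arm": arm, "vignettes": vignettes}

print(allocate_vignettes(random.Random(0)))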
Main outcomes and measures: Clinician diagnostic accuracy for pneumonia, heart failure, and chronic obstructive pulmonary disease.
Results: Median participant age was 34 years (IQR, 31-39) and 241 (57.7%) were female. Four hundred fifty-seven clinicians were randomized and completed at least 1 vignette, with 231 randomized to AI model predictions without explanations, and 226 randomized to AI model predictions with explanations. Clinicians' baseline diagnostic accuracy was 73.0% (95% CI, 68.3% to 77.8%) for the 3 diagnoses. When shown a standard AI model without explanations, clinician accuracy increased over baseline by 2.9 percentage points (95% CI, 0.5 to 5.2) and by 4.4 percentage points (95% CI, 2.0 to 6.9) when clinicians were also shown AI model explanations. Systematically biased AI model predictions decreased clinician accuracy by 11.3 percentage points (95% CI, 7.2 to 15.5) compared with baseline and providing biased AI model predictions with explanations decreased clinician accuracy by 9.1 percentage points (95% CI, 4.9 to 13.2) compared with baseline, representing a nonsignificant improvement of 2.3 percentage points (95% CI, -2.7 to 7.2) compared with the systematically biased AI model.
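As a quick arithmetic restatement of the reported results, the illustrative Python snippet below applies each reported effect to the 73.0% baseline. The 2.3-percentage-point contrast between the two biased-model conditions corresponds to the difference between the -9.1 and -11.3 estimates; simple subtraction of the rounded values gives 2.2, the small gap reflecting rounding of the underlying modeled estimates. Variable names are mine, not the study's.

# Reported changes in clinician accuracy relative to the 73.0% baseline
# (percentage points), taken from the Results above.
baseline = 73.0
effects = {
    "standard AI, no explanations": +2.9,
    "standard AI, with explanations": +4.4,
    "biased AI, no explanations": -11.3,
    "biased AI, with explanations": -9.1,
}

for condition, delta in effects.items():
    print(f"{condition}: {baseline + delta:.1f}% accuracy")

# Contrast between the two biased-AI conditions (reported as 2.3 pp,
# 95% CI -2.7 to 7.2; rounded point estimates give 2.2 pp).
contrast = effects["biased AI, with explanations"] - effects["biased AI, no explanations"]
print(f"explanations vs no explanations under biased AI: {contrast:+.1f} percentage points")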
Conclusions and relevance: Although standard AI models improved diagnostic accuracy, systematically biased AI models reduced diagnostic accuracy, and commonly used image-based AI model explanations did not mitigate this harmful effect.
Trial registration: ClinicalTrials.gov Identifier: NCT06098950.
Comment in
- Automation Bias and Assistive AI: Risk of Harm From AI-Driven Clinical Decision Support. JAMA. 2023 Dec 19;330(23):2255-2257. doi: 10.1001/jama.2023.22557. PMID: 38112824. No abstract available.
Similar articles
- Deep Learning Assistance Closes the Accuracy Gap in Fracture Detection Across Clinician Types. Clin Orthop Relat Res. 2023 Mar 1;481(3):580-588. doi: 10.1097/CORR.0000000000002385. Epub 2022 Sep 9. PMID: 36083847. Free PMC article.
- Development and Assessment of an Artificial Intelligence-Based Tool for Skin Condition Diagnosis by Primary Care Physicians and Nurse Practitioners in Teledermatology Practices. JAMA Netw Open. 2021 Apr 1;4(4):e217249. doi: 10.1001/jamanetworkopen.2021.7249. PMID: 33909055. Free PMC article.
- Care to Explain? AI Explanation Types Differentially Impact Chest Radiograph Diagnostic Performance and Physician Trust in AI. Radiology. 2024 Nov;313(2):e233261. doi: 10.1148/radiol.233261. PMID: 39560483. Clinical Trial.
- How Explainable Artificial Intelligence Can Increase or Decrease Clinicians' Trust in AI Applications in Health Care: Systematic Review. JMIR AI. 2024 Oct 30;3:e53207. doi: 10.2196/53207. PMID: 39476365. Free PMC article. Review.
- Lung Ultrasound for the Emergency Diagnosis of Pneumonia, Acute Heart Failure, and Exacerbations of Chronic Obstructive Pulmonary Disease/Asthma in Adults: A Systematic Review and Meta-analysis. J Emerg Med. 2019 Jan;56(1):53-69. doi: 10.1016/j.jemermed.2018.09.009. Epub 2018 Oct 9. PMID: 30314929.
Cited by
- Understanding Physician's Perspectives on AI in Health Care: Protocol for a Sequential Multiple Assignment Randomized Vignette Study. JMIR Res Protoc. 2024 Apr 4;13:e54787. doi: 10.2196/54787. PMID: 38573756. Free PMC article.
- What Is the Role of Explainability in Medical Artificial Intelligence? A Case-Based Approach. Bioengineering (Basel). 2025 Apr 2;12(4):375. doi: 10.3390/bioengineering12040375. PMID: 40281735. Free PMC article.
- Crucial Role of Understanding in Human-Artificial Intelligence Interaction for Successful Clinical Adoption. Korean J Radiol. 2025 Apr;26(4):287-290. doi: 10.3348/kjr.2025.0071. Epub 2025 Feb 17. PMID: 40015562. Free PMC article. No abstract available.
- Facilitating Trust Calibration in Artificial Intelligence-Driven Diagnostic Decision Support Systems for Determining Physicians' Diagnostic Accuracy: Quasi-Experimental Study. JMIR Form Res. 2024 Nov 27;8:e58666. doi: 10.2196/58666. PMID: 39602469. Free PMC article.
- Minimizing bias when using artificial intelligence in critical care medicine. J Crit Care. 2024 Aug;82:154796. doi: 10.1016/j.jcrc.2024.154796. Epub 2024 Mar 29. PMID: 38552451. Free PMC article. No abstract available.