This is a preprint.
Large Language Model Influence on Management Reasoning: A Randomized Controlled Trial
- PMID: 39148822
- PMCID: PMC11326321
- DOI: 10.1101/2024.08.05.24311485
Update in: Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial. JAMA Netw Open. 2024 Oct 1;7(10):e2440969. doi: 10.1001/jamanetworkopen.2024.40969. PMID: 39466245.
Abstract
Importance: Large language model (LLM) artificial intelligence (AI) systems have shown promise in diagnostic reasoning, but their utility in management reasoning with no clear right answers is unknown.
Objective: To determine whether LLM assistance improves physician performance on open-ended management reasoning tasks compared to conventional resources.
Design: Prospective, randomized controlled trial conducted from 30 November 2023 to 21 April 2024.
Setting: Multi-institutional study from Stanford University, Beth Israel Deaconess Medical Center, and the University of Virginia involving physicians from across the United States.
Participants: 92 practicing attending physicians and residents with training in internal medicine, family medicine, or emergency medicine.
Intervention: Five expert-developed clinical case vignettes were presented with multiple open-ended management questions and scoring rubrics created through a Delphi process. Physicians were randomized to use either GPT-4 via ChatGPT Plus in addition to conventional resources (e.g., UpToDate, Google), or conventional resources alone.
Main outcomes and measures: The primary outcome was difference in total score between groups on expert-developed scoring rubrics. Secondary outcomes included domain-specific scores and time spent per case.
Results: Physicians using the LLM scored higher than those using conventional resources (mean difference 6.5%, 95% CI 2.7-10.2, p<0.001). Significant improvements were seen in the management decisions (6.1%, 95% CI 2.5-9.7, p=0.001), diagnostic decisions (12.1%, 95% CI 3.1-21.0, p=0.009), and case-specific (6.2%, 95% CI 2.4-9.9, p=0.002) domains. GPT-4 users spent more time per case (mean difference 119.3 seconds, 95% CI 17.4-221.2, p=0.02). There was no significant difference between GPT-4-augmented physicians and GPT-4 alone (-0.9%, 95% CI -9.0 to 7.2, p=0.8).
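As a rough consistency check on the figures reported in the Results, the sketch below back-calculates an approximate standard error from each 95% CI and derives a two-sided p-value. It assumes a symmetric, normal-approximation interval (critical value 1.96); the abstract does not state the statistical model used, so this is not the study's analysis, and the function names are illustrative only.

```python
import math

def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate standard error from a symmetric 95% CI (normal approximation, an assumption)."""
    return (upper - lower) / (2 * z_crit)

def two_sided_p(estimate, se):
    """Two-sided p-value for a z statistic: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    z = abs(estimate) / se
    return math.erfc(z / math.sqrt(2))

# (label, mean difference, CI lower, CI upper), values copied from the Results paragraph
reported = [
    ("total score (%)", 6.5, 2.7, 10.2),
    ("management decisions (%)", 6.1, 2.5, 9.7),
    ("diagnostic decisions (%)", 12.1, 3.1, 21.0),
    ("case-specific (%)", 6.2, 2.4, 9.9),
    ("time per case (s)", 119.3, 17.4, 221.2),
    ("physician + GPT-4 vs GPT-4 alone (%)", -0.9, -9.0, 7.2),
]

for label, diff, lo, hi in reported:
    se = se_from_ci(lo, hi)
    print(f"{label}: SE ~ {se:.2f}, approx. p = {two_sided_p(diff, se):.4f}")
```

Under this approximation the derived p-values land close to the reported ones, which suggests the intervals and p-values are mutually consistent. Small discrepancies are expected because the reported values are rounded and the actual model may differ.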
Conclusions and relevance: LLM assistance improved physician management reasoning compared to conventional resources, with particular gains in contextual and patient-specific decision-making. These findings indicate that LLMs can augment management decision-making in complex cases.
Trial registration: ClinicalTrials.gov Identifier: NCT06208423; https://classic.clinicaltrials.gov/ct2/show/NCT06208423.