Randomized Controlled Trial
. 2024 Nov 28;24(1):1391.
doi: 10.1186/s12909-024-06399-7.

Large language models improve clinical decision making of medical students through patient simulation and structured feedback: a randomized controlled trial

Emilia Brügge et al. BMC Med Educ.

Abstract

Background: Clinical decision-making (CDM) refers to physicians' ability to gather, evaluate, and interpret relevant diagnostic information. An integral component of CDM is the medical history conversation, traditionally practiced on real or simulated patients. In this study, we explored the potential of using large language models (LLMs) to simulate patient-doctor interactions and provide structured feedback.

Methods: We developed AI prompts to simulate patients with different symptoms, engaging in realistic medical history conversations. In our double-blind randomized design, the control group participated in simulated medical history conversations with AI patients, while the feedback group, in addition to the simulated conversations, also received AI-generated feedback on their performance. We examined the influence of this feedback on CDM performance, which was evaluated by two raters (ICC = 0.924) using the Clinical Reasoning Indicator - History Taking Inventory (CRI-HTI). The data were analyzed using a repeated-measures ANOVA.
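The abstract does not reproduce the study's actual prompts, so the following is only a minimal, hypothetical sketch of how such role-play and feedback prompts might be assembled for a chat-based LLM. The scenario text, rubric wording, and function names are illustrative assumptions, not the authors' materials; the three feedback dimensions mirror the CRI-HTI subdomains named in the results.

```python
# Hypothetical sketch: assembling chat messages for (a) an LLM role-playing
# a patient and (b) an LLM giving structured feedback on the transcript.
# All wording here is an assumption for illustration, not the study's prompt.

def build_patient_messages(symptom_script: str) -> list[dict]:
    """Ask the model to stay in character as a simulated patient."""
    system_prompt = (
        "You are a simulated patient in a medical history conversation. "
        "Stay in character, answer only what the student asks, and do not "
        "volunteer the diagnosis. Your case: " + symptom_script
    )
    return [{"role": "system", "content": system_prompt}]

def build_feedback_messages(transcript: str) -> list[dict]:
    """Ask the model for structured feedback on a finished conversation."""
    system_prompt = (
        "You are a medical educator. Rate the following history-taking "
        "conversation on creating context, focusing questions, and securing "
        "information, and give concrete suggestions for improvement."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": transcript},
    ]

# Example: a concussion-like scenario, loosely echoing Fig. 2's fourth case.
msgs = build_patient_messages("35-year-old with headache and nausea after a fall.")
```

A message list in this shape could then be passed to any chat-completion API; keeping the case script and the feedback rubric in the system role is one common way to prevent the model from breaking character mid-conversation.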

Results: Our final sample included 21 medical students (mean age = 22.10 years, mean semester = 4, 14 female). At baseline, the feedback group (mean = 3.28 ± 0.09 [standard deviation]) and the control group (3.21 ± 0.08) achieved similar CRI-HTI scores, indicating successful randomization. After only four training sessions, the feedback group (3.60 ± 0.13) outperformed the control group (3.02 ± 0.12), F(1,18) = 4.44, p = .049, with a strong effect size, partial η² = 0.198. Specifically, the feedback group showed improvements in the CDM subdomains of creating context (p = .046) and securing information (p = .018), while their ability to focus questions did not improve significantly (p = .265).
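The reported effect size can be checked against the reported test statistic: for an F-test, partial eta squared is recoverable from F and its degrees of freedom as F·df1 / (F·df1 + df2). A short sketch confirming the abstract's numbers are internally consistent:

```python
# Partial eta squared from an F statistic and its degrees of freedom:
# partial_eta^2 = F * df1 / (F * df1 + df2)

def partial_eta_squared(f: float, df1: int, df2: int) -> float:
    """Effect size implied by an F(df1, df2) statistic."""
    return (f * df1) / (f * df1 + df2)

# Reported between-group effect: F(1, 18) = 4.44
effect = partial_eta_squared(4.44, 1, 18)
print(round(effect, 3))  # → 0.198, matching the reported partial eta squared
```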

Conclusion: The results suggest that AI-simulated medical history conversations can support CDM training, especially when combined with structured feedback. Such a training format may serve as a cost-effective supplement to existing training methods, better preparing students for real medical history conversations.

Keywords: Clinical decision making; Large language models; Medical students education; Patient simulation training; Structured feedback.


Conflict of interest statement

Declarations

Ethics approval and consent to participate: Ethics approval was obtained from the ethics board ("Ethik-Kommission Westfalen-Lippe") under reference 2023-438-f-N. Informed consent was obtained from all participants.

Consent for publication: Not applicable.

Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Study design. The diagram illustrates the timeline of the study. The control group completed four patient cases with a 2-minute break between each case; the feedback group instead received feedback from ChatGPT during these intervals. After completing the tasks, participants received a survey with no time limit.
Fig. 2
Example of a conversation between a participant and ChatGPT. The prompt of the fourth scenario instructed ChatGPT to simulate a patient with a concussion. Italicized text represents responses given by ChatGPT.
Fig. 3
Example of the feedback using the CRI-HTI score and the individual feedback given by ChatGPT. The participant received the feedback after completing scenario 2; this is the same participant as in Fig. 2. Italicized text represents responses given by ChatGPT.
Fig. 4
Effects of AI-generated feedback on clinical decision making and history taking. The median CRI-HTI score (ICC = 0.924) with individual values is shown for the control group (red) and the feedback group (blue) over four consecutive AI-simulated history-taking scenarios.

