BMJ Open. 2024 Jul 18;14(7):e087469. doi: 10.1136/bmjopen-2024-087469.

Effects of interacting with a large language model compared with a human coach on the clinical diagnostic process and outcomes among fourth-year medical students: study protocol for a prospective, randomised experiment using patient vignettes

Juliane E Kämmer et al. BMJ Open.

Abstract

Introduction: Versatile large language models (LLMs) have the potential to augment diagnostic decision-making thanks to their ability to engage in open-ended, natural conversations and their broad access to medical knowledge. Yet because LLMs are new to diagnostic decision-making, their impact remains uncertain. Clinicians unfamiliar with LLMs in their professional context may fall back on general attitudes towards LLMs, which can hinder thoughtful use and critical evaluation of their input, leading either to over-reliance without critical scrutiny or to an unwillingness to use LLMs as diagnostic aids. To address these concerns, this study examines how interacting with an LLM compared with a human coach, and prior training versus no training in interacting with either of these 'coaches', influence the diagnostic process and its outcomes. Our findings aim to illuminate the potential benefits and risks of employing artificial intelligence (AI) in diagnostic decision-making.

Methods and analysis: We are conducting a prospective, randomised experiment with N=158 fourth-year medical students from Charité Medical School, Berlin, Germany. Participants are asked to diagnose patient vignettes after being assigned to either a human coach or ChatGPT and after either training or no training (both between-subject factors). We are specifically collecting data on the effects of using either of these 'coaches' and of additional training on information search, number of hypotheses entertained, diagnostic accuracy and confidence. Statistical methods will include linear mixed effects models. Exploratory analyses of the interaction patterns and attitudes towards AI will also generate more generalisable knowledge about the role of AI in medicine.
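The planned linear mixed-effects analysis could be sketched roughly as follows in Python with statsmodels. This is an illustrative sketch only: the simulated data, variable names (coach, training, accuracy) and effect sizes are placeholders, not specifications from the protocol. The key design feature it captures is a participant-level random intercept to account for each student diagnosing several vignettes.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_participants, n_vignettes = 40, 5  # placeholder sizes, not the protocol's N=158

rows = []
for p in range(n_participants):
    # Two between-subject factors, crossed: coach type and prior training
    coach = "llm" if p % 2 == 0 else "human"
    training = "yes" if (p // 2) % 2 == 0 else "no"
    participant_effect = rng.normal(0, 0.5)  # random intercept per student
    for v in range(n_vignettes):
        accuracy = (0.5
                    + 0.2 * (coach == "llm")       # illustrative fixed effects
                    + 0.1 * (training == "yes")
                    + participant_effect
                    + rng.normal(0, 0.3))           # residual noise
        rows.append({"participant": p, "coach": coach,
                     "training": training, "accuracy": accuracy})

df = pd.DataFrame(rows)

# Linear mixed model: fixed effects for coach, training and their
# interaction; random intercept grouped by participant to handle the
# repeated vignette measurements.
model = smf.mixedlm("accuracy ~ coach * training", df,
                    groups=df["participant"])
result = model.fit()
print(result.summary())
```

In practice the protocol's outcomes (information search, number of hypotheses, diagnostic accuracy, confidence) would each enter such a model, possibly with vignette-level random effects as well.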

Ethics and dissemination: The Bern Cantonal Ethics Committee considered the study exempt from full ethical review (BASEC No: Req-2023-01396). All methods will be conducted in accordance with relevant guidelines and regulations. Participation is voluntary and informed consent will be obtained. Results will be published in peer-reviewed scientific medical journals. Authorship will be determined according to the International Committee of Medical Journal Editors guidelines.

Keywords: Artificial Intelligence; Clinical Decision-Making; Clinical Reasoning; MEDICAL EDUCATION & TRAINING.


Conflict of interest statement

Competing interests: None declared.

Figures

Figure 1. Study design. AI, artificial intelligence; ChatGPT, OpenAI's generative pre-trained transformer; LLMs, large language models; R, randomisation.

Figure 2. Screenshot of a patient case page. Starting on the left, there is a window showing the current step within the experiment and the patient chart with several subcategories, above the field for entering the differential diagnoses; on the right is the chat window (here, in the artificial intelligence condition).

