Performance of ChatGPT on the Situational Judgement Test: A Professional Dilemmas-Based Examination for Doctors in the United Kingdom

Robin J Borchert¹ ², Charlotte R Hickman³, Jack Pepys⁴, Timothy J Sadler²

Affiliations

¹ Department of Radiology, University of Cambridge, Cambridge, United Kingdom.
² Department of Radiology, Addenbrooke's Hospital, Cambridge University Hospitals NHS Foundation Trust, Cambridge, United Kingdom.
³ Department of General Medicine, Lister Hospital, East and North Hertfordshire NHS Trust, Stevenage, United Kingdom.
⁴ Department of Biomedical Sciences, Humanitas University, Milan, Italy.

PMID: 37548997
PMCID: PMC10442724
DOI: 10.2196/48978

Robin J Borchert et al. JMIR Med Educ. 2023 Aug 7;9:e48978. doi: 10.2196/48978.

Abstract

Background: ChatGPT is a large language model that has performed well on professional examinations in the fields of medicine, law, and business. However, it is unclear how ChatGPT would perform on an examination assessing professionalism and situational judgement for doctors.

Objective: We evaluated the performance of ChatGPT on the Situational Judgement Test (SJT): a national examination taken by all final-year medical students in the United Kingdom. This examination is designed to assess attributes such as communication, teamwork, patient safety, prioritization skills, professionalism, and ethics.

Methods: All questions from the UK Foundation Programme Office's (UKFPO's) 2023 SJT practice examination were entered into ChatGPT. For each question, ChatGPT's answers and rationales were recorded and assessed against the official UKFPO scoring template. Questions were categorized into the domains of Good Medical Practice according to the domains referenced in the rationales provided in the scoring sheet. Questions without clear domain links were screened by reviewers and assigned one or more domains. ChatGPT's overall performance, as well as its performance across the domains of Good Medical Practice, was evaluated.

Results: Overall, ChatGPT performed well, scoring 76% on the SJT, but it scored full marks on only a few questions (9%). This may reflect flaws in ChatGPT's situational judgement, inconsistencies in the reasoning across questions in the examination itself, or both. ChatGPT demonstrated consistent performance across the 4 outlined domains in Good Medical Practice for doctors.

Conclusions: Further research is needed to understand the potential applications of large language models, such as ChatGPT, in medical education for standardizing questions and providing consistent rationales for examinations assessing professionalism and ethics.

Keywords: ChatGPT; SJT; Situational Judgement Test; artificial intelligence; chatbot; communication; exam; examination; judgement; language model; language models; medical education; reasoning.

©Robin J Borchert, Charlotte R Hickman, Jack Pepys, Timothy J Sadler. Originally published in JMIR Medical Education (https://mededu.jmir.org), 07.08.2023.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

**Figure 1**
ChatGPT’s performance in each section of the examination depicting the proportion of entirely correct (100%), mostly correct (50%-99%), or mostly incorrect answers (<50%). MCQ: multiple-choice question. Q: question.
