OTO Open. 2024 Sep 26;8(3):e70018. doi: 10.1002/oto2.70018. eCollection 2024 Jul-Sep.

ChatGPT Generated Otorhinolaryngology Multiple-Choice Questions: Quality, Psychometric Properties, and Suitability for Assessments

Cecilia Lotto et al. OTO Open. 2024.

Abstract

Objective: To explore Chat Generative Pretrained Transformer's (ChatGPT's) capability to create multiple-choice questions about otorhinolaryngology (ORL).

Study design: Experimental question generation and exam simulation.

Setting: Tertiary academic center.

Methods: ChatGPT 3.5 was prompted: "Can you please create a challenging 20-question multiple-choice questionnaire about clinical cases in otolaryngology, offering five answer options?" The generated questionnaire was sent to medical students, residents, and consultants. Questions were assessed against quality criteria. Answers were anonymized, and the resulting data were analyzed in terms of difficulty and internal consistency.
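The paper does not state how ChatGPT 3.5 was accessed. A minimal sketch of issuing the same prompt programmatically through the OpenAI Python client, assuming API access and gpt-3.5-turbo as the closest available model (both are assumptions, not described in the study):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The exact prompt quoted in the Methods section.
prompt = (
    "Can you please create a challenging 20-question multiple-choice "
    "questionnaire about clinical cases in otolaryngology, offering "
    "five answer options?"
)

# gpt-3.5-turbo is an assumption; the study only says "ChatGPT 3.5".
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```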

Results: ChatGPT 3.5 generated 20 exam questions, of which 1 was considered off-topic, 3 had a false answer, and 3 had multiple correct answers. The subspecialty distribution was as follows: 5 questions on otology, 5 on rhinology, and 10 on head and neck. Focus and relevance were good, while vignette and distractor quality were low. The level of difficulty was suitable for undergraduate medical students (n = 24) but too easy for residents (n = 30) and consultants (n = 10) in ORL. Cronbach's α was highest (.69) with 15 selected questions using the students' results.
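The internal-consistency statistic reported above is Cronbach's α, defined for k items as α = k/(k−1) · (1 − Σσᵢ²/σ²_total). A minimal sketch of computing it, together with per-item difficulty (proportion correct), from an anonymized examinees-by-items 0/1 score matrix; numpy and the variable names are assumptions, as the paper does not describe its analysis software:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (examinees x items) 0/1 score matrix."""
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def difficulty(scores: np.ndarray) -> np.ndarray:
    """Item difficulty: proportion of examinees answering each item correctly."""
    return scores.mean(axis=0)

# Illustrative data only: 24 "students" answering 15 items.
rng = np.random.default_rng(0)
scores = (rng.random((24, 15)) > 0.4).astype(int)
print(round(cronbach_alpha(scores), 2), difficulty(scores).round(2))
```

Dropping the weakest 5 of the 20 items and recomputing α on the remaining 15 corresponds to the item-selection step behind the reported .69.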

Conclusion: ChatGPT 3.5 can generate grammatically correct, simple ORL multiple-choice questions at a medical student level. However, the overall quality of the questions was average, and thorough review and revision by a medical expert are needed to ensure their suitability for future exams.

Keywords: ChatGPT; artificial intelligence; exam; large language model; multiple choice question; otolaryngology.


Conflict of interest statement

The authors declare that there is no conflict of interest.

Figures

Figure 1. Formal quality assessments. CI, confidence interval.

Figure 2. Mean difficulty values as percentage of correct answers, displayed for all questions and all 3 groups of participants separately.

Figure 3. Mean values of percentage correct answers, displayed for the 3 groups and all 3 subscores separately. Error bars denote the 95% confidence interval of the means.
