ChatGPT's quiz skills in different otolaryngology subspecialties: an analysis of 2576 single-choice and multiple-choice board certification preparation questions

Cosima C Hoch¹, Barbara Wollenberg², Jan-Christoffer Lüers³, Samuel Knoedler⁴,⁵, Leonard Knoedler⁶, Konstantin Frank⁷, Sebastian Cotofana⁸,⁹, Michael Alfertshofer¹⁰

Affiliations

¹ Department of Otolaryngology, Head and Neck Surgery, School of Medicine, Technical University of Munich (TUM), Ismaningerstrasse 22, 81675, Munich, Germany. cosima.chiara.hoch@tum.de.
² Department of Otolaryngology, Head and Neck Surgery, School of Medicine, Technical University of Munich (TUM), Ismaningerstrasse 22, 81675, Munich, Germany.
³ Department of Otorhinolaryngology, Head and Neck Surgery, Medical Faculty, University of Cologne, 50937, Cologne, Germany.
⁴ Division of Plastic Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02152, USA.
⁵ Department of Plastic Surgery and Hand Surgery, Klinikum Rechts Der Isar, Technical University of Munich, Munich, Germany.
⁶ Division of Plastic and Reconstructive Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, 02115, USA.
⁷ Ocean Clinic, Marbella, Spain.
⁸ Department of Dermatology, Erasmus Hospital, Rotterdam, The Netherlands.
⁹ Centre for Cutaneous Research, Blizard Institute, Queen Mary University of London, London, UK.
¹⁰ Division of Hand, Plastic and Aesthetic Surgery, Ludwig-Maximilians-University Munich, Munich, Germany.

PMID: 37285018
PMCID: PMC10382366
DOI: 10.1007/s00405-023-08051-4

Eur Arch Otorhinolaryngol. 2023 Sep;280(9):4271-4278. doi: 10.1007/s00405-023-08051-4. Epub 2023 Jun 7.

Abstract

Purpose: With the increasing adoption of artificial intelligence (AI) in various domains, including healthcare, there is growing acceptance and interest in consulting AI models to provide medical information and advice. This study aimed to evaluate the accuracy of ChatGPT's responses to practice quiz questions designed for otolaryngology board certification and decipher potential performance disparities across different otolaryngology subspecialties.

Methods: A dataset covering 15 otolaryngology subspecialties was collected from an online learning platform funded by the German Society of Oto-Rhino-Laryngology, Head and Neck Surgery, designed for board certification examination preparation. These questions were entered into ChatGPT, with its responses being analyzed for accuracy and variance in performance.

Results: The dataset included 2576 questions (479 multiple-choice and 2097 single-choice), of which 57% (n = 1475) were answered correctly by ChatGPT. An in-depth analysis of question style revealed that single-choice questions were associated with a significantly higher rate (p < 0.001) of correct responses (n = 1313; 63%) compared to multiple-choice questions (n = 162; 34%). Stratified by question categories, ChatGPT yielded the highest rate of correct responses (n = 151; 72%) in the field of allergology, whereas 7 out of 10 questions (n = 65; 71%) on legal otolaryngology aspects were answered incorrectly.
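The reported counts can be checked with a few lines of arithmetic. The sketch below recomputes the percentages from the numbers in the Results paragraph and runs a Pearson chi-square test on the 2x2 table of question style versus correctness. The abstract does not name the statistical test used, so the chi-square test (without continuity correction) is an assumption made for illustration, and the variable names are hypothetical.

```python
import math

# Counts taken from the Results paragraph of the abstract.
single_total, single_correct = 2097, 1313
multi_total, multi_correct = 479, 162

total = single_total + multi_total        # all questions
correct = single_correct + multi_correct  # all correct answers


def pct(part, whole):
    """Percentage rounded to the nearest integer, as in the abstract."""
    return round(100 * part / whole)


overall_pct = pct(correct, total)
single_pct = pct(single_correct, single_total)
multi_pct = pct(multi_correct, multi_total)

# 2x2 contingency table: rows = question style, columns = correct / incorrect.
a, b = single_correct, single_total - single_correct
c, d = multi_correct, multi_total - multi_correct

# Pearson chi-square statistic for a 2x2 table (assumed test, no continuity
# correction): N * (ad - bc)^2 / (row1 * row2 * col1 * col2).
chi2 = total * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"overall: {correct}/{total} = {overall_pct}%")
print(f"single-choice: {single_pct}%, multiple-choice: {multi_pct}%")
print(f"chi2 = {chi2:.1f}, p < 0.001: {p_value < 0.001}")
```

Under this assumed test, the recomputed totals and rounded percentages agree with the figures in the abstract, and the difference between question styles is significant at the reported level.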

Conclusion: The study reveals ChatGPT's potential as a supplementary tool for otolaryngology board certification preparation. However, its propensity for errors in certain otolaryngology areas calls for further refinement. Future research should address these limitations to improve ChatGPT's educational use. A collaborative approach involving experts is recommended for the reliable and accurate integration of such AI models.

Keywords: AI; Artificial intelligence; ChatGPT; Multiple-choice; Otolaryngology quiz; Single-choice.

Conflict of interest statement

The authors have no relevant financial or non-financial interests to disclose. Jan-Christoffer Lüers, M.D., Ph.D. is the developer and owner of the online learning platform.

Figures

**Fig. 1**
Workflow summarizing the methodology used in the study, as well as showing the integration of intensified research on artificial intelligence in medicine

**Fig. 2**
Examples of ChatGPT prompts for both multiple-choice and single-choice style questions, with correct and false responses indicated for each type of question

**Fig. 3**
Stacked bar graphs displaying the correct and false response rates for each otolaryngology subspecialty. The correct response rates are represented by green bars, while the false response rates are represented by red bars. The subspecialties are ordered in ascending order based on their correct response rates

**Fig. 4**
Donut charts illustrating the correct versus false rates for multiple-choice and single-choice questions, stratified by otolaryngology subspecialty. The correct rates are represented by the green sections of the charts, while the false rates are represented by the red sections. The size of each donut chart is proportional to the total number of questions in each otolaryngology subspecialty

Comment in

Examining otolaryngologists' attitudes towards large language models (LLMs) such as ChatGPT: a comprehensive deep learning analysis.
Praveen SV, Vijaya S. Eur Arch Otorhinolaryngol. 2024 Feb;281(2):1061-1063. doi: 10.1007/s00405-023-08325-x. Epub 2023 Nov 13. PMID: 37955694.
