Eur Arch Otorhinolaryngol. 2023 Sep;280(9):4271-4278. doi: 10.1007/s00405-023-08051-4. Epub 2023 Jun 7.

ChatGPT's quiz skills in different otolaryngology subspecialties: an analysis of 2576 single-choice and multiple-choice board certification preparation questions

Cosima C Hoch et al. Eur Arch Otorhinolaryngol. 2023 Sep.

Abstract

Purpose: With the increasing adoption of artificial intelligence (AI) in various domains, including healthcare, there is growing acceptance and interest in consulting AI models to provide medical information and advice. This study aimed to evaluate the accuracy of ChatGPT's responses to practice quiz questions designed for otolaryngology board certification and decipher potential performance disparities across different otolaryngology subspecialties.

Methods: A dataset covering 15 otolaryngology subspecialties was collected from an online learning platform funded by the German Society of Oto-Rhino-Laryngology, Head and Neck Surgery, designed for board certification examination preparation. These questions were entered into ChatGPT, and its responses were analyzed for accuracy and variance in performance.
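
To make the scoring step concrete: the abstract does not describe the authors' tooling, but a minimal Python sketch of one way to tally correct responses per question type or subspecialty could look as follows. The record fields and helper names are hypothetical, and treating a multiple-choice item as correct only when every required option is selected is an assumption, not a detail stated in the abstract.

    from dataclasses import dataclass

    @dataclass
    class QuizItem:
        # Hypothetical record layout; the study's actual data format is not described.
        subspecialty: str
        question_type: str           # "single" or "multiple"
        correct_options: frozenset   # answer key, e.g. frozenset({"B"})
        model_options: frozenset     # options selected by the chat model

    def is_correct(item: QuizItem) -> bool:
        # Count a response as correct only when the selected options
        # match the answer key exactly (relevant for multiple-choice items).
        return item.model_options == item.correct_options

    def accuracy_by(items: list, key) -> dict:
        # Correct-response rate per group (e.g. per subspecialty or question type).
        totals = {}
        for it in items:
            c, n = totals.get(key(it), (0, 0))
            totals[key(it)] = (c + is_correct(it), n + 1)
        return {group: c / n for group, (c, n) in totals.items()}

    # Example usage: accuracy_by(items, key=lambda it: it.question_type)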

Results: The dataset included 2576 questions (479 multiple-choice and 2097 single-choice), of which 57% (n = 1475) were answered correctly by ChatGPT. An in-depth analysis of question style revealed that single-choice questions were associated with a significantly higher rate (p < 0.001) of correct responses (n = 1313; 63%) compared to multiple-choice questions (n = 162; 34%). Stratified by question categories, ChatGPT yielded the highest rate of correct responses (n = 151; 72%) in the field of allergology, whereas 7 out of 10 questions (n = 65; 71%) on legal otolaryngology aspects were answered incorrectly.
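
As a consistency check on the single-choice versus multiple-choice comparison (the abstract does not state which statistical test was used), a chi-square test of independence on the 2x2 table implied by the reported counts yields a p-value far below 0.001; a minimal SciPy sketch:

    from scipy.stats import chi2_contingency

    # Counts taken from the abstract:
    # single-choice: 1313 correct of 2097; multiple-choice: 162 correct of 479.
    table = [
        [1313, 2097 - 1313],   # single-choice: correct, incorrect
        [162, 479 - 162],      # multiple-choice: correct, incorrect
    ]

    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2e}")
    # p is far below 0.001, consistent with the reported p < 0.001.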

Conclusion: The study reveals ChatGPT's potential as a supplementary tool for otolaryngology board certification preparation. However, its propensity for errors in certain otolaryngology areas calls for further refinement. Future research should address these limitations to improve ChatGPT's educational use. An approach involving expert collaboration is recommended for the reliable and accurate integration of such AI models.

Keywords: AI; Artificial intelligence; ChatGPT; Multiple-choice; Otolaryngology quiz; Single-choice.


Conflict of interest statement

The authors have no relevant financial or non-financial interests to disclose. Jan-Christoffer Lüers, M.D., Ph.D. is the developer and owner of the online learning platform.

Figures

Fig. 1: Workflow summarizing the methodology used in the study, as well as showing the integration of intensified research on artificial intelligence in medicine
Fig. 2: Examples of ChatGPT prompts for both multiple-choice and single-choice style questions, with correct and false responses indicated for each type of question
Fig. 3: Stacked bar graphs displaying the correct and false response rates for each otolaryngology subspecialty. The correct response rates are represented by green bars, while the false response rates are represented by red bars. The subspecialties are ordered in ascending order based on their correct response rates
Fig. 4: Donut charts illustrating the correct versus false rates for multiple-choice and single-choice questions, stratified by otolaryngology subspecialty. The correct rates are represented by the green sections of the charts, while the false rates are represented by the red sections. The size of each donut chart is proportional to the total number of questions in each otolaryngology subspecialty
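
For orientation only, a minimal matplotlib sketch of the Fig. 3 layout (stacked correct/false bars per subspecialty, ordered by correct rate), using illustrative data limited to the two rates quoted in the abstract; this is not the authors' plotting code.

    import matplotlib.pyplot as plt

    # Illustrative per-subspecialty correct-response rates (fractions);
    # only the two values mentioned in the abstract are included.
    rates = {"Legal aspects": 0.29, "Allergology": 0.72}
    names = sorted(rates, key=rates.get)        # ascending by correct rate, as in Fig. 3
    correct = [rates[n] for n in names]
    false = [1 - r for r in correct]

    fig, ax = plt.subplots()
    ax.bar(names, correct, color="green", label="correct")
    ax.bar(names, false, bottom=correct, color="red", label="false")
    ax.set_ylabel("Response rate")
    ax.legend()
    plt.show()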

