Performance and Consistency of ChatGPT-4 Versus Otolaryngologists: A Clinical Case Series
- PMID: 38591726
- DOI: 10.1002/ohn.759
Abstract
Objective: To study the performance of Chatbot Generative Pretrained Transformer-4 (ChatGPT-4) in the management of cases in otolaryngology-head and neck surgery.
Study design: Prospective case series.
Setting: Multicenter University Hospitals.
Methods: The history, physical examination, and additional examination findings of adult outpatients consulting in the otolaryngology departments of CHU Saint-Pierre and Dour Medical Center were presented to ChatGPT-4, which was queried for differential diagnoses, management, and treatment(s). According to subspecialty, the ChatGPT-4 responses were assessed by 2 distinct, blinded, board-certified otolaryngologists using the Artificial Intelligence Performance Instrument.
Results: One hundred cases were presented to ChatGPT-4. ChatGPT-4 indicated a mean of 3.34 (95% confidence interval [CI]: 3.09, 3.59) additional examinations per patient versus 2.10 (95% CI: 1.76, 2.34; P = .001) for the practitioners. There was strong consistency (κ > 0.600) between otolaryngologists and ChatGPT-4 for the indication of upper aerodigestive tract endoscopy, positron emission tomography and computed tomography, audiometry, tympanometry, and psychophysical evaluations. The primary diagnosis was correctly made by ChatGPT-4 in 38% to 86% of cases depending on subspecialty. Additional examinations indicated by ChatGPT-4 were pertinent and necessary in 8% to 31% of cases, while the treatment regimen was pertinent in 12% to 44% of cases. The performance of ChatGPT-4 was not influenced by the human-reported level of difficulty of the clinical cases.
Conclusion: ChatGPT-4 may be a promising adjunctive tool in otolaryngology, providing extensive documentation about additional examinations, primary and differential diagnoses, and treatments. ChatGPT-4 is more effective at providing a primary diagnosis and less effective at selecting additional examinations and treatments.
Keywords: ChatGPT‐4; artificial intelligence; head neck surgery; otolaryngology; performance.
© 2024 American Academy of Otolaryngology–Head and Neck Surgery Foundation.
References
- Vaira LA, Lechien JR, Abbate V, et al. Accuracy of ChatGPT‐generated information on head and neck and oromaxillofacial surgery: a multicenter collaborative analysis. Otolaryngol Head Neck Surg. 2023. doi:10.1002/ohn.489
- Lechien JR, Maniaci A, Gengler I, Hans S, Chiesa‐Estomba CM, Vaira LA. Validity and reliability of an instrument evaluating the performance of intelligent chatbot: the Artificial Intelligence Performance Instrument (AIPI). Eur Arch Otorhinolaryngol. 2023;281:2063‐2079. doi:10.1007/s00405-023-08219-y
- Lechien JR, Gorton A, Robertson J, Vaira LA. Is ChatGPT‐4 accurate in proofread a manuscript in otolaryngology‐head and neck surgery? Otolaryngol Head Neck Surg. 2023. doi:10.1002/ohn.526
- Lechien JR, Georgescu BM, Hans S, Chiesa‐Estomba CM. ChatGPT performance in laryngology and head and neck surgery: a clinical case‐series. Eur Arch Otorhinolaryngol. 2023;281:319‐333. doi:10.1007/s00405-023-08282-5
- von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. J Clin Epidemiol. 2008;61(4):344‐349.