Performance and Consistency of ChatGPT-4 Versus Otolaryngologists: A Clinical Case Series
- PMID: 38591726
- DOI: 10.1002/ohn.759
Abstract
Objective: To study the performance of Chatbot Generative Pretrained Transformer-4 (ChatGPT-4) in the management of cases in otolaryngology-head and neck surgery.
Study design: Prospective case series.
Setting: Multicenter University Hospitals.
Methods: The history, physical examination, and additional examination findings of adult outpatients consulting in the otolaryngology departments of CHU Saint-Pierre and Dour Medical Center were presented to ChatGPT-4, which was queried for differential diagnoses, management, and treatment(s). According to subspecialty, the ChatGPT-4 responses were assessed by 2 distinct, blinded, board-certified otolaryngologists using the Artificial Intelligence Performance Instrument.
Results: One hundred cases were presented to ChatGPT-4. ChatGPT-4 indicated a mean of 3.34 (95% confidence interval [CI]: 3.09, 3.59) additional examinations per patient versus 2.10 (95% CI: 1.76, 2.34; P = .001) for the practitioners. There was strong consistency (κ > 0.600) between otolaryngologists and ChatGPT-4 for the indication of upper aerodigestive tract endoscopy, positron emission tomography and computed tomography, audiometry, tympanometry, and psychophysical evaluations. The primary diagnosis was correctly made by ChatGPT-4 in 38% to 86% of cases depending on subspecialty. Additional examinations indicated by ChatGPT-4 were pertinent and necessary in 8% to 31% of cases, while the treatment regimen was pertinent in 12% to 44% of cases. The performance of ChatGPT-4 was not influenced by the human-reported level of difficulty of the clinical cases.
Conclusion: ChatGPT-4 may be a promising adjunctive tool in otolaryngology, providing extensive documentation about additional examinations, primary and differential diagnoses, and treatments. ChatGPT-4 is more effective at providing a primary diagnosis and less effective at selecting additional examinations and treatments.
Keywords: ChatGPT‐4; artificial intelligence; head neck surgery; otolaryngology; performance.
© 2024 American Academy of Otolaryngology–Head and Neck Surgery Foundation.
References
- Vaira LA, Lechien JR, Abbate V, et al. Accuracy of ChatGPT‐generated information on head and neck and oromaxillofacial surgery: a multicenter collaborative analysis. Otolaryngol Head Neck Surg. 2023. doi:10.1002/ohn.489
- Lechien JR, Maniaci A, Gengler I, Hans S, Chiesa‐Estomba CM, Vaira LA. Validity and reliability of an instrument evaluating the performance of intelligent chatbot: the Artificial Intelligence Performance Instrument (AIPI). Eur Arch Otorhinolaryngol. 2023;281:2063‐2079. doi:10.1007/s00405-023-08219-y
- Lechien JR, Gorton A, Robertson J, Vaira LA. Is ChatGPT‐4 accurate in proofread a manuscript in otolaryngology‐head and neck surgery? Otolaryngol Head Neck Surg. 2023. doi:10.1002/ohn.526
- Lechien JR, Georgescu BM, Hans S, Chiesa‐Estomba CM. ChatGPT performance in laryngology and head and neck surgery: a clinical case‐series. Eur Arch Otorhinolaryngol. 2023;281:319‐333. doi:10.1007/s00405-023-08282-5
- von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. J Clin Epidemiol. 2008;61(4):344‐349.