Eur Arch Otorhinolaryngol. 2024 Apr;281(4):1835-1841.
doi: 10.1007/s00405-023-08372-4. Epub 2024 Jan 8.

Reliability of large language models in managing odontogenic sinusitis clinical scenarios: a preliminary multidisciplinary evaluation

Alberto Maria Saibene et al. Eur Arch Otorhinolaryngol. 2024 Apr.

Abstract

Purpose: This study aimed to evaluate the utility of large language model (LLM) artificial intelligence tools, Chat Generative Pre-Trained Transformer (ChatGPT) versions 3.5 and 4, in managing complex otolaryngological clinical scenarios, specifically for the multidisciplinary management of odontogenic sinusitis (ODS).

Methods: A prospective, structured multidisciplinary specialist evaluation was conducted using five ad hoc designed ODS-related clinical scenarios. LLM responses to these scenarios were critically reviewed by a multidisciplinary panel of eight specialist evaluators (2 ODS experts, 2 rhinologists, 2 general otolaryngologists, and 2 maxillofacial surgeons). Based on the level of disagreement from panel members, a Total Disagreement Score (TDS) was calculated for each LLM response, and TDS comparisons were made between ChatGPT3.5 and ChatGPT4, as well as between different evaluators.

Results: While disagreement to some degree was demonstrated in 73/80 evaluator reviews of LLMs' responses, TDSs were significantly lower for ChatGPT4 compared to ChatGPT3.5. Highest TDSs were found in the case of complicated ODS with orbital abscess, presumably due to increased case complexity with dental, rhinologic, and orbital factors affecting diagnostic and therapeutic options. There were no statistically significant differences in TDSs between evaluators' specialties, though ODS experts and maxillofacial surgeons tended to assign higher TDSs.

Conclusions: LLMs like ChatGPT, especially newer versions, showed potential for complementing evidence-based clinical decision-making, but substantial disagreement was still demonstrated between LLMs and clinical specialists across most case examples, suggesting they are not yet optimal in aiding clinical management decisions. Future studies will be important to analyze LLMs' performance as they evolve over time.

Keywords: Artificial intelligence; Chronic rhinosinusitis; Computer-assisted diagnosis; Dental implant; Maxillary sinusitis; Oroantral fistula.

Conflict of interest statement

The authors have no potential conflict of interest or financial disclosures pertaining to this article.

Figures

Fig. 1
Box-and-whisker plot showing the distribution of Total Disagreement Scores (TDS) by evaluator subspecialty (ENT: non-rhinology otolaryngologists; MXF: maxillofacial surgeons; ODS: odontogenic sinusitis specialists; RHINO: rhinologists).
