AI in clinical decision-making: ChatGPT-4 vs. Llama2 for otolaryngology cases

Antonino Maniaci¹ ² ³ ⁴, Cosima C Hoch⁵ ⁶, Lise Sogalow⁵ ⁷, Benedikt Schmidl⁶, Jerome R Lechien⁵ ⁷ ⁸

Affiliations

¹ Department of Medical and Surgical Sciences, Faculty of Medicine, University of Enna Kore, Enna, Italy. Antonino.maniaci@unikore.it.
² Yoifos Research Committee, Paris, France. Antonino.maniaci@unikore.it.
³ Department of Human Anatomy and Experimental Oncology, Faculty of Medicine, UMONS Research Institute for Health Sciences and Technology, University of Mons, Mons, Belgium. Antonino.maniaci@unikore.it.
⁴ Department of Medicine and Surgery, Faculty of Medicine, University of Enna Kore, Enna, 94100, Italy. Antonino.maniaci@unikore.it.
⁵ Yoifos Research Committee, Paris, France.
⁶ Department of Otolaryngology, Head and Neck Surgery, School of Medicine and Health, Technical University of Munich (TUM), 81675, Munich, Germany.
⁷ Department of Human Anatomy and Experimental Oncology, Faculty of Medicine, UMONS Research Institute for Health Sciences and Technology, University of Mons, Mons, Belgium.
⁸ Department of Otorhinolaryngology and Head and Neck Surgery, School of Medicine, Foch Hospital, UFR Simone Veil, Université Versailles Saint-Quentin-en-Yvelines (Paris Saclay University), Paris, France.

PMID: 40220179
DOI: 10.1007/s00405-025-09371-3

Comparative Study

Antonino Maniaci et al. Eur Arch Otorhinolaryngol. 2025 Jun;282(6):3293-3302. doi: 10.1007/s00405-025-09371-3. Epub 2025 Apr 12.

Abstract

Purpose: To evaluate the diagnostic accuracy, appropriateness of additional examination recommendations, and consistency of therapeutic regimens by ChatGPT-4 and Llama2 based on real otolaryngology cases.

Methods: A prospective controlled study was conducted on 98 anonymized otolaryngology cases. Clinical information was entered into ChatGPT-4 and Llama2 to obtain primary diagnoses, additional examination recommendations, and treatment strategies. Two independent otolaryngologists scored the AI outputs with the Artificial Intelligence Performance Instrument (AIPI), assessing diagnostic accuracy, appropriateness of examinations, and adequacy of treatment. The AI outputs were compared statistically with expert decisions, and interrater reliability was assessed with kappa statistics.
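As an illustration of the interrater reliability step, the sketch below computes Cohen's unweighted kappa for two raters. The abstract does not say which kappa variant was used, so the choice of Cohen's kappa is an assumption, and the function name `cohen_kappa` and the example ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's unweighted kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters gave the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, estimated from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical ratings (NOT study data): 1 = output judged appropriate, 0 = not.
rater_1 = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
rater_2 = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is generally preferred over simple percent agreement when one label dominates.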

Results: ChatGPT-4 reached the correct diagnosis in 82% of cases, outperforming Llama2 at 76%. ChatGPT-4 suggested relevant and appropriate additional examinations in 88% of cases, and Llama2 in 83%. Treatment was appropriate in 80% of cases for ChatGPT-4 and 72% for Llama2. Both systems occasionally suggested inappropriate tests. Interrater reliability for AIPI scores was high (kappa = 0.85).
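To give a sense of the uncertainty around percentages drawn from 98 cases, the sketch below computes Wilson score intervals for the reported diagnostic accuracies. The abstract reports percentages only, so the case counts are an assumption obtained by rounding each percentage of 98; the intervals are illustrative and are not results from the paper. The helper name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


N_CASES = 98  # number of cases stated in the abstract
# Percentages as reported; the counts below are ASSUMED by rounding.
reported = {"ChatGPT-4 diagnosis": 0.82, "Llama2 diagnosis": 0.76}

for label, share in reported.items():
    assumed_count = round(share * N_CASES)
    low, high = wilson_interval(assumed_count, N_CASES)
    print(f"{label}: ~{assumed_count}/{N_CASES}, 95% CI {low:.2f} to {high:.2f}")
```

Because both systems were evaluated on the same cases, a formal comparison between them would need the paired per-case outcomes, which the abstract does not provide.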

Conclusion: ChatGPT-4 and Llama2 show considerable potential as clinical decision-support tools in otolaryngology, with ChatGPT-4 performing better on all three measures. However, the occasional non-relevant recommendations indicate the need for further refinement and for human oversight to ensure safe application in clinical practice.

Keywords: AI; Artificial intelligence; ChatGPT-4; Clinical decision making; Llama2.

Conflict of interest statement

Declarations. Research involving human participants and/or animals: Human participants. Informed consent: Obtained for all the patients. Prior presentation: This work has not been previously presented at any meeting or conference. Conflicts of interest: The authors declare no conflicts of interest. The author Jerome R. Lechien was not involved in the peer review process of this article.
