Reliability of large language models for advanced head and neck malignancies management: a comparison between ChatGPT 4 and Gemini Advanced
- PMID: 38795148
- PMCID: PMC11392976
- DOI: 10.1007/s00405-024-08746-2
Abstract
Purpose: This study evaluates the efficacy of two advanced Large Language Models (LLMs), OpenAI's ChatGPT 4 and Google's Gemini Advanced, in providing treatment recommendations for head and neck oncology cases. The aim is to assess their utility in supporting multidisciplinary oncological evaluations and decision-making processes.
Methods: This comparative analysis examined the responses of ChatGPT 4 and Gemini Advanced to five hypothetical cases of head and neck cancer, each representing a different anatomical subsite. The responses were evaluated against the latest National Comprehensive Cancer Network (NCCN) guidelines by two blinded panels using the total disagreement score (TDS) and the artificial intelligence performance instrument (AIPI). Statistical assessments were performed using the Wilcoxon signed-rank test and the Friedman test.
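As a rough illustration of the paired comparison named in the Methods, the Wilcoxon signed-rank statistic can be computed as in the sketch below. The scores are hypothetical and are not the study's data; the function names `average_ranks` and `wilcoxon_signed_rank` are illustrative, zero differences are dropped (the classic convention), and no p-value is derived.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: List[float]) -> List[float]:
    """Rank values in ascending order starting at 1; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (W+, W-): rank sums of the positive and negative paired differences."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    # Pairs with no difference carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical per-case scores for two models (NOT taken from the study).
model_a_scores = [3, 4, 2, 3, 4]
model_b_scores = [2, 2, 3, 3, 1]

w_plus, w_minus = wilcoxon_signed_rank(model_a_scores, model_b_scores)
print(f"W+ = {w_plus}, W- = {w_minus}")
```

The smaller of the two rank sums is the usual test statistic. In practice a statistics package would typically be used to obtain the p-value, and a Friedman test would be applied when more than two related samples are compared.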
Results: Both LLMs produced relevant treatment recommendations, with ChatGPT 4 generally outperforming Gemini Advanced in adherence to guidelines and comprehensive treatment planning. ChatGPT 4 showed higher AIPI scores (median 3 [2-4]) than Gemini Advanced (median 2 [2-3]), indicating better overall performance. Notably, inconsistencies were observed in the management of induction chemotherapy and in surgical decisions such as neck dissection.
Conclusions: While both LLMs demonstrated the potential to aid in the multidisciplinary management of head and neck oncology, discrepancies in certain critical areas highlight the need for further refinement. The study supports the growing role of AI in enhancing clinical decision-making but also emphasizes that continuous updates and validation against current clinical standards are necessary before AI can be fully integrated into healthcare practice.
Keywords: Artificial intelligence; Computer-assisted diagnosis; Head and neck cancer; Head and neck oncology; Large language models; Laryngeal carcinoma; Nasopharyngeal carcinoma; Oncological diagnosis; Oropharyngeal carcinoma; Parotid carcinoma; Tongue carcinoma.
© 2024. The Author(s).
Conflict of interest statement
The authors have no potential conflict of interest or financial disclosures pertaining to this article.