Use of artificial intelligence chatbots in clinical management of immune-related adverse events
- PMID: 38816231
- PMCID: PMC11141185
- DOI: 10.1136/jitc-2023-008599
Abstract
Background: Artificial intelligence (AI) chatbots have become a major source of general and medical information, though their accuracy and completeness are still being assessed. Their utility in answering questions about immune-related adverse events (irAEs), which are common and potentially dangerous toxicities of cancer immunotherapy, is not well defined.
Methods: We developed 50 distinct questions, each with answers available in guidelines, covering 10 irAE categories and queried two AI chatbots (ChatGPT and Bard), along with an additional 20 patient-specific scenarios. Experts in irAE management scored answers for accuracy and completeness using a Likert scale ranging from 1 (least accurate/complete) to 4 (most accurate/complete). Answers were compared across categories and across engines.
Results: Overall, both engines scored highly for accuracy (mean scores for ChatGPT and Bard were 3.87 vs 3.5, p<0.01) and completeness (3.83 vs 3.46, p<0.01). Scores of 1-2 (completely or mostly inaccurate or incomplete) were particularly rare for ChatGPT (6/800 answer-ratings, 0.75%). Of the 50 questions, all eight physician raters gave ChatGPT a rating of 4 (fully accurate or complete) for 22 questions (for accuracy) and 16 questions (for completeness). In the 20 patient scenarios, the average accuracy score was 3.725 (median 4) and the average completeness was 3.61 (median 4).
Conclusions: AI chatbots provided largely accurate and complete information regarding irAEs, and wildly inaccurate information ("hallucinations") was uncommon. However, until accuracy and completeness increase further, appropriate guidelines remain the gold standard to follow.
Keywords: Colitis; Immune Checkpoint Inhibitor; Immune related adverse event - irAE; Pneumonitis; Thyroiditis.
© Author(s) (or their employer(s)) 2024. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.
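The proportions quoted in the Results can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the 800 answer-ratings for ChatGPT arise from 50 questions, each scored by eight raters on two dimensions (accuracy and completeness), which the abstract implies but does not state outright. The function and variable names are hypothetical.

```python
# Assumption: 50 questions x 8 raters x 2 dimensions = 800 answer-ratings.
# The abstract reports the 800 total but not this breakdown explicitly.
QUESTIONS = 50
RATERS = 8
DIMENSIONS = 2  # accuracy and completeness


def percent(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


total_ratings = QUESTIONS * RATERS * DIMENSIONS

# Scores of 1-2 for ChatGPT: 6 of the answer-ratings.
low_score_pct = percent(6, total_ratings)

# Questions on which all eight raters gave ChatGPT a score of 4.
unanimous_accuracy_pct = percent(22, QUESTIONS)
unanimous_completeness_pct = percent(16, QUESTIONS)

print(f"answer-ratings: {total_ratings}")
print(f"scores of 1-2: {low_score_pct:.2f}%")
print(f"unanimous 4 (accuracy): {unanimous_accuracy_pct:.0f}% of questions")
print(f"unanimous 4 (completeness): {unanimous_completeness_pct:.0f}% of questions")
```

Under that assumption the low-score share matches the 0.75% reported, and the unanimous top ratings correspond to 44% and 32% of the 50 questions.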
Conflict of interest statement
Competing interests: AP reports grants and personal fees from Bristol-Myers Squibb; personal fees from AstraZeneca, Pfizer, Merck, Roche, and Canadian Agency for Drugs and Technologies in Health; and grants from Alberta Cancer Foundation outside the submitted work. IP has served on advisory boards for Nektar, Iovance, Nouscom, I-O Bio and has stock ownership in Ideaya. DBJ has served on advisory boards or as a consultant for BMS, Catalyst Biopharma, Iovance, Mallinckrodt, Merck, Mosaic ImmunoEngineering, Novartis, Pfizer, Targovax, and Teiko, has received research funding from BMS and Incyte, and has patents pending for use of MHC-II as a biomarker for immune checkpoint inhibitor response, and abatacept as treatment for immune-related adverse events. DEG reports research funding from AstraZeneca, BerGenBio, Karyopharm, and Novocure; stock ownership in Gilead; service on advisory boards or consulting for AstraZeneca, Catalyst Pharmaceuticals, Daiichi-Sankyo, Elevation Oncology, Janssen Scientific Affairs, LLC, Jazz Pharmaceuticals, Regeneron Pharmaceuticals, Sanofi; U.S. patent 11,747,345, patent applications 17/045,482, 63/386,387, 63/382,972, 63/382,257; and is Co-founder and Chief Scientific Officer of OncoSeer Diagnostics, LLC.
Similar articles
- Assessing the Accuracy of Information on Medication Abortion: A Comparative Analysis of ChatGPT and Google Bard AI. Cureus. 2024 Jan 2;16(1):e51544. doi: 10.7759/cureus.51544. eCollection 2024 Jan. PMID: 38318564 Free PMC article.
- Performance of ChatGPT-4 and Bard chatbots in responding to common patient questions on prostate cancer 177Lu-PSMA-617 therapy. Front Oncol. 2024 Jul 12;14:1386718. doi: 10.3389/fonc.2024.1386718. eCollection 2024. PMID: 39070149 Free PMC article.
- The performance of artificial intelligence chatbot large language models to address skeletal biology and bone health queries. J Bone Miner Res. 2024 Mar 22;39(2):106-115. doi: 10.1093/jbmr/zjad007. PMID: 38477743 Free PMC article.
- Management of Immune Checkpoint Inhibitor Toxicities: A Review and Clinical Guideline for Emergency Physicians. J Emerg Med. 2018 Oct;55(4):489-502. doi: 10.1016/j.jemermed.2018.07.005. Epub 2018 Aug 16. PMID: 30120013 Review.
- The Inconsistent and Inadequate Reporting Of Immune-Related Adverse Events in PD-1/PD-L1 Inhibitors: A Systematic Review of Randomized Controlled Clinical Trials. Oncologist. 2021 Dec;26(12):e2239-e2246. doi: 10.1002/onco.13940. Epub 2021 Aug 31. PMID: 34396642 Free PMC article.
Cited by
- Evaluating Large Language Models for Preoperative Patient Education in Superior Capsular Reconstruction: Comparative Study of Claude, GPT, and Gemini. JMIR Perioper Med. 2025 Jun 12;8:e70047. doi: 10.2196/70047. PMID: 40505086 Free PMC article.
- Understanding AI's Role in Endometriosis Patient Education and Evaluating Its Information and Accuracy: Systematic Review. JMIR AI. 2024 Oct 30;3:e64593. doi: 10.2196/64593. PMID: 39476855 Free PMC article. Review.
- Use of artificial intelligence chatbots in clinical management of immune-related adverse events. J Immunother Cancer. 2024 Dec 4;12(12):e009999. doi: 10.1136/jitc-2024-009999. PMID: 39631849 Free PMC article. No abstract available.
- Clinical Management of Wasp Stings Using Large Language Models: Cross-Sectional Evaluation Study. J Med Internet Res. 2025 Jun 4;27:e67489. doi: 10.2196/67489. PMID: 40466102 Free PMC article.
- Exploring the capabilities of GenAI for oral cancer consultations in remote consultations : Author. BMC Oral Health. 2025 Feb 20;25(1):269. doi: 10.1186/s12903-025-05619-w. PMID: 39979918 Free PMC article.