Clinical Management of Wasp Stings Using Large Language Models: Cross-Sectional Evaluation Study
- PMID: 40466102
- PMCID: PMC12177424
- DOI: 10.2196/67489
Abstract
Background: Wasp stings are a significant public health concern in many parts of the world, particularly in tropical and subtropical regions. Wasp venom contains a variety of bioactive compounds that can cause a wide range of clinical effects, from mild localized pain and swelling to severe, life-threatening allergic reactions such as anaphylaxis. With the rapid development of artificial intelligence (AI) technologies, large language models (LLMs) are increasingly being used in health care, including emergency medicine and toxicology. These models have the potential to help health care professionals make rapid, informed clinical decisions. This study aimed to assess the performance of 4 leading LLMs, namely ERNIE Bot 3.5 (Baidu), ERNIE Bot 4.0 (Baidu), Claude Pro (Anthropic), and ChatGPT 4.0 (OpenAI), in managing wasp sting cases, with a focus on their accuracy, comprehensiveness, and decision-making abilities.
Objective: The objective of this research was to systematically evaluate and compare the capabilities of the 4 LLMs in the context of wasp sting management. This involved analyzing their responses to a series of standardized questions and real-world clinical scenarios. The study aimed to determine which LLMs provided the most accurate, complete, and clinically relevant information for the management of wasp stings.
Methods: This cross-sectional study used 50 standardized questions covering 10 key domains of wasp sting management, along with 20 real-world clinical case scenarios. Responses from the 4 LLMs were independently evaluated by 8 domain experts, who rated them on a 5-point Likert scale for accuracy, completeness, and usefulness in clinical decision-making. Statistical comparisons between the models were made using the Wilcoxon signed-rank test, and the consistency of expert ratings was assessed using the Kendall coefficient of concordance.
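The Wilcoxon signed-rank test compares two models on the same set of items without assuming normally distributed scores. A minimal sketch follows, assuming each item (question or case) yields one paired score per model, for example the mean of the 8 experts' ratings; the abstract does not say how pairs were formed, and `wilcoxon_signed_rank` and `average_ranks` are hypothetical helper names, not the authors' code. Zero differences are dropped and the two-sided p-value uses the normal approximation with a tie correction.

```python
import math
from collections import Counter
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values ascending (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Paired Wilcoxon signed-rank test returning (W+, two-sided p-value).

    Zero differences are discarded; the p-value uses the normal approximation
    with a tie correction and no continuity correction.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    abs_diffs = [abs(d) for d in diffs]
    ranks = average_ranks(abs_diffs)
    # W+ sums the ranks of positive differences (items where x scored higher).
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    # Each group of t tied |differences| reduces the variance by (t^3 - t) / 48.
    tie_term = sum(t ** 3 - t for t in Counter(abs_diffs).values())
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(variance)
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, p_two_sided
```

Calling `wilcoxon_signed_rank(model_a_scores, model_b_scores)` on hypothetical per-case scores for two models returns W+ and a p-value; a small p-value indicates that one model's scores tend to differ systematically from the other's. An established implementation such as `scipy.stats.wilcoxon` can use an exact null distribution for small samples, so its p-values may differ from this approximation.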
Results: Claude Pro achieved the highest mean score, 4.7 (SD 0.603) out of 5, followed closely by ChatGPT 4.0 with 4.5. ERNIE Bot 4.0 and ERNIE Bot 3.5 received mean scores of 4 (SD 0.600) and 3.8, respectively. In the 20 complex clinical cases, Claude Pro significantly outperformed ERNIE Bot 3.5, particularly in managing complications and assessing the severity of reactions (P<.001). The expert ratings showed moderate agreement (Kendall W=0.67), indicating reasonably consistent assessments across raters.
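Kendall's W summarizes how similarly several raters rank the same items, ranging from 0 (no agreement) to 1 (identical rankings). With m raters and n items, W = 12S / (m²(n³ − n) − mΣT), where S is the sum of squared deviations of each item's rank total from its mean m(n + 1)/2 and ΣT corrects for tied scores. The sketch below assumes a hypothetical score matrix with one row per expert (8 in this study) and one column per rated response; the abstract does not specify how the items were arranged, and `kendalls_w` is an illustrative name rather than the authors' code. The tie correction matters here because 5-point Likert scores tie heavily.

```python
from collections import Counter
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values ascending (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kendalls_w(ratings: Sequence[Sequence[float]]) -> float:
    """Kendall's coefficient of concordance W, with the correction for tied ranks.

    ratings[r][i] is the score rater r gave to item i; every rater scores all n items.
    """
    m = len(ratings)
    n = len(ratings[0])
    rank_rows = [average_ranks(row) for row in ratings]
    # R_i: total rank each item received across all m raters.
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    # Tie correction: sum of (t^3 - t) over every group of t tied scores, per rater.
    tie_term = sum(t ** 3 - t for row in ratings for t in Counter(row).values())
    denominator = m ** 2 * (n ** 3 - n) - m * tie_term
    if denominator == 0:
        raise ValueError("W is undefined when every rater gives all items the same score")
    return 12 * s / denominator
```

Passing a hypothetical 8-row matrix of Likert scores returns a single W; values near 1 mean the experts ordered the responses almost identically, while values near 0 mean their orderings were essentially unrelated.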
Conclusions: The results of this study suggest that Claude Pro and ChatGPT 4.0 can provide accurate and comprehensive support for the clinical management of wasp stings, particularly in complex decision-making scenarios. These findings support the growing role of AI in emergency and toxicological medicine and suggest that the choice of AI tool should be matched to the needs of the specific clinical situation and health care application.
Keywords: artificial intelligence; decision support; emergency medicine; hymenoptera envenomation; natural language processing.
©Wei Pan, Shuman Zhang, Yonghong Wang, Zhenglin Quan, Yanxia Zhu, Zhicheng Fang, Xianyi Yang. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 04.06.2025.
Conflicts of Interest: None declared.