Evaluation of the Performance of Large Language Models in the Management of Axial Spondyloarthropathy: Analysis of EULAR 2022 Recommendations
- PMID: 40564776
- PMCID: PMC12192445
- DOI: 10.3390/diagnostics15121455
Abstract
Introduction: Guidelines are of great importance in managing complex and chronic conditions such as axial spondyloarthropathy. The aim of this study was to compare the answers given by various large language models to open-ended questions derived from the ASAS-EULAR 2022 guideline.
Materials and Methods: This was a cross-sectional, comparative study. A total of 15 recommendations in the ASAS-EULAR 2022 guideline were converted directly into open-ended questions set in a clinical context. Each question was posed to the ChatGPT-3.5, ChatGPT-4o, and Gemini 2.0 Flash models. The answers were rated on a seven-point Likert scale for usefulness and reliability, and assessed with the Flesch-Kincaid Reading Ease (FKRE) and Flesch-Kincaid Grade Level (FKGL) metrics for readability and with the Universal Sentence Encoder (USE) and ROUGE-L for semantic and surface-level similarity, respectively. The results of the different models were compared statistically, and p < 0.05 was considered statistically significant.
Results: Gemini 2.0 obtained better FKRE and FKGL scores (p < 0.05). Reliability and usefulness scores were significantly higher for ChatGPT-4o and Gemini 2.0 (p < 0.05). ChatGPT-4o yielded significantly higher semantic similarity scores than ChatGPT-3.5 (p < 0.05). FKRE and FKGL scores were negatively correlated, and reliability and usefulness scores were positively correlated (p < 0.05).
Conclusions: ChatGPT-4o and Gemini 2.0 gave more reliable, useful, and readable answers to open-ended questions derived from the ASAS-EULAR 2022 guideline. These models may assist in supporting treatment decision-making in rheumatology in the future.
Keywords: artificial intelligence; guideline; rheumatic disease; spondylarthritis.
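The readability and surface-similarity metrics named in the abstract can be illustrated with a minimal sketch. It assumes the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas and a word-level ROUGE-L F1 score; the abstract does not state which software or settings the authors used, so the function names, the syllable heuristic, and the whitespace tokenization below are illustrative assumptions rather than the study's actual pipeline. USE-based semantic similarity needs a pretrained model and is not sketched here.

```python
import re


def count_syllables(word: str) -> int:
    """Rough vowel-group heuristic (an assumption; real tools use better rules)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" as non-syllabic, except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (reading ease, grade level) using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    fkre = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fkre, fkgl


def lcs_length(a, b) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """Word-level ROUGE-L F1 between a reference text and a model answer."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Both readability formulas depend on the same two quantities (words per sentence and syllables per word) with opposite signs, so a negative correlation between FKRE and FKGL, as reported in the Results, is expected by construction.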
Conflict of interest statement
The authors declare that they have no conflicts of interest.
References
- Ramiro S., Nikiphorou E., Sepriano A., Ortolan A., Webers C., Baraliakos X., Landewé R.B.M., Van den Bosch F.E., Boteva B., Bremander A., et al. ASAS-EULAR recommendations for the management of axial spondyloarthritis: 2022 update. Ann. Rheum. Dis. 2023;82:19–34. doi: 10.1136/ard-2022-223296.
- Ortolan A., Webers C., Sepriano A., Falzon L., Baraliakos X., Landewé R.B., Ramiro S., van der Heijde D., Nikiphorou E. Efficacy and safety of non-pharmacological and non-biological interventions: A systematic literature review informing the 2022 update of the ASAS/EULAR recommendations for the management of axial spondyloarthritis. Ann. Rheum. Dis. 2023;82:142–152. doi: 10.1136/ard-2022-223297.