Evaluation of the Performance of Large Language Models in the Management of Axial Spondyloarthropathy: Analysis of EULAR 2022 Recommendations
- PMID: 40564776
- PMCID: PMC12192445
- DOI: 10.3390/diagnostics15121455
Abstract
Introduction: Guidelines are of great importance in the management of complex, chronic conditions such as axial spondyloarthropathy. The aim of this study was to compare the answers given by various large language models to open-ended questions created from the ASAS-EULAR 2022 guidance. Materials and Methods: This was a cross-sectional, comparative study. The 15 recommendations in the ASAS-EULAR 2022 guideline were converted directly into open-ended questions set in a clinical context. Each question was posed to the ChatGPT-3.5, ChatGPT-4o, and Gemini 2.0 Flash models. The answers were rated for usefulness and reliability on a seven-point Likert scale, assessed for readability with the Flesch-Kincaid Reading Ease (FKRE) and Flesch-Kincaid Grade Level (FKGL) metrics, and assessed for semantic and surface-level similarity with the Universal Sentence Encoder (USE) and ROUGE-L, respectively. The results of the different large language models were compared statistically, and p < 0.05 was considered statistically significant. Results: Gemini 2.0 achieved better FKRE and FKGL scores than the other models (p < 0.05). Reliability and usefulness scores were significantly higher for ChatGPT-4o and Gemini 2.0 (p < 0.05). ChatGPT-4o yielded significantly higher semantic similarity scores than ChatGPT-3.5 (p < 0.05). There was a negative correlation between FKRE and FKGL scores and a positive correlation between reliability and usefulness scores (p < 0.05). Conclusions: ChatGPT-4o and Gemini 2.0 gave more reliable, useful, and readable answers to open-ended questions derived from the ASAS-EULAR 2022 guidelines. These models may help support treatment decision-making in rheumatology in the future.
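The readability and surface-similarity metrics named in the Methods can be illustrated with a minimal Python sketch. It uses the standard published formulas for Flesch Reading Ease and Flesch-Kincaid Grade Level and a longest-common-subsequence ROUGE-L with a balanced F-measure. The function names, the whitespace tokenization, and the use of pre-counted words, sentences, and syllables are assumptions made for illustration; the abstract does not state which tools or settings the authors used.

```python
from typing import List, Tuple


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher means easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula; higher means harder to read."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference: str, candidate: str) -> Tuple[float, float, float]:
    """ROUGE-L precision, recall, and balanced F-measure.

    Assumption: lowercase whitespace tokenization, no stemming.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0, 0.0, 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical counts for a 100-word, 5-sentence, 150-syllable answer.
    print(flesch_reading_ease(100, 5, 150))
    print(flesch_kincaid_grade(100, 5, 150))
    # Hypothetical reference and model answer, not taken from the study.
    print(rouge_l("the cat sat on the mat", "the cat is on the mat"))
```

Both readability formulas depend on the same two ratios (words per sentence and syllables per word) with opposite signs, which is consistent with the negative correlation between FKRE and FKGL reported in the Results.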
Keywords: artificial intelligence; guideline; rheumatic disease; spondylarthritis.
Conflict of interest statement
The authors declare that they have no conflicts of interest.
References
- Ramiro S., Nikiphorou E., Sepriano A., Ortolan A., Webers C., Baraliakos X., Landewé R.B.M., Van den Bosch F.E., Boteva B., Bremander A., et al. ASAS-EULAR recommendations for the management of axial spondyloarthritis: 2022 update. Ann. Rheum. Dis. 2023;82:19–34. doi: 10.1136/ard-2022-223296.
- Ortolan A., Webers C., Sepriano A., Falzon L., Baraliakos X., Landewé R.B., Ramiro S., van der Heijde D., Nikiphorou E. Efficacy and safety of non-pharmacological and non-biological interventions: A systematic literature review informing the 2022 update of the ASAS/EULAR recommendations for the management of axial spondyloarthritis. Ann. Rheum. Dis. 2023;82:142–152. doi: 10.1136/ard-2022-223297.
