Clinical applications of large language models in knee osteoarthritis: a systematic review
- PMID: 41346991
- PMCID: PMC12672416
- DOI: 10.3389/fmed.2025.1670824
Abstract
Background and aims: Knee osteoarthritis (KOA) is a common chronic degenerative disease that significantly impacts patients' quality of life. With the rapid advancement of artificial intelligence, large language models (LLMs) have demonstrated potential in supporting medical information extraction, clinical decision-making, and patient education through their natural language processing capabilities. However, the current landscape of LLM applications in the KOA domain, along with their methodological quality, has yet to be systematically reviewed. Therefore, this systematic review aims to comprehensively summarize existing clinical studies on LLMs in KOA, evaluate their performance and methodological rigor, and identify current challenges and future research directions.
Methods: Following the PRISMA guidelines, a systematic search of PubMed, the Cochrane Library, Embase, and Web of Science was conducted for literature published up to June 2025. The protocol was preregistered on the OSF platform. Studies were screened against standardized inclusion and exclusion criteria, and key study characteristics and performance evaluation metrics were extracted. Methodological quality was assessed using tools such as Cochrane RoB, STROBE, STARD, and DISCERN. Additionally, the CLEAR-LLM and CliMA-10 frameworks were applied to provide complementary evaluations of quality and performance.
Results: A total of 16 studies were included, covering various LLMs such as ChatGPT, Gemini, and Claude. Application scenarios encompassed text generation, imaging diagnostics, and patient education. Most studies were observational in nature, and overall methodological quality ranged from moderate to high. Based on CliMA-10 scores, LLMs exhibited upper-moderate performance in KOA-related tasks. The ChatGPT-4 series consistently outperformed other models, especially in structured output generation, interpretation of clinical terminology, and content accuracy. Key limitations included insufficient sample representativeness, inconsistent control over hallucinated content, and the lack of standardized evaluation tools.
Conclusion: Large language models show notable potential in the KOA field, but their clinical application remains exploratory and is limited by issues such as sample bias and methodological heterogeneity. Model performance varies across tasks, underscoring the need for improved prompt design and standardized evaluation frameworks. With validation on real-world data and appropriate ethical oversight, LLMs may contribute more substantially to personalized KOA management.
Systematic review registration: https://osf.io/jy4kz, identifier 10.17605/OSF.IO/479R8.
Keywords: ChatGPT; artificial intelligence; clinical decision support; knee osteoarthritis; large language models; systematic review.
Copyright © 2025 Ma, Liu, Zhang, Chen, Fan, Cao and Ni.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.