J Med Internet Res. 2025 Jul 22;27:e73226.
doi: 10.2196/73226.

Application of Large Language Models in Stroke Rehabilitation Health Education: 2-Phase Study

Shiqi Qiang et al.

Abstract

Background: Stroke is a leading cause of disability and death worldwide, with home-based rehabilitation playing a crucial role in improving patient prognosis and quality of life. Traditional health education often lacks precision, personalization, and accessibility. In contrast, large language models (LLMs) are gaining attention for their potential in medical health education, owing to their advanced natural language processing capabilities. However, the effectiveness of LLMs in home-based stroke rehabilitation remains uncertain.

Objective: This study evaluates the effectiveness of 4 LLMs (ChatGPT-4, MedGo, Qwen, and ERNIE Bot), selected for their diversity in model type, clinical relevance, and accessibility at the time of study design, in home-based stroke rehabilitation. The aim is to offer patients with stroke more precise and secure health education pathways while exploring the feasibility of using LLMs to guide health education.

Methods: In the first phase of this study, a literature review and expert interviews identified 15 common questions and 2 clinical cases relevant to patients with stroke in home-based rehabilitation. These were input into 4 LLMs for simulated consultations. Six medical experts (2 clinicians, 2 nursing specialists, and 2 rehabilitation therapists) evaluated the LLM-generated responses on a 5-point Likert scale, assessing accuracy, completeness, readability, safety, and humanity. In the second phase, the top 2 performing models from phase 1 were selected. Thirty patients with stroke undergoing home-based rehabilitation were recruited. Each patient asked both models 3 questions and rated the responses using a satisfaction scale; readability, text length, and recommended reading age were assessed using a Chinese readability analysis tool. Data were analyzed using one-way ANOVA, post hoc Tukey Honestly Significant Difference tests, and paired t tests.
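The statistical comparisons described above can be sketched as follows. This is a minimal illustration using SciPy, with hypothetical placeholder ratings (Likert 1-5) rather than the study's actual data: a one-way ANOVA compares mean scores across the four models, and a paired t test compares the two models retained in phase 2 on the same items.

```python
# Sketch of the study's analysis pipeline: one-way ANOVA across the four
# models, then a paired t test for the two phase-2 models.
# All scores below are illustrative placeholders, NOT the study's data.
from scipy import stats

# Hypothetical expert ratings (1-5 Likert) for one dimension, e.g. accuracy
chatgpt4 = [5, 4, 5, 4, 4, 5, 4, 5, 4, 4]
medgo    = [4, 4, 4, 5, 4, 3, 4, 4, 4, 4]
qwen     = [3, 3, 4, 3, 3, 4, 3, 3, 3, 4]
ernie    = [3, 4, 3, 3, 3, 3, 4, 3, 3, 3]

# One-way ANOVA: do mean ratings differ across the four models?
f_stat, p_anova = stats.f_oneway(chatgpt4, medgo, qwen, ernie)
print(f"ANOVA: F={f_stat:.2f}, P={p_anova:.4f}")

# Paired t test: each item was scored for both phase-2 models,
# so the samples are paired by item.
t_stat, p_paired = stats.ttest_rel(chatgpt4, medgo)
print(f"Paired t test: t={t_stat:.2f}, P={p_paired:.4f}")
```

A post hoc Tukey Honestly Significant Difference test (e.g. `scipy.stats.tukey_hsd` in recent SciPy versions) would then identify which pairwise model differences drive a significant ANOVA result.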

Results: The results revealed significant differences across the 4 models in 5 dimensions: accuracy (P=.002), completeness (P<.001), readability (P=.04), safety (P=.007), and humanity (P<.001). ChatGPT-4 outperformed all models in each dimension, with scores for accuracy (mean 4.28, SD 0.84), completeness (mean 4.35, SD 0.75), readability (mean 4.28, SD 0.85), safety (mean 4.38, SD 0.81), and humanity (mean 4.65, SD 0.66). MedGo excelled in accuracy (mean 4.06, SD 0.78) and completeness (mean 4.06, SD 0.74). Qwen and ERNIE Bot scored significantly lower across all 5 dimensions than ChatGPT-4 and MedGo. ChatGPT-4 generated the longest responses (mean 1338.35, SD 236.03 characters) and had the highest reading difficulty score (mean 12.88). In the second phase, ChatGPT-4 performed the best overall, while MedGo provided the clearest responses.

Conclusions: LLMs, particularly ChatGPT-4 and MedGo, demonstrated promising performance in home-based stroke rehabilitation education. However, discrepancies between expert and patient evaluations highlight the need for improved alignment with patient comprehension and expectations. Enhancing clinical accuracy, readability, and oversight mechanisms will be essential for future real-world integration.

Keywords: artificial intelligence; health education; home rehabilitation; large language models; stroke.


Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. Research design workflow diagram.

Figure 2. Line chart of the median scores and radar chart for the 4 large language models. (A) Accuracy, (B) completeness, (C) humanity, (D) readability, (E) safety, and (F) radar chart.

Figure 3. Comparative evaluation of large language model (LLM) responses on relevant questions. (A) Box plot showing the variation in text length among the 4 LLMs, with a significant difference observed between ChatGPT-4 and Qwen (P<.001). (B) Box plot illustrating the variation in reading difficulty scores among the 4 LLMs. (C) Box plot showing the variation in recommended reading age among the 4 LLMs (P=.07). (D) Density plot displaying the distribution of reading difficulty scores among the models. (E) Bar chart showing the distribution of educational levels required to comprehend the responses. P values indicate pairwise comparisons of Chinese character count: ChatGPT-4 versus MedGo (P=.002), ChatGPT-4 versus Qwen (P<.001), ChatGPT-4 versus ERNIE Bot (P=.02), and MedGo versus Qwen (P=.04); and for reading difficulty score: ChatGPT-4 versus MedGo (P=.01). All rating data in this study were tested and found to follow a normal or approximately normal distribution.

Figure 4. Sankey diagram illustrating the classification of questions in both phases. On the left, the 90 questions posed by 30 patients are shown; on the right, the 15 integrated questions are displayed.
