Assessment of the Modified Rankin Scale in Electronic Health Records With a Fine-Tuned Large Language Model: Development and Internal Validation

Luis Silva¹,², Marcus Milani^#², Sohum Bindra^#², Salman Ikramuddin², Megan Tessmer², Kaylee Frederickson¹, Abhigyan Datta¹, Halil Ergen²,³, Alex Stangebye²,⁴, Dawson Cooper², Kompal Kumar², Jeremy Yeung², Kamakshi Lakshminarayan², Christopher Streib²

Affiliations

¹ Department of Neurology, University of Florida, 1600 SW Archer Road, Gainesville, FL, 32608, United States, 1 7633373761.
² Department of Neurology, University of Minnesota, Minneapolis, MN, United States.
³ Department of Physical Therapy, Gaziantep University, Gaziantep, Turkey.
⁴ Department of Physical Therapy, M Health Fairview, Minneapolis, MN, United States.

^# Contributed equally.

PMID: 41740162
DOI: 10.2196/82607

Luis Silva et al. JMIR AI. 2026 Feb 25:5:e82607.

Abstract

Background: The modified Rankin scale (mRS) is an important metric in stroke research, often used as a primary outcome in clinical trials and observational studies. The mRS can be assessed retrospectively from electronic health records (EHRs), but this process is labor-intensive and prone to interrater variability. Large language models (LLMs) have demonstrated potential in automating text classification.

Objective: We aimed to create a fine-tuned LLM that can analyze EHR text and classify mRS scores for clinical and research applications.

Methods: We performed a retrospective cohort study of patients admitted to a specialist stroke neurology service at a large academic hospital system between August 2020 and June 2023. Each patient's medical record was reviewed at two time points: (1) at hospital discharge and (2) approximately 90 days post discharge. Two independent researchers assigned an mRS score at each time point. Two separate models were trained on EHR passages with corresponding mRS scores as labeled outcomes: (1) a multiclass model to classify all seven mRS scores and (2) a binary model to classify functional independence (mRS scores 0-2) versus non-independence (mRS scores 3-6). Four-fold cross-validation was conducted using accuracy and the Cohen κ as model performance metrics.
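The label mapping and evaluation metrics described in the Methods can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names (`to_binary`, `accuracy`, `cohen_kappa`, `fold_indices`) and the toy labels are hypothetical, and the abstract does not say how the folds were formed, so the contiguous, unshuffled split below is an assumption.

```python
from collections import Counter


def to_binary(mrs: int) -> int:
    """Map an mRS score to 1 (functional independence, 0-2) or 0 (3-6)."""
    if not 0 <= mrs <= 6:
        raise ValueError(f"mRS must be between 0 and 6, got {mrs}")
    return 1 if mrs <= 2 else 0


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement from the two marginal label distributions.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both label sets contain one identical class
    return (p_o - p_e) / (1 - p_e)


def fold_indices(n_samples: int, k: int = 4):
    """Yield (train, test) index lists for k contiguous folds (assumed split)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Toy labels for illustration only; these are not study data.
reference = [0, 1, 2, 3, 4, 5, 6, 2]
predicted = [0, 1, 3, 3, 4, 4, 6, 2]

ref_bin = [to_binary(s) for s in reference]
pred_bin = [to_binary(s) for s in predicted]

print(accuracy(ref_bin, pred_bin))     # 0.875
print(cohen_kappa(ref_bin, pred_bin))  # 0.75
```

In the toy data, the 5-versus-4 disagreement disappears after binarization because both scores fall in the non-independent group, while the 2-versus-3 disagreement crosses the cutoff and still counts as an error.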

Results: A total of 2290 EHR passages with corresponding mRS scores were included in model training. The multiclass model, which considered all seven mRS scores, attained an accuracy of 77% and a weighted Cohen κ of 0.92. Class-specific accuracy was highest for mRS score 4 (90%) and lowest for mRS score 2 (28%). The binary model, which considered only functional independence versus non-independence, attained an accuracy of 92% and a Cohen κ of 0.84.
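A weighted Cohen κ can be much higher than exact-match accuracy because it penalizes a disagreement by its distance on the ordinal scale, so a one-point miss costs far less than a four-point miss. The sketch below illustrates this with toy labels. The abstract does not state which weighting scheme was used; quadratic weights (`power=2`) are an assumption here, and the function name `weighted_kappa` is hypothetical.

```python
def weighted_kappa(y_true, y_pred, n_classes=7, power=2):
    """Weighted Cohen kappa with disagreement weights |i - j| ** power.

    power=1 gives linear weights, power=2 quadratic weights (assumed here).
    """
    n = len(y_true)
    # Confusion matrix: rows are reference scores, columns are predictions.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_classes))
                  for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = abs(i - j) ** power
            weighted_observed += weight * observed[i][j]
            # Count expected by chance from the marginal totals.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0:
        return 1.0  # every label falls in a single class
    return 1 - weighted_observed / weighted_expected


# Toy labels for illustration only; these are not study data.
reference = [0, 1, 2, 3, 4, 5, 6]
near_miss = [0, 1, 3, 3, 4, 5, 6]  # one error, off by 1 point
far_miss = [0, 1, 6, 3, 4, 5, 6]   # one error, off by 4 points

print(weighted_kappa(reference, near_miss))  # about 0.982
print(weighted_kappa(reference, far_miss))   # 0.75
```

Both toy prediction sets have the same exact-match accuracy (6 of 7), yet the near miss scores markedly higher. A model whose errors mostly land on adjacent mRS scores would show the same pattern: moderate accuracy alongside a high weighted κ.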

Conclusions: Our findings demonstrate that LLMs can be successfully trained to determine mRS scores through EHR text analysis; however, improving discrimination between intermediate scores is required.

Keywords: artificial intelligence; electronic health record; large language model; machine learning; modified Rankin scale; stroke.

© Luis Silva, Marcus Milani, Sohum Bindra, Salman Ikramuddin, Megan Tessmer, Kaylee Frederickson, Abhigyan Datta, Halil Ergen, Alex Stangebye, Dawson Cooper, Kompal Kumar, Jeremy Yeung, Kamakshi Lakshminarayan, Christopher Streib. Originally published in JMIR AI (https://ai.jmir.org).
