Enhancing Magnetic Resonance Imaging (MRI) Report Comprehension in Spinal Trauma: Readability Analysis of AI-Generated Explanations for Thoracolumbar Fractures
- PMID: 40611700
- PMCID: PMC12231343
- DOI: 10.2196/69654
Abstract
Background: Magnetic resonance imaging (MRI) reports are challenging for patients to interpret and may cause them unnecessary anxiety. Advanced artificial intelligence (AI) large language models (LLMs), such as GPT-4o, hold promise for translating complex medical information into layman's terms.
Objective: This paper aims to evaluate the accuracy, helpfulness, and readability of GPT-4o in explaining MRI reports of patients with thoracolumbar fractures.
Methods: MRI reports of 20 patients presenting with thoracic or lumbar vertebral body fractures were obtained. GPT-4o was prompted to explain each MRI report in layman's terms. The generated explanations were then presented to 7 board-certified spine surgeons, who evaluated their helpfulness and accuracy. The readability of the MRI report text and the GPT-4o explanations was then graded using the Flesch Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL) scale.
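The two readability measures named in the Methods can be illustrated with a minimal Python sketch. It uses the standard published FRES and FKGL formulas, but the syllable counter is a rough vowel-group heuristic and the function names (`count_syllables`, `readability`) are hypothetical. The abstract does not say which tool the authors used, so this is not their implementation and its scores may differ from theirs.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as vowel groups (heuristic, an assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in endings such as "-le" or "-ee".
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) for a passage of English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Flesch Reading Ease: higher scores mean easier text.
    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Flesch-Kincaid Grade Level: approximate US school grade.
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fres, fkgl


if __name__ == "__main__":
    # Both passages are invented for illustration; neither is study data.
    report_style = "Acute compression deformity of the superior endplate with retropulsion."
    plain_style = "The bone is broken. It is a new break."
    for label, passage in (("report", report_style), ("plain", plain_style)):
        fres, fkgl = readability(passage)
        print(f"{label}: FRES={fres:.1f}, FKGL={fkgl:.1f}")
```

Shorter sentences and fewer syllables per word raise the FRES and lower the FKGL, which is the direction of change the study tests for when comparing the GPT-4o explanations with the original reports.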
Results: The layman explanations provided by GPT-4o were found to be helpful by all 7 surgeons in 17 cases, with 6 of 7 surgeons finding the information helpful in the remaining 3 cases. ChatGPT-generated layman reports were rated as "accurate" by all 7 surgeons in 11/20 cases (55%). In an additional 5/20 cases (25%), 6 of 7 surgeons agreed on their accuracy. In the remaining 4/20 cases (20%), accuracy ratings varied, with 4 or 5 surgeons considering them accurate. Review of surgeon feedback on inaccuracies revealed that the radiology reports were often insufficiently detailed. The mean FRES of the MRI reports was significantly lower than that of the GPT-4o explanations (32.15, SD 15.89 vs 53.9, SD 7.86; P<.001). The mean FKGL of the MRI reports trended higher than that of the GPT-4o explanations (11th-12th grade vs 10th-11th grade level; P=.11).
Conclusions: Overall helpfulness and readability ratings for AI-generated summaries of MRI reports were high, with few inaccuracies recorded. This study demonstrates the potential of GPT-4o to serve as a valuable tool for enhancing patient comprehension of MRI report findings.
Keywords: AI; ChatGPT; LLM; MRI; artificial intelligence; large language model; magnetic resonance imaging; orthopedic surgery; patient education; spine surgery; thoracolumbar fracture; trauma.
© David C Sing, Kishan S Shah, Michael Pompliano, Paul H Yi, Calogero Velluto, Ali Bagheri, Robert K Eastlack, Stephen R Stephan, Gregory M Mundis Jr. Originally published in JMIR AI (https://ai.jmir.org).
Conflict of interest statement
Author RKE holds stock or stock options in Aclarion, Alphatec Spine, Orthofix, Inc, Nuvasive, and Spine Innovations. RKE receives IP royalties from Aesculap/B.Braun, Globus Medical, Nuvasive, Seaspine, and SI Bone. RKE is a paid consultant for Aesculap/B.Braun, Amgen Co, Johnson & Johnson, Kuros, Medtronic, Neo Spine, Nuvasive, Silony, Spinal Elements, and Seaspine. RKE receives research support from Medtronic, Sofamor Danek, Nuvasive, and Seaspine. RKE is a paid presenter or speaker for Radius and serves as a board and committee member for the San Diego Orthopaedic Research Society, San Diego Spine Foundation, and Scoliosis Research Society.
Author GMM Jr holds stock or stock options in Alphatec Spine, Nuvasive, and Orthofix, Inc. GMM Jr receives IP royalties from Nuvasive, Seaspine, and Stryker. GMM Jr is a paid consultant for Globus, Carlsmed, Seaspine, and SI Bone. GMM Jr receives research support from Medtronic, Sofamor Danek, Globus, and Orthofix. GMM Jr is a board or committee member for the Scoliosis Research Society, Society of Minimally Invasive Spine Surgery, San Diego Orthopaedic Society, Global Spine Outreach, and San Diego Spine Foundation.