Commun Med (Lond). 2025 Jan 2;5(1):1.
doi: 10.1038/s43856-024-00726-1.

Predicting pediatric patient rehabilitation outcomes after spinal deformity surgery with artificial intelligence



Wenqi Shi et al. Commun Med (Lond). 2025.

Abstract

Background: Adolescent idiopathic scoliosis (AIS) is the most common type of scoliosis, affecting 1–4% of adolescents. The Scoliosis Research Society-22R (SRS-22R), a health-related quality-of-life instrument for AIS, allows orthopedists to measure subjective patient outcomes before and after corrective surgery, complementing objective radiographic measurements. However, studies have found no significant correlation between the correction rate of major radiographic parameters and improvements in patient-reported outcomes (PROs), which makes it difficult to incorporate PROs into personalized surgical planning.
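The SRS-22R outcomes discussed throughout are domain-level scores. As a minimal sketch, assuming the commonly published SRS-22r scoring in which each of the 22 items is answered on a 1 to 5 scale and each domain score is the mean of its items, the function below computes domain, subtotal, and total scores. Only the satisfaction items (#Q21, #Q22) are confirmed by this page; the rest of the item-to-domain mapping (`SRS22R_DOMAINS`) is an assumption to check against the official scoring guide, `srs22r_scores` is a hypothetical name, and missing-item handling is omitted.

```python
from statistics import fmean

# ASSUMPTION: item-to-domain mapping as commonly published for the SRS-22r.
# Only the satisfaction items (21, 22) are confirmed by the source page.
SRS22R_DOMAINS = {
    "function": [5, 9, 12, 15, 18],
    "pain": [1, 2, 8, 11, 17],
    "self_image": [4, 6, 10, 14, 19],
    "mental_health": [3, 7, 13, 16, 20],
    "satisfaction": [21, 22],
}


def srs22r_scores(responses):
    """Return domain, subtotal, and total scores from a dict {item number: response}.

    Every item 1-22 must be present with a response between 1 and 5.
    """
    for item in range(1, 23):
        value = responses[item]  # raises KeyError if an item is missing
        if not 1 <= value <= 5:
            raise ValueError(f"item {item}: response {value} is outside 1-5")
    # Each domain score is the mean of its item responses.
    scores = {
        domain: fmean(responses[i] for i in items)
        for domain, items in SRS22R_DOMAINS.items()
    }
    # Subtotal excludes the two satisfaction items; total uses all 22 items.
    scores["subtotal"] = fmean(responses[i] for i in range(1, 21))
    scores["total"] = fmean(responses[i] for i in range(1, 23))
    return scores


# Usage: a patient answering 4 everywhere except 2 on both satisfaction items.
example = {i: 4 for i in range(1, 23)}
example[21] = example[22] = 2
print(srs22r_scores(example)["satisfaction"])  # 2.0
```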

Methods: This study aims to develop an artificial intelligence (AI)-enabled surgical planning and counseling support system that predicts post-operative rehabilitation outcomes, to facilitate personalized care for AIS patients. We analyze a unique multi-site cohort of 455 pediatric patients who underwent spinal fusion surgery at two Shriners Children's hospitals from 2010. A total of 171 pre-operative clinical features are used to train six machine-learning models to predict post-operative outcomes. We further apply explainability analysis to quantify how much pre-operative radiographic and questionnaire parameters contribute to predicting surgical outcomes. Finally, we support responsible AI by calibrating model confidence to inform human intervention and by mitigating health disparities to improve algorithmic fairness.
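One way calibrated confidence can support human intervention is selective prediction: the model answers only when its confidence is high and defers the remaining cases to a clinician. The sketch below illustrates this for a binary outcome. It is a hypothetical illustration, not the authors' procedure; the names `route_predictions` and `Decision` and the 0.8 threshold are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Decision:
    patient_id: str
    predicted_label: Optional[int]  # None when the case is deferred to a clinician
    confidence: float
    deferred: bool


def route_predictions(probs, patient_ids, threshold=0.8):
    """Accept confident binary predictions and defer the rest for human review.

    probs: calibrated probabilities of the positive class (e.g., reaching the MCID).
    threshold: HYPOTHETICAL minimum confidence; it would need tuning on validation data.
    """
    decisions: List[Decision] = []
    for pid, p in zip(patient_ids, probs):
        label = 1 if p >= 0.5 else 0
        # Confidence is the probability assigned to the predicted class.
        confidence = p if label == 1 else 1.0 - p
        if confidence >= threshold:
            decisions.append(Decision(pid, label, confidence, deferred=False))
        else:
            decisions.append(Decision(pid, None, confidence, deferred=True))
    return decisions


for d in route_predictions([0.95, 0.55, 0.10], ["p1", "p2", "p3"]):
    print(d.patient_id, "deferred" if d.deferred else d.predicted_label)
```

Because the threshold is applied to the probability of the predicted class, this only makes sense once the scores are calibrated; an overconfident model would rarely defer.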

Results: The best prediction model achieves an area under the receiver operating characteristic curve (AUROC) of 0.86, 0.85, and 0.83 for predicting individual SRS-22R question responses over three time horizons: from pre-operation to 6 months, 1 year, and 2 years post-operation, respectively. We also show that the proposed method can predict other patient rehabilitation outcomes, based on minimal clinically important differences (MCID) and correction rates, across all three time horizons.
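AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch for a binary outcome, using the Mann-Whitney formulation with ties counted as one half, follows. How the study aggregates AUROC over multi-level question responses is not stated on this page, so only the binary case is shown, and `auroc` is a hypothetical helper name.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive-negative pairs ranked correctly.

    Ties in score count as half a correct pair (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    # Quadratic pairwise comparison; fine for cohorts of a few hundred patients.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Here three of the four positive-negative pairs are ordered correctly, giving 0.75; an uninformative model scores about 0.5.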

Conclusions: Based on the relationship analysis, we suggest that surgeons pay additional attention to sagittal parameters (e.g., lordosis, sagittal vertical axis) and patient self-image, beyond the major Cobb angles, to improve surgical decision-making for AIS patients. In the age of personalized medicine, the proposed responsible AI-enabled clinical decision-support system may facilitate pre-operative counseling and shared decision-making in real-world clinical settings.
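Both the coronal Cobb angle and sagittal angles such as lumbar lordosis are commonly measured as the angle between two vertebral endplate lines. As a minimal sketch, assuming each endplate is given as two hypothetical (x, y) landmark points on a radiograph and that each endplate tilts less than 90 degrees from horizontal, the angle is the absolute difference of the two endplate tilts. The function names are illustrative, and landmark detection and the choice of end vertebrae are outside its scope.

```python
import math


def endplate_tilt_deg(p1, p2):
    """Tilt of the endplate line through p1 and p2, in degrees, folded into [-90, 90).

    A line has no direction, so (p1, p2) and (p2, p1) give the same tilt.
    """
    (x1, y1), (x2, y2) = p1, p2
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))  # in (-180, 180]
    if angle >= 90:
        angle -= 180
    elif angle < -90:
        angle += 180
    return angle


def cobb_angle(upper_endplate, lower_endplate):
    """Angle between two endplate lines, each given as a pair of (x, y) points.

    ASSUMPTION: each endplate tilts less than 90 degrees from horizontal, so the
    absolute tilt difference is the Cobb angle (it may exceed 90 degrees). Flipping
    the y axis, as in image coordinates, negates both tilts and leaves this unchanged.
    """
    return abs(endplate_tilt_deg(*upper_endplate) - endplate_tilt_deg(*lower_endplate))


# Hypothetical landmarks: upper endplate rising at 45 degrees, lower falling at 45 degrees.
print(round(cobb_angle(((0, 0), (1, 1)), ((0, 0), (1, -1))), 1))  # 90.0
```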

Plain language summary

The goal of this study is to develop a planning and counseling support system that predicts how well patients recover after surgery, allowing more personalized care for patients with scoliosis (spinal curvature). We collected data from 455 pediatric patients who underwent spinal fusion surgery at two hospitals and used these data to train computer models to predict outcomes after surgery. We show that our method predicts patient rehabilitation outcomes well in both the short term (6 months and 1 year) and the long term (2 years). We also tested how reliable and how fair the method is, with the aim of providing a straightforward, trustworthy, and fair tool for real-world clinical use.


Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Overview of the proposed AI-enabled surgical planning and counseling support system for AIS patient rehabilitation outcome prediction to facilitate personalized surgery decision-making.
EHR, electronic health record; PRO, patient-reported outcome.
Fig. 2. AUROC and accuracy (ACC) performance of deep-learning models on predicting each individual post-operative question outcome in the SRS-22R questionnaire (Task 1).
We also present the average AUROC and accuracy achieved in each domain (i.e., function, mental health, pain, satisfaction, and self-image) of patient-reported outcomes. Results are included for all three time horizons: from pre-operation to 6 months, 1 year, and 2 years post-operation.
Fig. 3. Feature importance interpretation of post-operative patient satisfaction prediction results (#Q21 in Task 1) in patient sub-cohort II.
a Summary plot of SHAP values for global feature importance in the XGBoost model. Each point represents a single patient from the study; its position along the x-axis indicates the influence of that feature on the model’s output for that patient. b XGBoost feature importance for global feature ranking based on a feature ablation study. c Summary plot of SHAP values for global feature importance in the neural network model. d A subset of the radiographic feature importance identified by surgeons at Shriners Children’s hospitals in Greenville and Lexington for clinical validation. Within each tier, radiographic parameters are grouped by source (lateral, posterior/anterior, and general radiographic data) and are not ranked.
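SHAP values are Shapley values from cooperative game theory: each feature's attribution is its weighted average marginal contribution to the prediction over all coalitions of the other features. The sketch below computes them exactly for a toy model by enumerating coalitions, assuming absent features are replaced by a single baseline (for example, cohort means). It illustrates the definition only and is not how the figure was produced: the `shap` library estimates these values efficiently (e.g., `TreeExplainer` for tree ensembles such as XGBoost), while this enumeration is exponential in the number of features. `exact_shapley` and `toy_model` are hypothetical names.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    predict: function mapping a list of feature values to a model score.
    x: one patient's feature values; baseline: reference values (e.g., cohort means).
    ASSUMPTION: a feature left out of a coalition is set to its baseline value.
    Runtime grows as 2**n, so this is only for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition keep the patient's values; others use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1 among the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for members in combinations(others, k):
                coalition = set(members)
                # Weighted marginal contribution of feature i to this coalition.
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_model(z):
    # Hypothetical model: one linear term plus one interaction term.
    return 2 * z[0] + z[1] * z[2]


print(exact_shapley(toy_model, [1, 1, 1], [0, 0, 0]))  # approximately [2.0, 0.5, 0.5]
```

The attributions always sum to the patient's prediction minus the baseline prediction, and the interaction term's contribution is split evenly between the two features that form it.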
Fig. 4. Deep-learning model performance in Task 2.
AUROC performance of deep-learning models on predicting a MCID as defined by Crawford et al. in four major domains (i.e., function, pain, subtotal, and total), b MCID as defined by Carreon et al. in three major domains (i.e., function, pain, and self-image), and c improvement in other clinically relevant domains (i.e., mental health and satisfaction, #Q21 and #Q22). Results are included for all three time horizons: from pre-operation to 6 months, 1 year, and 2 years post-operation. Feature importance interpretation of post-operative MCID prediction results in patient sub-cohort II is shown for d subtotal score, e total score, f function, and g pain.
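In Task 2 each outcome is a binary label: whether a patient's post-operative improvement in a domain reaches the MCID. A minimal sketch of that labeling step follows. The threshold values are placeholders, not the Crawford et al. or Carreon et al. MCIDs, and counting an improvement exactly equal to the threshold as a success is an assumption; `mcid_labels` and `HYPOTHETICAL_MCID` are illustrative names.

```python
# HYPOTHETICAL placeholder thresholds (score points), NOT the published
# Crawford et al. or Carreon et al. MCID values.
HYPOTHETICAL_MCID = {"function": 0.5, "pain": 0.5, "self_image": 0.5}


def mcid_labels(pre_scores, post_scores, thresholds=HYPOTHETICAL_MCID):
    """Label each domain 1 if post-minus-pre improvement reaches its MCID, else 0.

    Higher SRS-22R scores are better, so improvement is post minus pre.
    ASSUMPTION: reaching the threshold exactly counts as achieving the MCID.
    """
    labels = {}
    for domain, threshold in thresholds.items():
        improvement = post_scores[domain] - pre_scores[domain]
        labels[domain] = int(improvement >= threshold)
    return labels


pre = {"function": 3.0, "pain": 3.0, "self_image": 3.0}
post = {"function": 3.6, "pain": 3.2, "self_image": 3.5}
print(mcid_labels(pre, post))  # {'function': 1, 'pain': 0, 'self_image': 1}
```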
Fig. 5. Reliability diagrams before and after model confidence calibration of deep-learning models on MCID prediction (Task 2) in function (a)&(e), pain (b)&(f), subtotal (c)&(g), and total domains (d)&(h) for patient sub-cohort II.
a–d show the reliability diagrams for MCID prediction before calibration, which exhibit a higher ECE and overconfidence, with confidence scores (red columns) usually higher than accuracy (blue columns). e–h show the improved calibration after model calibration, with smaller differences between accuracy and confidence. ECE, expected calibration error.
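The expected calibration error in the reliability diagrams bins predictions by confidence and averages the gap between accuracy and mean confidence, weighted by bin size. The sketch below computes ECE with equal-width bins and, as one common recalibration method, fits a single temperature for a binary classifier by grid search on negative log-likelihood. This page does not say which calibration method the authors used, so temperature scaling is a swapped-in example; the bin count, the grid, the toy data, and the function names are assumptions.

```python
import math


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins: sum over bins of (bin size / N) * |accuracy - mean confidence|.

    confidences: probability assigned to the predicted class, in [0, 1].
    correct: 1 if the prediction was right, else 0.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(ok for _, ok in members) / len(members)
            ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_temperature(logits, labels, grid=None):
    """Pick the temperature T minimizing binary negative log-likelihood of sigmoid(logit / T).

    ASSUMPTION: a simple grid search on held-out data; T > 1 softens overconfident scores.
    """
    grid = grid or [0.5 + 0.05 * i for i in range(91)]  # 0.5 to 5.0

    def nll(temp):
        total = 0.0
        for z, y in zip(logits, labels):
            p = min(max(sigmoid(z / temp), 1e-12), 1 - 1e-12)  # avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
        return total / len(logits)

    return min(grid, key=nll)


# Hypothetical overconfident model: every logit is 3.0 (p of about 0.95), but only
# 7 of 10 cases are positive. Every prediction is class 1, so correctness equals the label.
logits, labels = [3.0] * 10, [1] * 7 + [0] * 3
temp = fit_temperature(logits, labels)
before = [sigmoid(z) for z in logits]
after = [sigmoid(z / temp) for z in logits]
print(round(expected_calibration_error(before, labels), 3),
      round(expected_calibration_error(after, labels), 3))
```

Temperature scaling changes confidence but not which class is predicted, so accuracy and AUROC are unaffected while the reliability diagram moves toward the diagonal.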
Fig. 6. Gender fairness consideration for deep-learning model on MCID-total prediction (Task 2) for patient sub-cohort II.
a The overall prevalence ratio of female to male patients is around 4.7:1. b In patient sub-cohort II, the gender distribution is similar to the population prevalence, at approximately 5:1. c Deep-learning model performance on MCID-total prediction (Task 2) for patient sub-cohort II before and after mitigating gender bias. We include a gender feature ablation study to show the original model bias without gender information.
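This page does not describe how gender bias was mitigated. One common approach, shown here as an assumption rather than the authors' method, reweights training samples so each gender contributes equally despite the roughly 5:1 imbalance, and then checks per-group performance. The sketch below computes inverse-frequency sample weights and the largest accuracy gap between groups; both function names and the toy data are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(groups):
    """Weight each sample by N / (K * count of its group) so every group has equal total weight."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]


def group_accuracy_gap(groups, y_true, y_pred):
    """Return (largest accuracy difference between groups, per-group accuracy)."""
    per_group = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        per_group[g] = sum(y_true[i] == y_pred[i] for i in idx) / len(idx)
    return max(per_group.values()) - min(per_group.values()), per_group


# Hypothetical 5:1 female-to-male sample, mirroring the ratio reported for sub-cohort II.
groups = ["F"] * 5 + ["M"]
print(inverse_frequency_weights(groups))  # [0.6, 0.6, 0.6, 0.6, 0.6, 3.0]
```

Each group's weights sum to 3.0 here. Many training APIs accept such per-sample weights, for example the `sample_weight` argument of scikit-learn and XGBoost `fit` methods.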
Fig. 7. Correlation between change and correction rate of main radiographic measurements and improvement of SRS-22R patient-reported outcomes at 1-year post-operation.
Statistical analysis indicated no significant correlation between correction rate and improvement in PROs, consistent with existing studies. Heatmap squares are colored by the Pearson correlation coefficient.
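The correlation analysis compares radiographic correction with PRO improvement. As a minimal sketch, assuming the common definition of correction rate as the percentage reduction from the pre-operative value (for example, of a major Cobb angle), the code below computes correction rates and the Pearson correlation coefficient used to color the heatmap. Significance testing (p-values) is not included, and the function names and patient values are hypothetical.

```python
import math


def correction_rate(pre, post):
    """Percentage reduction of a radiographic measurement from before to after surgery.

    ASSUMPTION: the common definition 100 * (pre - post) / pre, e.g. for a major Cobb angle.
    """
    if pre == 0:
        raise ValueError("pre-operative value must be non-zero")
    return 100.0 * (pre - post) / pre


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical patients: pre/post major Cobb angles (degrees) and SRS-22R total improvement.
cobb_pre = [50, 60, 45, 70]
cobb_post = [20, 25, 15, 30]
pro_gain = [0.4, 0.1, 0.6, 0.3]
rates = [correction_rate(a, b) for a, b in zip(cobb_pre, cobb_post)]
print([round(r, 1) for r in rates], round(pearson_r(rates, pro_gain), 2))
```

For example, a Cobb angle corrected from 50 to 20 degrees has a correction rate of 60%.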
Fig. 8. Model performance of deep-learning models on surgery decision-making prediction (Task 3).
a AUROC results for all three time horizons: from pre-operation to 6 months, 1 year, and 2 years post-operation. b, c Model confidence calibration results and SHAP-based feature importance interpretation in patient sub-cohort II, respectively.
Fig. 9. Case studies focusing on local explanations of two different predictions of post-operative patient satisfaction based on SHAP value.
a Case study of a patient whose satisfaction increased after surgery. b Case study of a patient whose satisfaction was unchanged after surgery. Both case studies are from patient sub-cohort II and were correctly predicted by the proposed model.
