Comparative Study
Clin Orthop Relat Res. 2024 Aug 1;482(8):1472-1482.
doi: 10.1097/CORR.0000000000003018. Epub 2024 Mar 12.

Machine Learning Did Not Outperform Conventional Competing Risk Modeling to Predict Revision Arthroplasty

Jacobien H F Oosterhoff et al. Clin Orthop Relat Res.

Abstract

Background: Estimating the risk of revision after arthroplasty could inform patient and surgeon decision-making. However, well-performing prediction models for this task are lacking, which may reflect limitations of current conventional modeling approaches, such as traditional survivorship estimators (for example, Kaplan-Meier) or competing-risk estimators. Recent advances in machine-learning survival analysis might improve decision-support tools in this setting. Therefore, this study aimed to compare the performance of machine learning with that of conventional modeling for predicting revision after arthroplasty.

Question/purpose: Does machine learning perform better than traditional regression models for estimating the risk of revision for patients undergoing hip or knee arthroplasty?

Methods: Eleven datasets from published studies from the Dutch Arthroplasty Register reporting on factors associated with revision or survival after partial or total knee and hip arthroplasty between 2018 and 2022 were included in our study. The 11 datasets were observational registry studies, with sample sizes ranging from 3038 to 218,214 procedures. We developed a set of time-to-event models for each dataset, leading to 11 comparisons. A set of predictors (factors associated with revision surgery) was identified based on the variables selected in the included studies. We assessed the predictive performance of two state-of-the-art statistical time-to-event models at 1-, 2-, and 3-year follow-up: a Fine and Gray model (which models the cumulative incidence of revision) and a cause-specific Cox model (which models the hazard of revision). These were compared with a machine-learning approach (a random survival forest model, a decision tree-based machine-learning algorithm for time-to-event analysis). Performance was assessed according to discriminative ability (time-dependent area under the receiver operating characteristic curve), calibration (slope and intercept), and overall prediction error (scaled Brier score). Discrimination, quantified as the area under the receiver operating characteristic curve, measures the model's ability to distinguish patients who experienced the outcome from those who did not; it ranges from 0.5 to 1.0, with 1.0 indicating perfect discrimination and 0.5 indicating discrimination no better than chance. Calibration compares the predicted with the observed probabilities; a perfectly calibrated model has an intercept of 0 and a slope of 1. The Brier score is a composite of discrimination and calibration, with 0 indicating perfect prediction and 1 the poorest. A scaled version of the Brier score, 1 - (model Brier score/null model Brier score), can be interpreted as the proportional reduction in overall prediction error relative to a null model, with higher values indicating better performance.
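The discrimination and overall-error metrics defined above can be made concrete with a short calculation. The sketch below is not the authors' code: it assumes, as a hypothetical simplification, that every patient is followed up to the horizon t without censoring, so the outcome is a 0/1 indicator of revision by t (patients who died before t without revision count as 0). Under that assumption the Brier score is the mean squared difference between predicted risk and outcome, the null model assigns everyone the observed revision proportion, and the time-dependent AUC reduces to the fraction of revised/non-revised pairs that the model ranks correctly. With censored registry data these quantities are usually estimated with inverse probability of censoring weights, which this sketch omits. The function names and toy data are illustrative only.

```python
"""Illustrative sketch (not the study's code) of two metrics at one horizon t.

Hypothetical simplification: no censoring before t, so `observed` is a plain
0/1 indicator of revision by t. Patients who died before t without revision
are coded 0. Real censored data would need IPCW weights, omitted here.
"""


def brier_score(pred, observed):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(pred, observed)) / len(observed)


def scaled_brier(pred, observed):
    """1 - (model Brier / null-model Brier).

    The null model predicts the overall observed revision proportion for
    everyone; 0 means no improvement over it, and higher is better.
    """
    prevalence = sum(observed) / len(observed)
    null_brier = brier_score([prevalence] * len(observed), observed)
    return 1 - brier_score(pred, observed) / null_brier


def auc_at_t(pred, observed):
    """AUC at horizon t: share of (revised, not revised) pairs in which the
    revised patient has the higher predicted risk; ties count as 0.5.
    Assumes at least one patient in each group."""
    cases = [p for p, y in zip(pred, observed) if y == 1]
    controls = [p for p, y in zip(pred, observed) if y == 0]
    score = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                score += 1.0
            elif c == k:
                score += 0.5
    return score / (len(cases) * len(controls))


if __name__ == "__main__":
    # Toy data (hypothetical): revision by t and predicted risks.
    revised_by_t = [1, 0, 0, 1, 0, 0, 0, 0]
    predicted_risk = [0.30, 0.05, 0.10, 0.20, 0.15, 0.05, 0.25, 0.10]
    print(round(auc_at_t(predicted_risk, revised_by_t), 3))      # 0.917
    print(round(scaled_brier(predicted_risk, revised_by_t), 3))  # 0.173
```

Because the null model's Brier score here equals p(1 - p) for an observed revision proportion p, a scaled Brier score near 0 means the model barely improves on assigning the same risk to every patient.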

Results: We found no differences in discriminative ability (distinguishing patients who underwent revision from those who did not) between the machine-learning survival model and the traditional competing-risk regression models for patients undergoing arthroplasty. There were no consistent differences in validated performance (time-dependent area under the receiver operating characteristic curve) between modeling approaches: the differences ranged from -0.04 to 0.03 across the 11 datasets, and the time-dependent area under the receiver operating characteristic curve of the models ranged from 0.52 to 0.68. In addition, the calibration metrics and scaled Brier scores were comparable, showing no advantage of machine learning over traditional regression models.

Conclusion: Machine learning did not outperform traditional regression models.

Clinical relevance: Neither machine learning nor traditional regression methods were sufficiently accurate to offer prognostic information when predicting revision arthroplasty. The benefit of these modeling approaches may be limited in this context.

Conflict of interest statement

All ICMJE Conflict of Interest Forms for authors and Clinical Orthopaedics and Related Research® editors and board members are on file with the publication and can be viewed on request.

Figures

Fig. 1
This illustration shows the survival analysis.
Fig. 2
Bee-swarm plots show the differences in model performance (AUCt; ∆ = machine learning – traditional regression). (A) Comparison of model performance at 1 year of follow-up. (B) Comparison of model performance at 2 years of follow-up. (C) Comparison of model performance at 3 years of follow-up. AUCt = time-dependent area under the receiver operating characteristic curve; CR = competing risk.
Fig. 3
These charts show the cumulative incidence function for the 11 datasets used in this study. Graph A = Aalen-Johansen curve for Peters et al. [24] for the event of revision and death after primary THA; Graph B = Aalen-Johansen curve for Peters et al. [23] for the event of revision and death after primary THA; Graph C = Aalen-Johansen curve for van Steenbergen et al. [38] for the event of revision and death after primary THA and RHA; Graph D = Aalen-Johansen curve for van Oost et al. [37] for the event of revision and death after PKR; Graph E = Aalen-Johansen curve for Burger et al. [6] for the event of revision and death after UKR; Graph F = Aalen-Johansen curve for Kuijpers et al. [15] for the event of revision and death after primary THA; Graph G = Aalen-Johansen curve for Bloemheuvel et al. [5] for the event of re-revision and death after cup revision surgery; Graph H = Aalen-Johansen curve for Bloemheuvel et al. [4] for the event of revision and death after primary THA; Graph I = Aalen-Johansen curve for Spekenbrink-Spooren et al. [29] for the event of revision and death after primary TKA; Graph J = Aalen-Johansen curve for Moerman et al. [19] for the event of revision and death after HA and THA; Graph K = Aalen-Johansen curve for Janssen et al. [12] for the event of revision and death after primary THA. The Aalen-Johansen curve plots the cumulative incidence function of the event of interest (revision) while accounting for a competing risk (death). The x-axis represents the time after the index surgery (in years), and the y-axis represents the cumulative incidence of revision and death. These curves show the probability of experiencing each type of event over time when multiple event types (revision and death) are possible. THA = total hip arthroplasty; TKA = total knee arthroplasty; RHA = resurfacing hip arthroplasty; PKR = partial knee replacement; UKR = unicompartmental knee arthroplasty; HA = hemiarthroplasty. A color image accompanies the online version of this article.
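The Aalen-Johansen curves described in the Fig. 3 caption can be sketched with a short estimator. This is illustrative and not the authors' code; it assumes a hypothetical event coding (0 = censored, 1 = revision, 2 = death) and computes, at each distinct event time, the cumulative incidence of revision as a running sum of all-cause Kaplan-Meier survival just before that time multiplied by the proportion of at-risk patients revised at that time. For contrast, it also computes the naive 1 - Kaplan-Meier estimate that treats death as censoring, the kind of traditional survivorship estimate mentioned in the Background, which tends to overstate revision risk when deaths are common.

```python
"""Illustrative sketch (not the study's code): Aalen-Johansen cumulative
incidence of revision with death as a competing risk, versus naive 1 - KM.

Hypothetical event coding: 0 = censored, 1 = revision, 2 = death.
"""


def cumulative_incidence(times, events, cause=1):
    """Return [(t, CIF)] at each distinct event time for `cause`."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0  # all-cause Kaplan-Meier survival just before the current time
    cif = 0.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_cause = d_any = n_at_t = 0
        # Group all records tied at time t (censored at t stay at risk at t).
        while i < len(data) and data[i][0] == t:
            n_at_t += 1
            if data[i][1] != 0:
                d_any += 1
            if data[i][1] == cause:
                d_cause += 1
            i += 1
        if d_any:
            cif += surv * d_cause / at_risk
            surv *= 1 - d_any / at_risk
            out.append((t, cif))
        at_risk -= n_at_t
    return out


def one_minus_km(times, events, cause=1):
    """Naive estimate: 1 - KM, treating every non-`cause` event as censored.

    With a single event type, the cumulative incidence above reduces to 1 - KM.
    """
    recoded = [1 if e == cause else 0 for e in events]
    return cumulative_incidence(times, recoded, cause=1)


if __name__ == "__main__":
    # Toy follow-up (hypothetical): years after surgery and event codes.
    years = [1, 2, 3, 4, 5]
    events = [1, 2, 1, 0, 1]
    print(cumulative_incidence(years, events))
    print(one_minus_km(years, events))
```

On the toy data, the Aalen-Johansen estimate of revision by year 5 is 0.8, whereas 1 - Kaplan-Meier gives 1.0, because treating the death at year 2 as censoring implicitly assumes that patient could still have been revised.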

References

1. Aalen OO, Johansen S. An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scand J Stat. 1978;5:141-150.
2. Aram P, Trela-Larsen L, Sayers A, et al. Estimating an individual’s probability of revision surgery after knee replacement: a comparison of modeling approaches using a national data set. Am J Epidemiol. 2018;187:2252-2262.
3. Austin PC, Steyerberg EW, Putter H. Fine-Gray subdistribution hazard models to simultaneously estimate the absolute risk of different event types: cumulative total failure probability may exceed 1. Stat Med. 2021;40:4200-4212.
4. Bloemheuvel EM, van Steenbergen LN, Swierstra BA. Dual mobility cups in primary total hip arthroplasties: trend over time in use, patient characteristics, and mid-term revision in 3,038 cases in the Dutch Arthroplasty Register (2007-2016). Acta Orthop. 2019;90:11-14.
5. Bloemheuvel EM, van Steenbergen LN, Swierstra BA. Lower 5-year cup re-revision rate for dual mobility cups compared with unipolar cups: report of 15,922 cup revision cases in the Dutch Arthroplasty Register (2007-2016). Acta Orthop. 2019;90:338-341.

Publication types