Clin Orthop Relat Res. 2023 Dec 1;481(12):2419-2430.
doi: 10.1097/CORR.0000000000002698. Epub 2023 May 25.

Does the SORG Machine-learning Algorithm for Extremity Metastases Generalize to a Contemporary Cohort of Patients? Temporal Validation From 2016 to 2020

Tom M de Groot et al. Clin Orthop Relat Res.

Abstract

Background: The ability to predict survival accurately in patients with osseous metastatic disease of the extremities is vital for patient counseling and guiding surgical intervention. We, the Skeletal Oncology Research Group (SORG), previously developed a machine-learning algorithm (MLA) based on data from 1999 to 2016 to predict 90-day and 1-year survival of surgically treated patients with extremity bone metastasis. As treatment regimens for oncology patients continue to evolve, this SORG MLA-driven probability calculator requires temporal reassessment of its accuracy.

Question/purpose: Does the SORG-MLA accurately predict 90-day and 1-year survival in patients who receive surgical treatment for a metastatic long-bone lesion in a more recent cohort of patients treated between 2016 and 2020?

Methods: Between 2017 and 2021, we identified 674 patients 18 years and older through the ICD codes for secondary malignant neoplasm of bone and bone marrow and CPT codes for completed pathologic fractures or prophylactic treatment of an impending fracture. We excluded 40% (268 of 674) of patients, including 18% (118) who did not receive surgery; 11% (72) who had metastases in places other than the long bones of the extremities; 3% (23) who received treatment other than intramedullary nailing, endoprosthetic reconstruction, or dynamic hip screw; 3% (23) who underwent revision surgery; 3% (17) in whom there was no tumor; and 2% (15) who were lost to follow-up within 1 year. Temporal validation was performed using data on 406 patients treated surgically for bony metastatic disease of the extremities from 2016 to 2020 at the same two institutions where the MLA was developed. Variables used to predict survival in the SORG algorithm included perioperative laboratory values, tumor characteristics, and general demographics. To assess the models' discrimination, we computed the c-statistic, commonly referred to as the area under the receiver operating characteristic curve (AUC) for binary classification. This value ranges from 0.5 (representing chance-level performance) to 1.0 (indicating excellent discrimination). Generally, an AUC of 0.75 is considered high enough for use in clinical practice. To evaluate the agreement between predicted and observed outcomes, a calibration plot was used, and the calibration slope and intercept were calculated. Perfect calibration would result in a slope of 1 and an intercept of 0. For overall performance, the Brier score and null-model Brier score were determined. The Brier score can range from 0 (representing perfect prediction) to 1 (indicating the poorest prediction). Proper interpretation of the Brier score necessitates a comparison with the null-model Brier score, which represents the score for an algorithm that predicts a probability equal to the population prevalence of the outcome for each patient. Finally, a decision curve analysis was conducted to compare the potential net benefit of the algorithm with other decision-support methods, such as treating all or none of the patients. Overall, 90-day and 1-year mortality were lower in the temporal validation cohort than in the development cohort (90-day: 23% versus 28%; p < 0.001; 1-year: 51% versus 59%; p < 0.001).
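The performance measures named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the six-patient toy data are hypothetical, the AUC is computed by simple pairwise comparison, and the net benefit uses the standard decision-curve formula (true positives minus false positives weighted by the threshold odds, per patient), which is assumed here because the abstract does not spell it out.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """C-statistic: chance that a random patient with the event has a higher
    predicted risk than a random patient without it (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def null_brier(y_true: Sequence[int]) -> float:
    """Brier score when every patient is assigned the outcome prevalence."""
    prevalence = sum(y_true) / len(y_true)
    return brier(y_true, [prevalence] * len(y_true))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical toy data (1 = died within the horizon), not study data.
y = [1, 1, 0, 0, 1, 0]
p = [0.9, 0.7, 0.4, 0.2, 0.3, 0.6]

print(f"AUC              {auc(y, p):.3f}")
print(f"Brier            {brier(y, p):.3f}")
print(f"Null-model Brier {null_brier(y):.3f}")
print(f"Net benefit@0.5  {net_benefit(y, p, 0.5):.3f}")
```

In a decision curve, the net benefit is evaluated over a range of thresholds and compared with the "treat all" strategy (every predicted risk set to 1) and the "treat none" strategy (net benefit of 0).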

Results: Mortality was lower in the validation cohort than in the cohort on which the model was trained: 23% versus 28% at the 90-day timepoint and 51% versus 59% at the 1-year timepoint. The AUC was 0.78 (95% CI 0.72 to 0.82) for 90-day survival and 0.75 (95% CI 0.70 to 0.79) for 1-year survival, indicating the model could distinguish the two outcomes reasonably well. For the 90-day model, the calibration slope was 0.71 (95% CI 0.53 to 0.89), and the intercept was -0.66 (95% CI -0.94 to -0.39), suggesting the predicted risks were too extreme and that, in general, the risk of the observed outcome was overestimated. For the 1-year model, the calibration slope was 0.73 (95% CI 0.56 to 0.91) and the intercept was -0.67 (95% CI -0.90 to -0.43). With respect to overall performance, the Brier scores for the 90-day and 1-year models were 0.16 and 0.22. These scores were higher than the Brier scores of the models on internal validation in the development study (0.13 and 0.14), indicating the models' performance has declined over time.
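The abstract does not report the null-model Brier scores, but a rough value can be derived from the reported mortality: a model that predicts the prevalence p for every patient has a Brier score of p(1 - p). The sketch below applies this to the rounded percentages given above, so the results are approximations rather than the values the authors computed; the function and variable names are illustrative.

```python
def null_model_brier(prevalence: float) -> float:
    """Brier score of a model that predicts `prevalence` for every patient.

    A fraction `prevalence` of patients has the event (squared error
    (1 - prevalence)**2) and the rest do not (squared error prevalence**2);
    the weighted sum simplifies to prevalence * (1 - prevalence).
    """
    return prevalence * (1.0 - prevalence)


# (validation-cohort mortality, reported model Brier score), from the abstract.
reported = {"90-day": (0.23, 0.16), "1-year": (0.51, 0.22)}

for horizon, (mortality, model_brier) in reported.items():
    null = null_model_brier(mortality)
    print(f"{horizon}: null-model Brier ~{null:.3f}, model Brier {model_brier:.2f}, "
          f"model below null: {model_brier < null}")
```

Under this approximation the null-model scores are about 0.177 (90-day) and 0.250 (1-year), so both reported Brier scores remain below the null-model reference even though they are higher than on internal validation.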

Conclusion: The SORG MLA to predict survival after surgical treatment of extremity metastatic disease showed decreased performance on temporal validation. Moreover, in patients undergoing innovative immunotherapy, mortality risk was overestimated to varying degrees. Clinicians should be aware of this overestimation and discount the prediction of the SORG MLA according to their own experience with this patient population. Generally, these results show that temporal reassessment of these MLA-driven probability calculators is of paramount importance because the predictive performance may decline over time as treatment regimens evolve. The SORG-MLA is available as a freely accessible internet application at https://sorg-apps.shinyapps.io/extremitymetssurvival/. Level of Evidence: Level III, prognostic study.

Conflict of interest statement

This study was supported by “Stichting de Drie Lichten,” the Michael van Vloten Fund, the Hendrik-Muller fund, and the Nijbakker Morra foundation from the Netherlands. Each author certifies that there are no funding or commercial associations (consultancies, stock ownership, equity interest, patent/licensing arrangements, etc.) that might pose a conflict of interest in connection with the submitted article related to the author or any immediate family members. All ICMJE Conflict of Interest Forms for authors and Clinical Orthopaedics and Related Research® editors and board members are on file with the publication and can be viewed on request.

Figures

Fig. 1
This flowchart shows the patients included in this study.
Fig. 2
These graphs show discrimination of the SORG-MLA for extremity metastasis on temporal validation (n = 406). The areas under the receiver operating characteristic curves (AUCs) show decreased discriminative performance for (A) 90-day mortality and (B) 1-year mortality.
Fig. 3
These graphs show the calibration and decision curve analysis of the SORG-MLA for extremity metastasis on temporal validation (n = 406). The calibration curves indicate overestimation of (A) 90-day mortality for probabilities between 0.30 and 1.0 and (B) 1-year mortality for probabilities between 0.7 and 1.0, consistent with the negative intercepts. (C and D) The decision curve analysis showed a greater net benefit for the algorithm (blue line) across all predicted probabilities compared with the default strategies of changing management for all patients (slanted gray line) or for no patients (horizontal black line). A color image accompanies the online version of this article.
