Acta Orthop. 2021 Aug;92(4):385-393.
doi: 10.1080/17453674.2021.1910448. Epub 2021 Apr 18.

Availability and reporting quality of external validations of machine-learning prediction models with orthopedic surgical outcomes: a systematic review

Olivier Q Groot et al. Acta Orthop. 2021 Aug.

Abstract

Background and purpose - External validation of machine learning (ML) prediction models is an essential step before clinical application. We assessed the proportion, performance, and transparency of reporting of externally validated ML prediction models in orthopedic surgery, using the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines.

Material and methods - We performed a systematic search using synonyms for every orthopedic specialty, ML, and external validation. The proportion of externally validated models was determined relative to 59 ML prediction models for orthopedic surgical outcomes that had only been internally validated, published up to June 18, 2020, and previously identified by our group. Model performance was evaluated using discrimination, calibration, and decision-curve analysis. Transparency of reporting was assessed against the TRIPOD guidelines.

Results - After screening 4,682 studies, we included 18 studies that externally validated 10 of the 59 available ML models. All external validations identified in this review retained good discrimination. Other key performance measures were reported in only 3 studies, making an overall evaluation of performance difficult. The overall median TRIPOD completeness was 61% (IQR 43-89), and 6 items were reported in fewer than 4 of the 18 studies.

Interpretation - Most current ML prediction models are not externally validated. The 18 available external validation studies were characterized by incomplete reporting of performance measures, which limits a transparent examination of model performance. Further prospective studies that adhere to existing guidelines are needed to validate or refute the many ML prediction models in orthopedics. This would ensure that clinicians can take full advantage of validated, clinically implementable ML decision tools.
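The abstract evaluates external validations on discrimination, calibration, and decision-curve analysis. As a minimal sketch (not the authors' analysis code), the Python below computes one common version of each on a validation cohort: the c-statistic for discrimination, the observed/expected ratio (calibration-in-the-large), the Brier score as an overall-performance summary, and net benefit at a single threshold probability for decision-curve analysis. The function names and the toy cohort are hypothetical; a full calibration assessment would also estimate a calibration slope and plot, which this sketch omits.

```python
from typing import Sequence

# Hypothetical helpers illustrating common external-validation metrics;
# these are not the review authors' analysis code.


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """Discrimination: chance that a random patient with the outcome received a
    higher predicted risk than a random patient without it (ties count 0.5)."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def observed_expected_ratio(y: Sequence[int], p: Sequence[float]) -> float:
    """Calibration-in-the-large: observed events / expected events (1.0 is ideal)."""
    return sum(y) / sum(p)


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Overall performance: mean squared error of predicted risks (0.0 is perfect)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def net_benefit(y: Sequence[int], p: Sequence[float], threshold: float) -> float:
    """Decision-curve analysis: net benefit of treating patients whose predicted
    risk is at or above the threshold probability."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y: Sequence[int], threshold: float) -> float:
    """Reference strategy on the decision curve: treat every patient."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Toy validation cohort (made-up data): observed outcomes and model risks.
    outcomes = [1, 0, 1, 0, 0, 1, 0, 0]
    risks = [0.8, 0.3, 0.6, 0.2, 0.4, 0.35, 0.1, 0.5]
    print("c-statistic:", round(c_statistic(outcomes, risks), 3))
    print("O/E ratio:", round(observed_expected_ratio(outcomes, risks), 3))
    print("Brier score:", round(brier_score(outcomes, risks), 4))
    print("net benefit at 0.5:", net_benefit(outcomes, risks, 0.5))
    print("treat-all at 0.5:", net_benefit_treat_all(outcomes, 0.5))
```

In the toy cohort, 13 of the 15 event/non-event pairs are concordant (c of about 0.87), and at a 50% threshold the model's net benefit (0.125) exceeds that of treating everyone (-0.25). Reporting only the first number, as most included studies did, leaves the other two questions unanswered.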

Figures

Figure 1.
Flowchart of study selection.
Figure 6.
Distribution of development and external validation studies. Except for 2 South Korean studies, all development studies whose models were externally validated were built on American datasets, whereas the external validation studies had more varied origins. Symbols without a number correspond to 1 study. Studies that included both development and external validation were counted twice, once for the origin of each dataset.
Figure 8.
Overall adherence to each TRIPOD item (n = 18).
Figure 9.
PROBAST results for all 4 domains and overall judgement (n = 18).
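The abstract summarizes TRIPOD completeness as a median with IQR across studies, and Figure 8 reports adherence per item. A minimal sketch, assuming each study is scored as reported, not reported, or not applicable for every item, and assuming non-applicable items are excluded from both numerator and denominator (the review's exact scoring rules are not stated here), could compute both summaries as follows. The item labels and scores are hypothetical, and the default method of Python's `statistics.quantiles` may not match the quartile method used in the review.

```python
from statistics import median, quantiles
from typing import Dict, Optional, Tuple

# Hypothetical scoring sheet: study -> {TRIPOD item -> reported?}.
# None marks an item judged not applicable to that study.
Scores = Dict[str, Dict[str, Optional[bool]]]


def study_completeness(items: Dict[str, Optional[bool]]) -> float:
    """Fraction of applicable TRIPOD items that one study reported."""
    applicable = [reported for reported in items.values() if reported is not None]
    return sum(applicable) / len(applicable)


def completeness_summary(scores: Scores) -> Tuple[float, float, float]:
    """Median, first quartile, and third quartile of completeness across studies."""
    values = [study_completeness(items) for items in scores.values()]
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method
    return median(values), q1, q3


def item_adherence(scores: Scores) -> Dict[str, float]:
    """Per-item adherence: studies reporting the item / studies where it applies."""
    all_items = sorted({item for items in scores.values() for item in items})
    adherence = {}
    for item in all_items:
        judged = [s[item] for s in scores.values() if s.get(item) is not None]
        if judged:
            adherence[item] = sum(judged) / len(judged)
    return adherence


if __name__ == "__main__":
    scores: Scores = {
        "study A": {"item 1": True, "item 2": True, "item 3": False, "item 4": None},
        "study B": {"item 1": True, "item 2": False, "item 3": False, "item 4": False},
        "study C": {"item 1": True, "item 2": True, "item 3": True, "item 4": True},
        "study D": {"item 1": True, "item 2": True, "item 3": False, "item 4": True},
    }
    med, q1, q3 = completeness_summary(scores)
    print(f"median completeness {med:.0%} (IQR {q1:.0%}-{q3:.0%})")
    print(item_adherence(scores))
```

Excluding non-applicable items from the denominator is only one convention; counting them as reported could raise completeness, so the convention matters when comparing figures across reviews.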
