Evaluation of clinical prediction models (part 1): from development to external validation

Gary S Collins et al. BMJ 2024 Jan 8;384:e074819. doi: 10.1136/bmj-2023-074819

Abstract

Evaluating the performance of a clinical prediction model is crucial to establish its predictive accuracy in the populations and settings intended for use. In this article, the first in a three part series, Collins and colleagues describe the importance of a meaningful evaluation using internal, internal-external, and external validation, as well as exploring heterogeneity, fairness, and generalisability in model performance.

Conflict of interest statement

Competing interests: All authors have completed the ICMJE uniform disclosure form at https://www.icmje.org/disclosure-of-interest/ and declare: support from Cancer Research UK and the Medical Research Council for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work. GSC and RDR are statistical editors for The BMJ.

Figures

Fig 1
Different study designs and approaches to develop and evaluate the performance of a multivariable prediction model (D=development; V=validation (evaluation)). Adapted from Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 2015;350:g7594. *A study can include more than one analysis type
Fig 2
Variability and overestimation of apparent performance compared with large sample performance, for a model to predict in-hospital mortality within 28 days of trauma injury, with increasing sample size of the model development study. ĉ denotes the apparent performance estimate and c_large denotes the performance of the model in the entire CRASH-2 population (n=20 207). Red lines=mean ĉ−c_large for each sample size. Jitter has been added to aid display. ĉ−c_large=0 implies no systematic overestimation or underestimation of ĉ
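To make the distinction concrete, the sketch below (Python with scikit-learn, using simulated data rather than CRASH-2) develops a logistic regression model in a small sample drawn from a large simulated population and computes its c statistic both in the development sample (the apparent estimate, ĉ) and in the full population (c_large); the predictors, outcome model, and sample sizes are illustrative assumptions only.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    N, n = 20000, 200                                    # large simulated population; small development sample
    X = rng.normal(size=(N, 5))                          # five hypothetical predictors
    y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))      # simulated binary outcome

    dev = rng.choice(N, size=n, replace=False)           # indices of the development sample
    model = LogisticRegression().fit(X[dev], y[dev])

    c_hat = roc_auc_score(y[dev], model.predict_proba(X[dev])[:, 1])   # apparent performance (ĉ)
    c_large = roc_auc_score(y, model.predict_proba(X)[:, 1])           # large sample performance
    print(f"c_hat = {c_hat:.3f}, c_large = {c_large:.3f}, difference = {c_hat - c_large:.3f}")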
Fig 3
Variability and overestimation of the apparent and internal (split sample and bootstrap) validation performance compared with the large sample performance, for a model to predict in-hospital mortality within 28 days of trauma injury, with increasing sample size of the model development study. ĉ denotes the apparent performance estimate and c_large denotes the performance of the model in the entire CRASH-2 population (n=20 207). The red lines denote the mean ĉ−c_large for each sample size and for each approach. Jitter has been added to aid display. Split sample (apparent, 70%)=70% of the available data were used to develop the model, and its (apparent) performance was evaluated in these same data. Split sample (validation, 30%)=the performance of the model (developed in 70% of the available data) in the remaining 30% of the data. ĉ−c_large=0 implies no systematic overestimation or underestimation of ĉ
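The split sample and bootstrap (optimism corrected) approaches compared in the figure can be sketched as follows, again on simulated data with scikit-learn rather than the CRASH-2 analysis; the 70/30 split mirrors the caption, and the 200 bootstrap samples are an arbitrary illustrative choice.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    def c_stat(model, X, y):
        return roc_auc_score(y, model.predict_proba(X)[:, 1])

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 5))                        # simulated predictors
    y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))      # simulated binary outcome

    # Split sample: develop in 70% of the data, evaluate in the remaining 30%
    X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
    m_split = LogisticRegression().fit(X_dev, y_dev)
    print(f"split sample (apparent, 70%):   {c_stat(m_split, X_dev, y_dev):.3f}")
    print(f"split sample (validation, 30%): {c_stat(m_split, X_val, y_val):.3f}")

    # Bootstrap: refit the model in bootstrap samples and subtract the average optimism
    m_full = LogisticRegression().fit(X, y)
    optimism = []
    for b in range(200):                                 # 200 bootstrap samples (arbitrary choice)
        idx = rng.integers(0, len(y), size=len(y))       # sample with replacement
        m_b = LogisticRegression().fit(X[idx], y[idx])
        optimism.append(c_stat(m_b, X[idx], y[idx]) - c_stat(m_b, X, y))
    print(f"bootstrap optimism corrected:   {c_stat(m_full, X, y) - np.mean(optimism):.3f}")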
Fig 4
Graphical illustration of k-fold cross validation. Non-shaded parts used for model development; shaded part used for testing
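A minimal sketch of k-fold cross validation, matching the scheme in the figure: the data are split into k folds, the model is developed on k−1 folds (the non-shaded parts) and tested on the held-out fold (the shaded part), rotating until every fold has been used for testing. The simulated data, scikit-learn estimator, and choice of 10 folds are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import StratifiedKFold

    rng = np.random.default_rng(2)
    X = rng.normal(size=(1000, 5))                       # simulated predictors
    y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))      # simulated binary outcome

    c_folds = []
    for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True, random_state=2).split(X, y):
        model = LogisticRegression().fit(X[train_idx], y[train_idx])   # develop on k-1 folds
        p = model.predict_proba(X[test_idx])[:, 1]                     # predict the held-out fold
        c_folds.append(roc_auc_score(y[test_idx], p))

    print(f"10-fold cross validated c statistic: {np.mean(c_folds):.3f}")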
Fig 5
Graphical illustration of internal-external cross validation. Non-shaded parts used for model development; shaded part used for testing
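A minimal sketch of internal-external cross validation: each cluster (for example a hospital, region, or study contributing to an individual participant data meta-analysis) is held out in turn, the model is developed on the remaining clusters, and its performance is evaluated in the held-out cluster. The simulated clusters and scikit-learn workflow below are assumptions for illustration only.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import LeaveOneGroupOut

    rng = np.random.default_rng(3)
    X = rng.normal(size=(2000, 5))                       # simulated predictors
    y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))      # simulated binary outcome
    cluster = rng.integers(0, 8, size=2000)              # eight hypothetical clusters (e.g. hospitals)

    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=cluster):
        model = LogisticRegression().fit(X[train_idx], y[train_idx])   # develop on all other clusters
        p = model.predict_proba(X[test_idx])[:, 1]                     # evaluate in the held-out cluster
        print(f"held-out cluster {cluster[test_idx][0]}: c = {roc_auc_score(y[test_idx], p):.3f}")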
Fig 6
Internal-external cross validation of the ISARIC (International Severe Acute Respiratory and Emerging Infection Consortium) 4C model. Adapted from Gupta et al. Estimates and confidence intervals are taken from the original paper, where they were reported to two decimal places.

References

    1. van Smeden M, Reitsma JB, Riley RD, Collins GS, Moons KG. Clinical prediction models: diagnosis versus prognosis. J Clin Epidemiol 2021;132:142-5. doi: 10.1016/j.jclinepi.2021.01.009
    2. Gupta RK, Harrison EM, Ho A, et al.; ISARIC4C Investigators. Development and validation of the ISARIC 4C Deterioration model for adults hospitalised with COVID-19: a prospective cohort study. Lancet Respir Med 2021;9:349-59. doi: 10.1016/S2213-2600(20)30559-2
    3. Wishart GC, Azzato EM, Greenberg DC, et al. PREDICT: a new UK prognostic model that predicts survival following surgery for invasive breast cancer. Breast Cancer Res 2010;12:R1. doi: 10.1186/bcr2464
    4. Hudda MT, Fewtrell MS, Haroun D, et al. Development and validation of a prediction model for fat mass in children and adolescents: meta-analysis using individual participant data. BMJ 2019;366:l4293. doi: 10.1136/bmj.l4293
    5. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol 2019;110:12-22. doi: 10.1016/j.jclinepi.2019.02.004