Stat Med. 2025 Feb 28;44(5):e70011.
doi: 10.1002/sim.70011.

Instability of the AUROC of Clinical Prediction Models


Florian D van Leeuwen et al. Stat Med.

Abstract

Background: External validations are essential to assess the performance of a clinical prediction model (CPM) before deployment. Apart from model misspecification, differences in patient population, standard of care, predictor definitions, and other factors also influence a model's discriminative ability, as commonly quantified by the AUC (or c-statistic). We aimed to quantify the variation in AUCs across sets of external validation studies and to propose ways to adjust expectations of a model's performance in a new setting.

Methods: The Tufts-PACE CPM Registry holds a collection of CPMs for prognosis in cardiovascular disease. We analyzed the AUC estimates of 469 CPMs with at least one external validation. Combined, these CPMs had a total of 1603 external validations reported in the literature. For each CPM and its associated set of validation studies, we performed a random-effects meta-analysis to estimate the between-study standard deviation τ among the AUCs. Since most of these meta-analyses include only a handful of validations, the resulting estimates of τ are very poor. Instead of focusing on a single CPM, we therefore estimated a log-normal distribution of τ across all 469 CPMs and used this distribution as an empirical prior. We used cross-validation to compare this empirical Bayesian approach with frequentist fixed- and random-effects meta-analyses.
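The per-CPM meta-analysis step can be sketched as follows. This is a minimal illustration using the DerSimonian–Laird moment estimator for the between-study variance τ² (one common choice; the abstract does not specify which estimator the authors used), applied to hypothetical AUC estimates and sampling variances:

```python
import math

def dersimonian_laird(effects, variances):
    """Moment-based estimate of the between-study variance tau^2."""
    # inverse-variance (fixed-effect) weights
    w = [1.0 / v for v in variances]
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q statistic and its expected value under homogeneity
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    # truncate at zero: tau^2 cannot be negative
    return max(0.0, (q - df) / c)

# hypothetical validation AUCs and their sampling variances
aucs = [0.72, 0.68, 0.75, 0.66]
vars_ = [0.001, 0.002, 0.0015, 0.001]

tau2 = dersimonian_laird(aucs, vars_)
tau = math.sqrt(tau2)
```

With only a handful of validations per CPM, such an estimate of τ is highly unstable, which is what motivates pooling information across all 469 CPMs into an empirical prior.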

Results: The 469 CPMs included in our study had a median of 2 external validations (IQR [1-3]). The estimated distribution of τ had a mean of 0.055 and a standard deviation of 0.015. If τ = 0.05, then the 95% prediction interval for the AUC in a new setting has a half-width of at least 0.1, no matter how many validations have been done. When there are fewer than 5 validations, which is typically the case, the usual frequentist methods grossly underestimate the uncertainty about the AUC in a new setting. Accounting for τ in a Bayesian approach achieved near-nominal coverage.
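The irreducible-width claim follows directly from the form of a prediction interval: adding validations shrinks only the standard error of the pooled mean, while the between-study term τ remains. A sketch (using the normal quantile 1.96 as an approximation; exact methods would use a t quantile):

```python
import math

def pi_halfwidth(tau, se_mu, z=1.96):
    # half-width of an approximate 95% prediction interval for the
    # AUC in a new setting: combines between-study heterogeneity (tau)
    # with the standard error of the pooled mean (se_mu)
    return z * math.sqrt(tau ** 2 + se_mu ** 2)

# even with infinitely many validations (se_mu -> 0), the half-width
# cannot fall below 1.96 * tau
print(round(pi_halfwidth(0.05, 0.0), 3))  # 0.098
```

With τ = 0.05 this floor is 1.96 × 0.05 ≈ 0.1, matching the abstract's statement that the interval is at least ±0.1 wide regardless of the number of validations.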

Conclusion: Due to the large heterogeneity among the validated AUC values of a CPM, there is great irreducible uncertainty in predicting the AUC in a new setting. This uncertainty is underestimated by existing methods. The proposed empirical Bayes approach addresses this problem and merits wide application in judging the validity of prediction models.

Keywords: CPM; clinical prediction models; empirical Bayes; heterogeneity; meta‐analysis.


Conflict of interest statement

The authors declare no conflicts of interest.

Figures

FIGURE 1
Relation between development AUCs and validation AUCs in the Tufts‐PACE CPM Registry. The regression curve shows that validation AUCs tend to be lower than development AUCs.
FIGURE 2
Forest plot and random‐effects meta‐analysis of the AUC estimates of validations for the CRUSADE CPM. The black diamond represents the 95% confidence interval for the mean AUC across all validations. The solid line at the bottom represents the 95% prediction interval for the true AUC in a new study.
FIGURE 3
Cumulative confidence and prediction intervals of fixed‐ and random‐effects meta‐analyses for the EuroSCORE model based on the first 1, 2, 3, …, 83 validation studies.
FIGURE 4
Top panel: the number of CPMs with exactly 1, 2, …, 5 external validations. Bottom panel: coverage of the prediction intervals for the observed AUC in the next study.
FIGURE 5
Root mean squared prediction error for the observed AUC in the next study.
FIGURE A1
Flowchart of data filtering.

