Stat Med. 2025 Feb 28;44(5):e70011.
doi: 10.1002/sim.70011.

Instability of the AUROC of Clinical Prediction Models


Florian D van Leeuwen et al. Stat Med.

Abstract

Background: External validations are essential to assess the performance of a clinical prediction model (CPM) before deployment. Apart from model misspecification, differences in patient population, standard of care, predictor definitions, and other factors also influence a model's discriminative ability, as commonly quantified by the AUC (or c-statistic). We aimed to quantify the variation in AUCs across sets of external validation studies and to propose ways to adjust expectations of a model's performance in a new setting.

Methods: The Tufts-PACE CPM Registry holds a collection of CPMs for prognosis in cardiovascular disease. We analyzed the AUC estimates of 469 CPMs with at least one external validation. Combined, these CPMs had a total of 1603 external validations reported in the literature. For each CPM and its associated set of validation studies, we performed a random-effects meta-analysis to estimate the between-study standard deviation τ among the AUCs. Since most of these meta-analyses include only a handful of validations, the resulting estimates of τ are very poor. Instead of focusing on a single CPM, we therefore estimated a log-normal distribution of τ across all 469 CPMs and used this distribution as an empirical prior. We used cross-validation to compare this empirical Bayesian approach with frequentist fixed- and random-effects meta-analyses.
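The per-CPM meta-analysis step can be sketched as follows. This is a minimal illustration using the DerSimonian–Laird moment estimator for the between-study variance τ² (one common choice; the abstract does not specify which estimator the authors used), applied to hypothetical AUC estimates and sampling variances:

```python
import math

def dersimonian_laird(effects, variances):
    """Moment-based estimate of the between-study variance tau^2."""
    # inverse-variance (fixed-effect) weights
    w = [1.0 / v for v in variances]
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q statistic and its expected value under homogeneity
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    # truncate at zero: tau^2 cannot be negative
    return max(0.0, (q - df) / c)

# hypothetical validation AUCs and their sampling variances
aucs = [0.72, 0.68, 0.75, 0.66]
vars_ = [0.001, 0.002, 0.0015, 0.001]

tau2 = dersimonian_laird(aucs, vars_)
tau = math.sqrt(tau2)
```

With only a handful of validations per CPM, such an estimate of τ is highly unstable, which is what motivates pooling information across all 469 CPMs into an empirical prior.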

Results: The 469 CPMs included in our study had a median of 2 external validations (IQR [1-3]). The estimated distribution of τ had a mean of 0.055 and a standard deviation of 0.015. If τ = 0.05, then the 95% prediction interval for the AUC in a new setting has a half-width of at least 0.1, no matter how many validations have been done. When there are fewer than 5 validations, which is typically the case, the usual frequentist methods grossly underestimate the uncertainty about the AUC in a new setting. Accounting for τ in a Bayesian approach achieved near-nominal coverage.
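The irreducible-width claim follows directly from the form of a prediction interval: adding validations shrinks only the standard error of the pooled mean, while the between-study term τ remains. A sketch (using the normal quantile 1.96 as an approximation; exact methods would use a t quantile):

```python
import math

def pi_halfwidth(tau, se_mu, z=1.96):
    # half-width of an approximate 95% prediction interval for the
    # AUC in a new setting: combines between-study heterogeneity (tau)
    # with the standard error of the pooled mean (se_mu)
    return z * math.sqrt(tau ** 2 + se_mu ** 2)

# even with infinitely many validations (se_mu -> 0), the half-width
# cannot fall below 1.96 * tau
print(round(pi_halfwidth(0.05, 0.0), 3))  # 0.098
```

With τ = 0.05 this floor is 1.96 × 0.05 ≈ 0.1, matching the abstract's statement that the interval is at least ±0.1 wide regardless of the number of validations.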

Conclusion: Due to the large heterogeneity among the validated AUC values of a CPM, there is great irreducible uncertainty in predicting the AUC in a new setting. This uncertainty is underestimated by existing methods. The proposed empirical Bayes approach addresses this problem and merits wide application in judging the validity of prediction models.

Keywords: CPM; clinical prediction models; empirical Bayes; heterogeneity; meta‐analysis.


Conflict of interest statement

The authors declare no conflicts of interest.

Figures

FIGURE 1
Relation between development AUCs and validation AUCs in the Tufts‐PACE CPM Registry. The regression curve shows that validation AUCs tend to be lower than development AUCs.
FIGURE 2
Forest plot and random‐effects meta‐analysis of the AUC estimates of validations for the CRUSADE CPM. The black diamond represents the 95% confidence interval for the mean AUC across all validations. The solid line at the bottom represents the 95% prediction interval for the true AUC in a new study.
FIGURE 3
Cumulative confidence and prediction intervals of fixed‐ and random‐effects meta‐analyses for the EuroSCORE model based on the first 1, 2, 3, …, 83 validation studies.
FIGURE 4
Top panel: the number of CPMs with exactly 1, 2, …, 5 external validations. Bottom panel: coverage of the prediction intervals for the observed AUC in the next study.
FIGURE 5
Root mean squared prediction error for the observed AUC in the next study.
FIGURE A1
Flowchart of data filtering.

