Instability of the AUROC of Clinical Prediction Models
- PMID: 39921554
- PMCID: PMC11806515
- DOI: 10.1002/sim.70011
Abstract
Background: External validations are essential to assess the performance of a clinical prediction model (CPM) before deployment. Apart from model misspecification, differences in patient population, standard of care, predictor definitions, and other factors also influence a model's discriminative ability, as commonly quantified by the AUC (or c-statistic). We aimed to quantify the variation in AUCs across sets of external validation studies and to propose ways to adjust expectations of a model's performance in a new setting.
Methods: The Tufts-PACE CPM Registry holds a collection of CPMs for prognosis in cardiovascular disease. We analyzed the AUC estimates of 469 CPMs with at least one external validation. Combined, these CPMs had a total of 1603 external validations reported in the literature. For each CPM and its associated set of validation studies, we performed a random-effects meta-analysis to estimate the between-study standard deviation τ among the AUCs. Since the majority of these meta-analyses have only a handful of validations, this leads to very poor estimates of τ. So, instead of focusing on a single CPM, we estimated a log-normal distribution of τ across all 469 CPMs. We then used this distribution as an empirical prior. We used cross-validation to compare this empirical Bayesian approach with frequentist fixed- and random-effects meta-analyses.
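The empirical prior described above can be illustrated by moment-matching a log-normal distribution to a target mean and standard deviation for τ (here, the values reported in the Results). A minimal sketch, not the paper's actual code; the function name is illustrative:

```python
import math
import random

def lognormal_params(mean, sd):
    """Return (mu, sigma) of a log-normal distribution whose
    mean and standard deviation match the given targets."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)

# Target moments for tau taken from the Results: mean 0.055, SD 0.015.
mu, sigma = lognormal_params(0.055, 0.015)

# Draws from this prior could then feed a Bayesian meta-analysis of tau.
random.seed(0)
draws = [random.lognormvariate(mu, sigma) for _ in range(100_000)]
```

The moment-matching identities used here are the standard ones for the log-normal: mean = exp(μ + σ²/2) and variance = (exp(σ²) − 1)·exp(2μ + σ²).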
Results: The 469 CPMs included in our study had a median of 2 external validations (IQR 1-3). The estimated distribution of τ had a mean of 0.055 and a standard deviation of 0.015. If τ = 0.05, then the 95% prediction interval for the AUC in a new setting has a width of at least 0.1, no matter how many validations have been done. When there are fewer than 5 validations, which is typically the case, the usual frequentist methods grossly underestimate the uncertainty about the AUC in a new setting. Accounting for τ in a Bayesian approach achieved near-nominal coverage.
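The irreducible width of the prediction interval can be seen from the standard frequentist random-effects machinery: the predictive standard deviation for a new study is √(τ² + SE(μ̂)²), so even as the number of validations grows and SE(μ̂) → 0, the half-width stays at least z·τ. A minimal sketch using a DerSimonian-Laird estimate of τ² and a Higgins-Thompson-style prediction interval (illustrative code under normal-approximation assumptions, not the paper's implementation):

```python
import math

def prediction_interval(aucs, ses, z=1.96):
    """Random-effects meta-analysis of validation AUCs.
    Returns (lower, upper) of an approximate 95% prediction
    interval for the AUC in a new setting, plus the estimated
    between-study SD tau (DerSimonian-Laird)."""
    k = len(aucs)
    w = [1.0 / se ** 2 for se in ses]
    mu_fixed = sum(wi * a for wi, a in zip(w, aucs)) / sum(w)
    q = sum(wi * (a - mu_fixed) ** 2 for wi, a in zip(w, aucs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)          # DerSimonian-Laird estimate
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    mu = sum(wi * a for wi, a in zip(w_re, aucs)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))
    # Predictive SD combines between-study spread and uncertainty in mu;
    # even with se_mu -> 0 the half-width is at least z * tau.
    half = z * math.sqrt(tau2 + se_mu ** 2)
    return mu - half, mu + half, math.sqrt(tau2)

# Hypothetical validation AUCs with equal standard errors.
lo, hi, tau = prediction_interval([0.70, 0.75, 0.80, 0.72, 0.78], [0.02] * 5)
```

With τ = 0.05 the floor on the width is 2 × 1.96 × 0.05 ≈ 0.2, consistent with the "at least 0.1" bound quoted above.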
Conclusion: Due to large heterogeneity among the validated AUC values of a CPM, there is great irreducible uncertainty in predicting the AUC in a new setting. This uncertainty is underestimated by existing methods. The proposed empirical Bayes approach addresses this problem and merits wide application in judging the validity of prediction models.
Keywords: CPM; clinical prediction models; empirical Bayes; heterogeneity; meta‐analysis.
© 2025 The Author(s). Statistics in Medicine published by John Wiley & Sons Ltd.
Conflict of interest statement
The authors declare no conflicts of interest.