Lifetime Data Anal. 2013 Apr;19(2):202-18.
doi: 10.1007/s10985-012-9238-0. Epub 2012 Dec 16.

Understanding increments in model performance metrics


Michael J Pencina et al. Lifetime Data Anal. 2013 Apr.

Abstract

The area under the receiver operating characteristic curve (AUC) is the most commonly reported measure of discrimination for prediction models with binary outcomes. Recently, however, it has been criticized for failing to increase when important risk factors are added to a baseline model that already discriminates well. This has led to the claim that relying on the AUC as a measure of discrimination may miss important improvements in the clinical performance of risk prediction rules derived from a baseline model. In this paper we investigate this claim by relating the AUC to measures of clinical performance based on sensitivity and specificity under the assumption of multivariate normality. The behavior of the AUC is contrasted with that of the discrimination slope. We show that, unless rules with very good specificity are desired, the change in the AUC is an adequate predictor of the change in measures of clinical performance. However, stronger or more numerous predictors are needed to achieve the same increment in the AUC when the baseline model discriminates well than when it discriminates poorly. When excellent specificity is desired, our results suggest that the discrimination slope might be a better measure of model improvement than the AUC. The theoretical results are illustrated using a Framingham Heart Study example of a model for predicting the 10-year incidence of atrial fibrillation.
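A minimal sketch of the relationships the abstract refers to, assuming the simplest case of multivariate normality: the model's linear predictor is normal with unit variance in both outcome groups, and its mean is shifted by δ in events (the equal-variance binormal model). Under that assumption AUC = Φ(δ/√2), the threshold rule with specificity Sp has sensitivity Φ(δ − Φ⁻¹(Sp)), and the maximum Youden index is 2Φ(δ/2) − 1. For increments, the sketch further assumes that the added predictor is normal, has the same variance in both groups and is independent of the baseline score within each group, so that squared separations add. All function names and the effect size d_new = 0.5 are hypothetical; this is not the paper's own code or necessarily its exact parameterization.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is Phi, PHI.inv_cdf is Phi^{-1}
SQRT2 = 2 ** 0.5


def auc_from_delta(delta: float) -> float:
    """AUC of an equal-variance binormal score whose event mean is shifted by delta."""
    return PHI.cdf(delta / SQRT2)


def delta_from_auc(auc: float) -> float:
    """Inverse of auc_from_delta; requires 0 < auc < 1."""
    return SQRT2 * PHI.inv_cdf(auc)


def sensitivity_at_specificity(auc: float, specificity: float) -> float:
    """Sensitivity of the threshold rule that attains the given specificity.

    Non-events ~ N(0, 1), events ~ N(delta, 1). A threshold c gives
    Sp = Phi(c) and Se = Phi(delta - c), hence Se = Phi(delta - Phi^{-1}(Sp)).
    """
    delta = delta_from_auc(auc)
    return PHI.cdf(delta - PHI.inv_cdf(specificity))


def youden_max(auc: float) -> float:
    """Largest Se + Sp - 1, reached at the midpoint threshold c = delta / 2."""
    return 2 * PHI.cdf(delta_from_auc(auc) / 2) - 1


def auc_after_adding(auc_base: float, d_new: float) -> float:
    """AUC after adding a predictor with standardized mean difference d_new.

    Hypothetical helper. Assumes auc_base >= 0.5 and a new predictor that is
    normal with a common variance and independent of the baseline score within
    each outcome group, so delta_new^2 = delta_base^2 + d_new^2.
    """
    delta_base = delta_from_auc(auc_base)
    return auc_from_delta((delta_base ** 2 + d_new ** 2) ** 0.5)


if __name__ == "__main__":
    d_new = 0.5  # hypothetical effect size of the added predictor
    for auc_base in (0.60, 0.70, 0.80, 0.90):
        auc_new = auc_after_adding(auc_base, d_new)
        gain_auc = auc_new - auc_base
        gain_se = (sensitivity_at_specificity(auc_new, 0.90)
                   - sensitivity_at_specificity(auc_base, 0.90))
        print(f"baseline AUC {auc_base:.2f}: AUC gain {gain_auc:.3f}, "
              f"sensitivity gain at 90% specificity {gain_se:.3f}")
```

Running it shows the same added predictor producing a smaller AUC gain as the baseline AUC rises, the pattern the abstract describes; the change in sensitivity at 90% specificity is printed alongside for comparison.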


Figures

Figure 1. Sensitivity at constant specificity as a function of AUC
Figure 2. Sensitivity at constant specificity as a function of discrimination slope
Figure 3. Youden index and relative utility as a function of AUC
Figure 4. Youden index and relative utility as a function of discrimination slope
Figure 5. Youden index and relative utility as a function of AUC
Figure 6. Youden index and relative utility as a function of discrimination slope
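The figures compare the AUC and the discrimination slope as summaries of discrimination. As a hedged illustration of how both, plus the Youden index, can be estimated from a fitted model's predicted risks, the sketch below uses the Mann-Whitney form of the AUC and the common definition of the discrimination slope as mean predicted risk in events minus mean predicted risk in non-events. The function names and toy data are hypothetical, and relative utility is not computed here.

```python
from typing import List, Sequence, Tuple


def split_by_outcome(risk: Sequence[float],
                     outcome: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Return (risks of events, risks of non-events); outcome is 1 = event, 0 = non-event."""
    if len(risk) != len(outcome):
        raise ValueError("risk and outcome must have the same length")
    events = [r for r, y in zip(risk, outcome) if y == 1]
    nonevents = [r for r, y in zip(risk, outcome) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    return events, nonevents


def empirical_auc(risk: Sequence[float], outcome: Sequence[int]) -> float:
    """Mann-Whitney AUC: share of event/non-event pairs in which the event has
    the higher predicted risk (ties count one half). O(n_events * n_nonevents)."""
    events, nonevents = split_by_outcome(risk, outcome)
    concordant = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


def discrimination_slope(risk: Sequence[float], outcome: Sequence[int]) -> float:
    """Mean predicted risk among events minus mean predicted risk among non-events."""
    events, nonevents = split_by_outcome(risk, outcome)
    return sum(events) / len(events) - sum(nonevents) / len(nonevents)


def max_youden(risk: Sequence[float], outcome: Sequence[int]) -> float:
    """Largest Se + Sp - 1 over rules 'positive if risk >= t', t an observed risk."""
    events, nonevents = split_by_outcome(risk, outcome)
    best = 0.0  # a threshold above every risk gives Se = 0, Sp = 1, J = 0
    for t in sorted(set(risk)):
        sens = sum(r >= t for r in events) / len(events)
        spec = sum(r < t for r in nonevents) / len(nonevents)
        best = max(best, sens + spec - 1.0)
    return best


if __name__ == "__main__":
    # Hypothetical predicted risks and observed outcomes (1 = event).
    risk = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]
    outcome = [0, 0, 1, 0, 0, 1, 1, 1]
    print(f"AUC {empirical_auc(risk, outcome):.4f}")
    print(f"discrimination slope {discrimination_slope(risk, outcome):.4f}")
    print(f"max Youden index {max_youden(risk, outcome):.4f}")
```

Counting ties as half-concordant pairs and searching thresholds only over observed risks are common conventions, not choices stated in the paper.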
