Biom J. 2011 Mar;53(2):237-58.
doi: 10.1002/bimj.201000078. Epub 2011 Feb 3.

Performance of reclassification statistics in comparing risk prediction models

Nancy R Cook et al. Biom J. 2011 Mar.

Abstract

Concerns have been raised about the use of traditional measures of model fit in evaluating risk prediction models for clinical use, and reclassification tables have been suggested as an alternative means of assessing the clinical utility of a model. Several measures based on the table have been proposed, including the reclassification calibration (RC) statistic, the net reclassification improvement (NRI), and the integrated discrimination improvement (IDI), but the performance of these in practical settings has not been fully examined. We used simulations to estimate the type I error and power for these statistics in a number of scenarios, as well as the impact of the number and type of categories, when adding a new marker to an established or reference model. The type I error was found to be reasonable in most settings, and power was highest for the IDI, which was similar to the test of association. The relative power of the RC statistic, a test of calibration, and the NRI, a test of discrimination, varied depending on the model assumptions. These tools provide unique but complementary information.
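The abstract names the NRI and IDI without giving formulas. As a rough illustration only, the sketch below computes both using their commonly used definitions: the categorical NRI as the net proportion of events moving to a higher risk category plus the net proportion of non-events moving to a lower one, and the IDI as the change in mean predicted risk among events minus the change among non-events. The function names, the toy risk vectors, and the choice of cut points (0.5*P(D), P(D), and 2*P(D) with P(D) = 0.10, borrowed from the Figure 4 caption) are assumptions made for this example, not the authors' code or data.

```python
from bisect import bisect_right
from statistics import mean


def risk_category(p, cuts):
    """Return the index of the risk category containing p (cuts sorted ascending)."""
    return bisect_right(cuts, p)


def nri(p_old, p_new, events, cuts):
    """Categorical net reclassification improvement (commonly used definition)."""
    up_e = down_e = n_e = 0  # movement among events
    up_n = down_n = n_n = 0  # movement among non-events
    for po, pn, d in zip(p_old, p_new, events):
        move = risk_category(pn, cuts) - risk_category(po, cuts)
        if d:
            n_e += 1
            up_e += move > 0
            down_e += move < 0
        else:
            n_n += 1
            up_n += move > 0
            down_n += move < 0
    # Upward moves are "correct" for events, downward moves for non-events.
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


def idi(p_old, p_new, events):
    """Integrated discrimination improvement (commonly used definition)."""
    diff_events = [pn - po for po, pn, d in zip(p_old, p_new, events) if d]
    diff_nonevents = [pn - po for po, pn, d in zip(p_old, p_new, events) if not d]
    return mean(diff_events) - mean(diff_nonevents)


# Hypothetical toy data: predicted risks from a reference and an expanded model.
p_old = [0.04, 0.08, 0.15, 0.30, 0.03, 0.12]
p_new = [0.06, 0.12, 0.25, 0.25, 0.02, 0.08]
events = [1, 1, 1, 0, 0, 0]
cuts = [0.05, 0.10, 0.20]  # 0.5*P(D), P(D), 2*P(D) for an assumed P(D) = 0.10

print("NRI:", nri(p_old, p_new, events, cuts))
print("IDI:", idi(p_old, p_new, events))
```

Note that the NRI depends on the chosen cut points, whereas the IDI uses the predicted risks directly and needs no categories.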

Conflict of interest statement

The authors have declared no conflict of interest.

Figures

Figure 1
Predictiveness curves for models with and without variable Y (left), high-sensitivity C-reactive protein (CRP) (middle), and systolic blood pressure (SBP) (right). Horizontal lines indicate the prevalence of the outcome.
Figure 2
Null distribution of reclassification calibration statistics: observed proportions with p-values < 0.10 (top) or < 0.05 (bottom) for test of correct model X or XY when ORY = 1 and for model XY when ORY = 3 with cell size > 20 and for average expectation ≥ 5, for P(D) = 0.10.
Figure 3
Power for measures of model fit, with P(D) = 0.10.
Figure 4
Power for IDI (top), NRI (middle), and reclassification calibration test (bottom) by probability of disease P(D) and ORY, using ORX = 8, N = 5000, and category cut points of 0.5*P(D), P(D), and 2*P(D).
Figure 5
Power for measures of model fit, with ORX = 8, P(D) = 0.10, and varying prevalence (PY) of a binary predictor Y.
