Testing for improvement in prediction model performance

Margaret Sullivan Pepe¹, Kathleen F Kerr, Gary Longton, Zheyu Wang

PMID: 23296397
PMCID: PMC3625503
DOI: 10.1002/sim.5727

Stat Med. 2013 Apr 30;32(9):1467-82. doi: 10.1002/sim.5727. Epub 2013 Jan 7.

Affiliation

¹ Biostatistics and Biomathematics, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA. mspepe@u.washington.edu

Abstract

Authors have proposed new methodology in recent years for evaluating the improvement in prediction performance gained by adding a new predictor, Y, to a risk model containing a set of baseline predictors, X, for a binary outcome D. We prove theoretically that null hypotheses concerning no improvement in performance are equivalent to the simple null hypothesis that Y is not a risk factor when controlling for X, H0: P(D = 1 | X, Y) = P(D = 1 | X). Therefore, testing for improvement in prediction performance is redundant if Y has already been shown to be a risk factor. We also investigate properties of tests through simulation studies, focusing on the change in the area under the ROC curve (AUC). An unexpected finding is that standard testing procedures that do not adjust for variability in estimated regression coefficients are extremely conservative. This may explain why the AUC is widely considered insensitive to improvements in prediction performance and suggests that the problem of insensitivity has to do with use of invalid procedures for inference rather than with the measure itself. To avoid redundant testing and use of potentially problematic methods for inference, we recommend that hypothesis testing for no improvement be limited to evaluation of Y as a risk factor, for which methods are well developed and widely available. Analyses of measures of prediction performance should focus on estimation rather than on testing for no improvement in performance.
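
The recommendation above can be illustrated with a minimal sketch. It is not the authors' code: the data are simulated, the coefficients (-1, 1.0, 0.7), the sample size, and all function names are assumptions chosen for illustration, and logistic regression with a likelihood ratio test stands in for "well developed and widely available" risk-factor methods. The sketch tests H0 through the coefficient of Y and reports the two AUCs as estimates only, without a test on their difference.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(rows, labels, iters=25):
    """Newton-Raphson logistic fit; returns (coefficients, log-likelihood)."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, d in zip(rows, labels):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for i in range(p):
                grad[i] += (d - mu) * x[i]
                for j in range(p):
                    hess[i][j] += mu * (1.0 - mu) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for x, d in zip(rows, labels):
        eta = sum(b * v for b, v in zip(beta, x))
        loglik += d * eta - math.log1p(math.exp(eta))
    return beta, loglik


def auc(scores, labels):
    """Empirical AUC: P(case score > control score), ties counted as 1/2."""
    cases = [s for s, d in zip(scores, labels) if d == 1]
    controls = [s for s, d in zip(scores, labels) if d == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def linear_scores(rows, beta):
    """Linear predictor; monotone in fitted risk, so it gives the same AUC."""
    return [sum(b * v for b, v in zip(beta, x)) for x in rows]


# Simulated data (assumption): Y is a genuine risk factor given X.
random.seed(0)
n = 1000
X = [random.gauss(0, 1) for _ in range(n)]
Y = [random.gauss(0, 1) for _ in range(n)]
D = [int(random.random() < 1 / (1 + math.exp(-(-1 + 1.0 * x + 0.7 * y))))
     for x, y in zip(X, Y)]

base_rows = [[1.0, x] for x in X]                 # baseline model: X only
full_rows = [[1.0, x, y] for x, y in zip(X, Y)]   # enhanced model: X and Y
beta_base, ll_base = fit_logistic(base_rows, D)
beta_full, ll_full = fit_logistic(full_rows, D)

# Likelihood ratio test of H0: P(D = 1 | X, Y) = P(D = 1 | X), 1 df.
# For 1 df the chi-square tail probability equals erfc(sqrt(stat / 2)).
lr_stat = 2.0 * (ll_full - ll_base)
p_value = math.erfc(math.sqrt(max(lr_stat, 0.0) / 2.0))

# Apparent (in-sample) AUCs, reported as estimates rather than tested.
auc_base = auc(linear_scores(base_rows, beta_base), D)
auc_full = auc(linear_scores(full_rows, beta_full), D)

print(f"LR statistic = {lr_stat:.2f}, p = {p_value:.3g}")
print(f"AUC baseline = {auc_base:.3f}, AUC enhanced = {auc_full:.3f}")
```

In practice an interval for the AUC difference (for example from a bootstrap that refits both models in each resample) would accompany the point estimates; that step is omitted here to keep the sketch short.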

Figures

**Figure 1**
Predictiveness curves to assess calibration of baseline and enhanced risk models for renal artery stenosis in the evaluation dataset (n = 284). Shown are the modeled risk quantiles (as curves) and the observed event rates within each decile of modeled risk (as open circles). Hosmer-Lemeshow statistics corresponding to the plots have p-values equal to 0.43 (baseline model) and 0.51 (enhanced model). The quantiles of the risk function fitted in the one-third dataset used to generate X are also shown as the dashed curve and appear steeper than those of the model recalibrated in the evaluation dataset.
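
The decile comparison described in the caption can be sketched as follows. This is an illustrative assumption rather than the authors' procedure: the risks and outcomes are simulated (the sample size 284 is borrowed from the caption only for scale), and the function name `observed_rate_by_decile` is hypothetical. The sketch sorts subjects by modeled risk, splits them into ten groups, and pairs the mean modeled risk in each group with the observed event rate.

```python
import random


def observed_rate_by_decile(risks, outcomes, groups=10):
    """Return (mean modeled risk, observed event rate) for each risk decile."""
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    n = len(order)
    table = []
    for g in range(groups):
        # Integer slicing gives groups whose sizes differ by at most one.
        idx = order[g * n // groups:(g + 1) * n // groups]
        mean_risk = sum(risks[i] for i in idx) / len(idx)
        event_rate = sum(outcomes[i] for i in idx) / len(idx)
        table.append((mean_risk, event_rate))
    return table


# Simulated, well-calibrated example (assumption): outcome ~ Bernoulli(risk).
random.seed(1)
risks = [random.random() for _ in range(284)]
outcomes = [int(random.random() < r) for r in risks]

table = observed_rate_by_decile(risks, outcomes)
for decile, (mean_risk, event_rate) in enumerate(table, start=1):
    print(f"decile {decile:2d}: modeled {mean_risk:.2f}, observed {event_rate:.2f}")
```

For a well-calibrated model the two columns should track each other; a Hosmer-Lemeshow statistic summarizes the same observed versus expected comparison in a single number.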

Grants and funding

U01 CA086368/CA/NCI NIH HHS/United States