Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2011 Jan 28:11:13.
doi: 10.1186/1471-2288-11-13.

One statistical test is sufficient for assessing new predictive markers

Affiliations

One statistical test is sufficient for assessing new predictive markers

Andrew J Vickers et al. BMC Med Res Methodol. .

Abstract

Background: We have observed that the area under the receiver operating characteristic curve (AUC) is increasingly being used to evaluate whether a novel predictor should be incorporated in a multivariable model to predict risk of disease. Frequently, investigators will approach the issue in two distinct stages: first, by testing whether the new predictor variable is significant in a multivariable regression model; second, by testing differences between the AUC of models with and without the predictor using the same data from which the predictive models were derived. These two steps often lead to discordant conclusions.

Discussion: We conducted a simulation study in which two predictors, X and X*, were generated as standard normal variables with varying levels of predictive strength, represented by means that differed depending on the binary outcome Y. The data sets were analyzed using logistic regression, and likelihood ratio and Wald tests for the incremental contribution of X* were performed. The patient-specific predictors for each of the models were then used as data for a test comparing the two AUCs. Under the null, the size of the likelihood ratio and Wald tests were close to nominal, but the area test was extremely conservative, with test sizes less than 0.006 for all configurations studied. Where X* was associated with outcome, the area test had much lower power than the likelihood ratio and Wald tests.

Summary: Evaluation of the statistical significance of a new predictor when there are existing clinical predictors is most appropriately accomplished in the context of a regression model. Although comparison of AUCs is a conceptually equivalent approach to the likelihood ratio and Wald test, it has vastly inferior statistical properties. Use of both approaches will frequently lead to inconsistent conclusions. Nonetheless, comparison of receiver operating characteristic curves remains a useful descriptive tool for initial evaluation of whether a new predictor might be of clinical relevance.

PubMed Disclaimer

References

    1. Harrell FE Jr, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating the yield of medical tests. JAMA. 1982;247(18):2543–2546. doi: 10.1001/jama.247.18.2543. - DOI - PubMed
    1. Kattan MW. Judging new markers by their ability to improve predictive accuracy. J Natl Cancer Inst. 2003;95(9):634–635. doi: 10.1093/jnci/95.9.634. - DOI - PubMed
    1. Pepe MS, Janes H, Longton G, Leisenring W, Newcomb P. Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. Am J Epidemiol. 2004;159(9):882–890. doi: 10.1093/aje/kwh101. - DOI - PubMed
    1. Cook NR. Statistical evaluation of prognostic versus diagnostic models: beyond the ROC curve. Clin Chem. 2008;54(1):17–23. doi: 10.1373/clinchem.2007.096529. - DOI - PubMed
    1. Hlatky MA, Greenland P, Arnett DK, Ballantyne CM, Criqui MH, Elkind MS, Go AS, Harrell FE Jr, Hong Y, Howard BV. et al.Criteria for evaluation of novel markers of cardiovascular risk: a scientific statement from the American Heart Association. Circulation. 2009;119(17):2408–2416. doi: 10.1161/CIRCULATIONAHA.109.192278. - DOI - PMC - PubMed

Publication types

LinkOut - more resources