Diagn Progn Res. 2017 Dec 2;1:19.
doi: 10.1186/s41512-017-0020-3. eCollection 2017.

The Brier score does not evaluate the clinical utility of diagnostic tests or prediction models

Melissa Assel et al. Diagn Progn Res. 2017.

Abstract

Background: A variety of statistics have been proposed as tools to help investigators assess the value of diagnostic tests or prediction models. The Brier score has been recommended on the grounds that it is a proper scoring rule that is affected by both discrimination and calibration. However, the Brier score is prevalence dependent in such a way that the rank ordering of tests or models may inappropriately vary by prevalence.
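For readers unfamiliar with the statistic, the following is a minimal sketch of the Brier score using its standard definition (the mean squared difference between the predicted probability and the 0/1 outcome). The abstract does not give the formula, so the function name `brier_score` and the five-patient cohort are illustrative assumptions, not the authors' code or data.

```python
def brier_score(outcomes, predictions):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    if len(outcomes) != len(predictions):
        raise ValueError("outcomes and predictions must have the same length")
    n = len(outcomes)
    return sum((p - y) ** 2 for y, p in zip(outcomes, predictions)) / n


# Hypothetical cohort (an assumption for illustration): 1 event in 5 patients.
outcomes = [1, 0, 0, 0, 0]
prevalence = sum(outcomes) / len(outcomes)

# Default strategy "assume no patient is positive": predicted risk 0 for all.
assume_none = brier_score(outcomes, [0.0] * len(outcomes))

# Default strategy "assume all patients are positive": predicted risk 1 for all.
assume_all = brier_score(outcomes, [1.0] * len(outcomes))

# Uninformative model that assigns the prevalence to every patient.
constant = brier_score(outcomes, [prevalence] * len(outcomes))

print(assume_none, assume_all, constant)
```

Under this definition the two default strategies score exactly the prevalence and one minus the prevalence, so their Brier scores are set by prevalence alone. This is one simple way to see the prevalence dependence that the Background describes.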

Methods: We explored four common clinical scenarios: comparison of a highly accurate binary test with a continuous prediction model of moderate predictiveness; comparison of two binary tests where the importance of sensitivity versus specificity is inversely associated with prevalence; comparison of models and tests to default strategies of assuming that all or no patients are positive; and comparison of two models with miscalibration in opposite directions.

Results: In each case, we found that the Brier score gave an inappropriate rank ordering of the tests and models. Conversely, net benefit, a decision-analytic measure, gave results that always favored the preferable test or model.
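As a rough illustration of the decision-analytic measure named here, the sketch below computes net benefit with the standard decision curve analysis formula: true positives per patient minus false positives per patient, weighted by the odds of the threshold probability. The abstract does not state the formula, so the function name `net_benefit`, the threshold of 0.1, and the five-patient cohort are assumptions chosen for illustration.

```python
def net_benefit(outcomes, classified_positive, threshold):
    """Net benefit = TP/n - (FP/n) * threshold / (1 - threshold)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    if len(outcomes) != len(classified_positive):
        raise ValueError("inputs must have the same length")
    n = len(outcomes)
    pairs = list(zip(outcomes, classified_positive))
    true_pos = sum(1 for y, pos in pairs if pos and y == 1)
    false_pos = sum(1 for y, pos in pairs if pos and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


# Hypothetical cohort (an assumption for illustration): 1 event in 5 patients.
outcomes = [1, 0, 0, 0, 0]
threshold = 0.1  # hypothetical threshold probability for acting on a positive

# Default strategies: treat every patient as positive, or none.
nb_all = net_benefit(outcomes, [True] * 5, threshold)
nb_none = net_benefit(outcomes, [False] * 5, threshold)

# A hypothetical perfect binary test that flags exactly the true events.
nb_perfect = net_benefit(outcomes, [y == 1 for y in outcomes], threshold)

print(nb_all, nb_none, nb_perfect)
```

Because the threshold encodes how many false positives are worth one true positive, a test or model is compared against the default strategies at the same threshold, and it is useful only if its net benefit exceeds both.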

Conclusions: The Brier score does not evaluate the clinical value of diagnostic tests or prediction models. We advocate, as an alternative, the use of decision-analytic measures such as net benefit.

Trial registration: Not applicable.

Keywords: Brier score; Concordance index; Mean squared error; Net benefit; Prediction modeling; Sensitivity; Specificity.

Conflict of interest statement

The authors declare that they have no competing interests. Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1. Calibration plot for various continuous prediction models of differing degrees of miscalibration. All prediction models have an AUC of 0.75 for predicting an event with prevalence 20%. The prediction models include the following: a well-calibrated prediction model, a model that is miscalibrated such that it overestimates risk, a prediction model that underestimates risk, and a prediction model that more severely underestimates risk.

