The Brier score does not evaluate the clinical utility of diagnostic tests or prediction models

Melissa Assel¹, Daniel D Sjoberg¹, Andrew J Vickers¹

PMID: 31093548
PMCID: PMC6460786
DOI: 10.1186/s41512-017-0020-3

Diagn Progn Res. 2017 Dec 2:1:19. doi: 10.1186/s41512-017-0020-3. eCollection 2017.

Affiliation

¹ Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA.

Abstract

Background: A variety of statistics have been proposed as tools to help investigators assess the value of diagnostic tests or prediction models. The Brier score has been recommended on the grounds that it is a proper scoring rule that is affected by both discrimination and calibration. However, the Brier score is prevalence dependent in such a way that the rank ordering of tests or models may inappropriately vary by prevalence.

Methods: We explored four common clinical scenarios: comparison of a highly accurate binary test with a continuous prediction model of moderate predictiveness; comparison of two binary tests where the importance of sensitivity versus specificity is inversely associated with prevalence; comparison of models and tests to default strategies of assuming that all or no patients are positive; and comparison of two models with miscalibration in opposite directions.

Results: In each case, we found that the Brier score gave an inappropriate rank ordering of the tests and models. Conversely, net benefit, a decision-analytic measure, gave results that always favored the preferable test or model.

Conclusions: The Brier score does not evaluate the clinical value of diagnostic tests or prediction models. We advocate, as an alternative, the use of decision-analytic measures such as net benefit.

Trial registration: Not applicable.

Keywords: Brier score; Concordance index; Mean squared error; Net benefit; Prediction modeling; Sensitivity; Specificity.
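
The contrast described in the abstract between the Brier score and net benefit can be illustrated with a minimal Python sketch of the third scenario (a test compared with the default strategy of assuming no patients are positive). The cohort size, prevalence, sensitivity, specificity, and threshold below are hypothetical values chosen for illustration, not figures from the paper, and the function names are assumptions. The sketch uses the usual definitions: the Brier score as the mean squared difference between predicted risk and the 0/1 outcome, and net benefit as TP/n − (FP/n) × pt/(1 − pt) at threshold probability pt.

```python
def brier_score(predictions, outcomes):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(predictions, outcomes)) / len(outcomes)


def net_benefit(predictions, outcomes, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold."""
    n = len(outcomes)
    pairs = list(zip(predictions, outcomes))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


# Hypothetical cohort: 1000 patients, 5% prevalence (50 with disease).
outcomes = [1] * 50 + [0] * 950

# Hypothetical binary test with sensitivity 0.90 and specificity 0.90:
# 45 true positives, 5 false negatives, 95 false positives, 855 true negatives.
test = [1] * 45 + [0] * 5 + [1] * 95 + [0] * 855

# Default strategy: assume no patient is positive.
treat_none = [0] * 1000

# Hypothetical threshold: intervene when risk is at least 5%.
threshold = 0.05

brier_test = brier_score(test, outcomes)
brier_none = brier_score(treat_none, outcomes)
nb_test = net_benefit(test, outcomes, threshold)
nb_none = net_benefit(treat_none, outcomes, threshold)

# Lower Brier score is better; higher net benefit is better.
print(f"Brier score: test={brier_test:.3f}, treat none={brier_none:.3f}")
print(f"Net benefit: test={nb_test:.3f}, treat none={nb_none:.3f}")
```

Under these assumed numbers the Brier score ranks "treat none" ahead of the test (0.050 versus 0.100), while net benefit at the 5% threshold ranks the test ahead (0.040 versus 0.000). This is the kind of disagreement at low prevalence that the abstract describes.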

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
Calibration plot for continuous prediction models with differing degrees of miscalibration. All prediction models have an AUC of 0.75 for predicting an event with a prevalence of 20%. The models are: a well-calibrated model, a model that overestimates risk, a model that underestimates risk, and a model that more severely underestimates risk.
