Review
Stat Med. 2023 Dec 20;42(29):5451-5478.
doi: 10.1002/sim.9921. Epub 2023 Oct 17.

Calibrating machine learning approaches for probability estimation: A comprehensive comparison

Francisco M Ojeda et al. Stat Med.

Abstract

Statistical prediction models have gained popularity in applied research. One challenge is transferring a prediction model to a population that may be structurally different from the one for which the model was developed. The model can be adapted to the new population by calibrating it to the characteristics of the target population, and numerous calibration techniques exist for this purpose. In view of this diversity, we performed a systematic evaluation of popular calibration approaches used by the statistical and machine learning communities for estimating two-class probabilities. In this work, we first review the literature and, second, present the results of a comprehensive simulation study. The calibration approaches are compared with respect to their empirical properties and relationships, their ability to generalize precise probability estimates to external populations, and their availability in easy-to-use software implementations. Third, we provide code from a real data analysis so that researchers can apply the approaches. Logistic calibration and beta calibration, which estimate an intercept plus one and two slope parameters, respectively, consistently showed the best results in the simulation studies. Calibration on logit-transformed probability estimates generally outperformed calibration on nontransformed estimates. When training and validation data differ structurally, re-estimation of the entire prediction model should be weighed against the sample size of the validation data. For updating probability estimates in validation studies, we recommend regression-based calibration approaches that use transformed probability estimates and estimate at least one slope in addition to an intercept.
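As a rough illustration of the two approaches named in the abstract, the following sketch fits logistic calibration (an intercept plus one slope on the logit of the original estimate) and beta calibration (an intercept plus two slopes, on log p and -log(1 - p)) to simulated data. It is not the authors' code: the simulated miscalibration, the helper names (`fit_logistic`, `beta_features`, `brier`), and the plain Newton-Raphson fit without sign constraints on the slopes are all assumptions made for this example, and the Brier scores are computed in-sample only.

```python
import math
import random

def logit(p, eps=1e-12):
    p = min(max(p, eps), 1 - eps)          # clip to avoid log(0)
    return math.log(p / (1 - p))

def expit(x):
    x = max(-30.0, min(30.0, x))           # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))

def _solve(A, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def fit_logistic(X, y, n_iter=25, damping=1e-8):
    """Logistic regression by Newton-Raphson; rows of X include the intercept column."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[damping if i == j else 0.0 for j in range(k)] for i in range(k)]
        for xi, yi in zip(X, y):
            mu = expit(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1 - mu)
            for i in range(k):
                grad[i] += (yi - mu) * xi[i]
                for j in range(k):
                    hess[i][j] += w * xi[i] * xi[j]
        beta = [b + s for b, s in zip(beta, _solve(hess, grad))]
    return beta

def beta_features(p, eps=1e-12):
    p = min(max(p, eps), 1 - eps)
    return [1.0, math.log(p), -math.log(1 - p)]

def brier(p, y):
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)

# Simulated validation data (assumption): the true logit is z, while the model
# reports an overconfident, shifted estimate with logit 2*z + 0.5.
random.seed(0)
n = 4000
z = [random.gauss(0, 1) for _ in range(n)]
y = [1 if random.random() < expit(zi) else 0 for zi in z]
p_hat = [expit(2 * zi + 0.5) for zi in z]

# Logistic calibration: logit(q) = a0 + b1 * logit(p_hat)
a0, b1 = fit_logistic([[1.0, logit(p)] for p in p_hat], y)
p_logistic = [expit(a0 + b1 * logit(p)) for p in p_hat]

# Beta calibration: logit(q) = c + a * log(p_hat) - b * log(1 - p_hat)
feats = [beta_features(p) for p in p_hat]
c, a, b = fit_logistic(feats, y)
p_beta = [expit(c + a * f[1] + b * f[2]) for f in feats]

print("logistic calibration: intercept %.3f, slope %.3f" % (a0, b1))
print("beta calibration: c %.3f, a %.3f, b %.3f" % (c, a, b))
print("Brier raw %.4f, logistic %.4f, beta %.4f"
      % (brier(p_hat, y), brier(p_logistic, y), brier(p_beta, y)))
```

Under this simulation the true recalibration is logit(q) = -0.25 + 0.5 * logit(p), so the fitted logistic-calibration slope should land near 0.5. In practice an established implementation would normally be preferable to a hand-rolled Newton fit.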

Keywords: calibration; logistic regression; machine learning; probability estimation; probability machine; updating.
