Surgery. 2023 Sep;174(3):723-726.
doi: 10.1016/j.surg.2023.05.023. Epub 2023 Jul 5.

Evaluating prediction model performance

John H Cabot et al. Surgery. 2023 Sep.

Abstract

This article highlights important performance metrics to consider when evaluating models developed for supervised classification or regression tasks using clinical data. We detail the basics of confusion matrices, receiver operating characteristic curves, F1 scores, precision-recall curves, mean squared error, and other considerations for evaluating model performance. In an era defined by the rapid proliferation of advanced prediction models, familiarity with performance metrics beyond the area under the receiver operating characteristic curve, and with the nuances of evaluating a model's value once it is implemented, is essential to ensure effective resource allocation and optimal patient care delivery.
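As an illustration of three of the metrics named in the abstract, the following is a minimal, standard-library-only sketch. It assumes binary labels coded 0/1 and continuous model scores. The function names are hypothetical and not taken from the article. The ROC AUC is computed as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as half. This pairwise form is chosen for readability rather than speed. Average precision is one common step-wise summary of the precision-recall curve, and this sketch does not group tied scores, which is a simplifying assumption. Mean squared error is included for regression tasks.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of random positive > score of random negative).

    Ties count as half a win (the Mann-Whitney U formulation).
    Assumes labels are 0 (negative) or 1 (positive).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Ranks samples by descending score and averages the precision observed
    at the rank of each positive. Tied scores are NOT grouped (assumption).
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this cutoff
    return total / n_pos


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

For the toy labels [0, 0, 1, 1] with scores [0.1, 0.4, 0.35, 0.8], roc_auc returns 0.75 and average_precision returns about 0.833. In practice, established implementations such as scikit-learn's roc_auc_score, average_precision_score, and mean_squared_error would normally be preferred over a hand-rolled version.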


Conflict of interest statement

The authors have no relevant conflicts of interest to disclose.

Figures

Figure 1. Confusion matrix.
This matrix allows for calculation of key model metrics such as sensitivity/recall, specificity, precision, and accuracy.
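The metrics listed in the Figure 1 caption can be computed directly from the four cells of a confusion matrix. The following is a minimal sketch that assumes binary labels and predictions that have already been thresholded to 0/1. The class and method names are hypothetical. The sketch also computes the F1 score named in the abstract, written here as 2TP / (2TP + FP + FN), which equals the harmonic mean of precision and recall whenever both are defined. Returning NaN for an undefined ratio, such as precision when nothing is predicted positive, is a design choice made for this sketch.

```python
from dataclasses import dataclass
from typing import Sequence


def _ratio(num: float, den: float) -> float:
    """Return num/den, or NaN when the denominator is zero (metric undefined)."""
    return num / den if den else float("nan")


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # true positives: predicted 1, actually 1
    fp: int  # false positives: predicted 1, actually 0
    fn: int  # false negatives: predicted 0, actually 1
    tn: int  # true negatives: predicted 0, actually 0

    @classmethod
    def from_labels(cls, y_true: Sequence[int], y_pred: Sequence[int]) -> "ConfusionMatrix":
        if len(y_true) != len(y_pred):
            raise ValueError("y_true and y_pred must have the same length")
        tp = fp = fn = tn = 0
        for t, p in zip(y_true, y_pred):
            if t not in (0, 1) or p not in (0, 1):
                raise ValueError("labels and predictions must be 0 or 1")
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
            else:
                tn += 1
        return cls(tp, fp, fn, tn)

    def sensitivity(self) -> float:
        """Sensitivity (recall, true positive rate): TP / (TP + FN)."""
        return _ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        """Specificity (true negative rate): TN / (TN + FP)."""
        return _ratio(self.tn, self.tn + self.fp)

    def precision(self) -> float:
        """Precision (positive predictive value): TP / (TP + FP)."""
        return _ratio(self.tp, self.tp + self.fp)

    def accuracy(self) -> float:
        """Accuracy: (TP + TN) / total number of samples."""
        return _ratio(self.tp + self.tn, self.tp + self.fp + self.fn + self.tn)

    def f1(self) -> float:
        """F1 score: 2TP / (2TP + FP + FN)."""
        return _ratio(2 * self.tp, 2 * self.tp + self.fp + self.fn)
```

For y_true = [1, 1, 1, 0, 0, 0, 0, 1] and y_pred = [1, 0, 1, 0, 1, 1, 0, 1], the matrix is TP = 3, FP = 2, FN = 1, TN = 2. This gives a sensitivity of 0.75, a specificity of 0.5, a precision of 0.6, an accuracy of 0.625, and an F1 score of about 0.667.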
Figure 2. Calibration plot.
These plots show predicted probability relative to the observed event rate in a collection of samples. Samples are divided into 10 bins based on their predicted probability ([0–10%], [10–20%], …). For each bin, the percentage of positive events is plotted on the y-axis against the center of that bin on the x-axis. The diagonal dashed line represents a perfectly calibrated model for reference.
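The binning described in the Figure 2 caption can be sketched as follows. This version assumes binary 0/1 outcomes and equal-width bins that include their lower edge, with a probability of exactly 1.0 placed in the top bin. The caption does not say how edges are handled, so this is an assumption. The function name is hypothetical. Each returned point pairs a bin center (x-axis) with the observed event rate in that bin (y-axis). Bins that contain no samples are skipped.

```python
from typing import List, Sequence, Tuple


def calibration_bins(
    y_true: Sequence[int], probs: Sequence[float], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (bin_center, observed_event_rate, n_samples) for each non-empty bin."""
    if len(y_true) != len(probs):
        raise ValueError("y_true and probs must have the same length")
    counts = [0] * n_bins
    events = [0] * n_bins
    for y, p in zip(y_true, probs):
        if not 0.0 <= p <= 1.0:
            raise ValueError("predicted probabilities must lie in [0, 1]")
        # Equal-width bins [0, 0.1), [0.1, 0.2), ...; p == 1.0 joins the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
        events[idx] += y  # assumes outcomes coded 0/1
    points = []
    for i in range(n_bins):
        if counts[i] == 0:
            continue  # an empty bin has no observed event rate to plot
        center = (i + 0.5) / n_bins
        points.append((center, events[i] / counts[i], counts[i]))
    return points
```

For a well-calibrated model, the returned (center, rate) points fall close to the diagonal y = x. Some tools, such as scikit-learn's calibration_curve, plot the mean predicted probability within each bin instead of the bin center. This sketch follows the bin-center convention described in the caption.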
