Evaluating prediction model performance

John H Cabot¹, Elsie Gyang Ross²

Affiliations

¹ Department of Surgery, Division of Vascular Surgery, Stanford University School of Medicine, Stanford, CA.
² Department of Surgery, Division of Vascular Surgery, Stanford University School of Medicine, Stanford, CA; Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, CA. Electronic address: elsie.ross@stanford.edu.

PMID: 37419761
PMCID: PMC10529246
DOI: 10.1016/j.surg.2023.05.023

John H Cabot et al. Surgery. 2023 Sep;174(3):723-726. doi: 10.1016/j.surg.2023.05.023. Epub 2023 Jul 5.

Abstract

This article highlights important performance metrics to consider when evaluating models developed for supervised classification or regression tasks using clinical data. We detail the basics of confusion matrices, receiver operating characteristic curves, F1 scores, precision-recall curves, mean squared error, and other considerations for evaluating model performance. In this era of rapidly proliferating advanced prediction models, familiarity with performance metrics beyond the area under the receiver operating characteristic curve, and with the nuances of evaluating a model's value once it is implemented, is essential to ensure effective resource allocation and optimal patient care delivery.
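
As a minimal illustration of three of the metrics named above (not code from the article), the pure-Python sketch below computes the area under the ROC curve through its rank interpretation, average precision as a step-wise summary of the precision-recall curve, and mean squared error for regression. The function names, the toy labels and scores, and the assumption that outcomes are coded 0/1 are hypothetical; in practice, `roc_auc_score`, `average_precision_score`, and `mean_squared_error` in scikit-learn's `sklearn.metrics` cover the same ground.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case is scored higher than a randomly chosen negative case
    (tied scores count as half a win). Assumes labels are coded 0/1."""
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg) pairwise comparison; fine for a sketch
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve: the mean of the
    precision values at the rank of each positive case.
    Assumes 0/1 labels and scores without ties."""
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            precision_sum += true_pos / rank  # precision at this cutoff
    return precision_sum / n_pos


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical, imbalanced toy data: 2 events among 10 patients.
    labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    print(roc_auc(labels, scores))                     # 0.875
    print(average_precision(labels, scores))           # 0.75
    print(mean_squared_error([2.0, 4.0], [3.0, 2.0]))  # 2.5
```

On this imbalanced toy example the ROC AUC (0.875) exceeds the average precision (0.75): the two negatives ranked above the second event halve precision at that cutoff but raise the false positive rate only to 2/8, which is one reason precision-recall summaries are often reported alongside ROC curves when events are rare.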

Conflict of interest statement

The authors have no relevant conflicts of interest to disclose.

Figures

**Figure 1. Confusion Matrix.**
This matrix allows for calculation of key model metrics such as sensitivity/recall, specificity, precision, and accuracy.
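
The metrics listed in the Figure 1 caption can be sketched in a few lines of Python once continuous scores are turned into yes/no predictions. This is an illustrative sketch, not the article's code: the `ConfusionMatrix` class, the rule that a score at or above the threshold counts as a positive prediction, and the default threshold of 0.5 are all assumptions chosen for the example. The F1 score, the harmonic mean of precision and recall, is included because it comes from the same four counts; scikit-learn's `sklearn.metrics.confusion_matrix` returns the same tallies.

```python
from dataclasses import dataclass
from typing import Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Safe division: returns NaN when a metric is undefined (zero denominator)."""
    return numerator / denominator if denominator else float("nan")


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # predicted positive, truly positive
    fp: int  # predicted positive, truly negative
    fn: int  # predicted negative, truly positive
    tn: int  # predicted negative, truly negative

    @classmethod
    def from_scores(cls, y_true: Sequence[int], scores: Sequence[float],
                    threshold: float = 0.5) -> "ConfusionMatrix":
        """Dichotomize scores (score >= threshold means 'positive'; an assumed
        convention) and tally the four outcomes. Labels are assumed to be 0/1."""
        if len(y_true) != len(scores):
            raise ValueError("y_true and scores must have the same length")
        tp = fp = fn = tn = 0
        for y, s in zip(y_true, scores):
            predicted_positive = s >= threshold
            if predicted_positive and y == 1:
                tp += 1
            elif predicted_positive:
                fp += 1
            elif y == 1:
                fn += 1
            else:
                tn += 1
        return cls(tp, fp, fn, tn)

    def sensitivity(self) -> float:  # also called recall or true positive rate
        return _ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:  # true negative rate
        return _ratio(self.tn, self.tn + self.fp)

    def precision(self) -> float:  # positive predictive value
        return _ratio(self.tp, self.tp + self.fp)

    def accuracy(self) -> float:
        return _ratio(self.tp + self.tn, self.tp + self.fp + self.fn + self.tn)

    def f1(self) -> float:
        # Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
        return _ratio(2 * self.tp, 2 * self.tp + self.fp + self.fn)


if __name__ == "__main__":
    # Hypothetical toy data: 2 events among 10 patients.
    labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    cm = ConfusionMatrix.from_scores(labels, scores, threshold=0.5)
    print(cm)                                  # ConfusionMatrix(tp=2, fp=3, fn=0, tn=5)
    print(cm.sensitivity(), cm.specificity())  # 1.0 0.625
```

Moving the threshold trades sensitivity against specificity and precision, so every value reported from a confusion matrix is only meaningful together with the threshold that produced it.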

**Figure 2. Calibration Plot.**
These plots show predicted probability relative to the observed event rate in a collection of samples. Samples are divided into 10 bins based on their predicted probability ([0–10%], [10–20%], …). For each bin, the percentage of positive events is plotted on the y-axis against the center of the bin on the x-axis. The diagonal dashed line represents a perfectly calibrated model for reference.
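
The binning described in the Figure 2 caption can be sketched as follows, as an illustration rather than the authors' plotting code. The function name `calibration_bins` and the toy data are hypothetical, and because the caption does not say which bin a boundary value such as exactly 10% belongs to, the sketch assumes left-closed bins, [0, 0.1), [0.1, 0.2), and so on, with a probability of exactly 1.0 placed in the last bin. Following the caption, each bin is reported at its center; scikit-learn's `sklearn.calibration.calibration_curve` instead reports the mean predicted probability within each bin.

```python
from typing import List, Optional, Sequence, Tuple


def calibration_bins(
    y_true: Sequence[int], probs: Sequence[float], n_bins: int = 10
) -> List[Tuple[float, Optional[float], int]]:
    """Group predictions into equal-width probability bins and return, for each
    bin, (bin center, observed event rate, number of samples).
    Empty bins get an event rate of None. Labels are assumed to be 0/1."""
    if len(y_true) != len(probs):
        raise ValueError("y_true and probs must have the same length")
    counts = [0] * n_bins
    events = [0] * n_bins
    for y, p in zip(y_true, probs):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # Assumed convention: bins are [i/n, (i+1)/n); p == 1.0 joins the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
        events[idx] += y
    return [
        ((i + 0.5) / n_bins, events[i] / counts[i] if counts[i] else None, counts[i])
        for i in range(n_bins)
    ]


if __name__ == "__main__":
    # Hypothetical toy data; in practice each bin should hold many patients.
    labels = [0, 0, 1, 1, 1, 0]
    probs = [0.05, 0.08, 0.15, 0.95, 0.92, 0.99]
    for center, rate, n in calibration_bins(labels, probs):
        if n:  # empty bins have no observed rate to plot
            print(f"bin center {center:.2f}: observed rate {rate:.2f} (n={n})")
```

Plotting each observed rate against its bin center and comparing the points with the diagonal reproduces the layout of Figure 2; with few samples per bin the observed rates are noisy, so sparse bins deserve cautious interpretation.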
