Stat Med. 2020 Sep 20;39(21):2714-2742.

doi: 10.1002/sim.8570. Epub 2020 Jun 16.

Graphical calibration curves and the integrated calibration index (ICI) for survival models

Peter C Austin¹ ² ³, Frank E Harrell Jr⁴, David van Klaveren⁵ ⁶

Affiliations

¹ ICES, Toronto, Ontario, Canada.
² Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada.
³ Sunnybrook Research Institute, Toronto, Ontario, Canada.
⁴ Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA.
⁵ Department of Public Health, Erasmus MC, Rotterdam, The Netherlands.
⁶ Predictive Analytics and Comparative Effectiveness Center, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, Massachusetts, USA.

PMID: 32548928
PMCID: PMC7497089
DOI: 10.1002/sim.8570

Peter C Austin et al. Stat Med. 2020.

Abstract

In the context of survival analysis, calibration refers to the agreement between predicted probabilities and observed event rates or frequencies of the outcome within a given duration of time. We aimed to describe and evaluate methods for graphically assessing the calibration of survival models. We focus on two methods for constructing smoothed calibration curves: hazard regression, and restricted cubic splines in conjunction with a Cox proportional hazards model. We also describe modifications of the Integrated Calibration Index (ICI), of E50, and of E90. In this context, these are the mean, median, and 90th percentile, respectively, of the absolute difference between predicted survival probabilities and smoothed survival frequencies. We conducted a series of Monte Carlo simulations to evaluate the performance of these calibration measures when the underlying model has been correctly specified and under different types of model mis-specification. We illustrate the utility of calibration curves and the three calibration metrics by using them to compare the calibration of a Cox proportional hazards regression model with that of a random survival forest for predicting mortality in patients hospitalized with heart failure. Under a correctly specified regression model, differences between the two methods for constructing calibration curves were minimal, although the performance of the method based on restricted cubic splines tended to be slightly better. In contrast, under a mis-specified model, the smoothed calibration curve constructed using hazard regression tended to be closer to the true calibration curve. The use of calibration curves and of these numeric calibration metrics permits a comprehensive comparison of the calibration of competing survival models.

Keywords: calibration; model validation; random forests; survival analysis; time-to-event model.
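
The three numeric metrics described in the abstract can be illustrated with a minimal Python sketch. It assumes that the smoothed survival frequencies have already been estimated (for example, by one of the two smoothing methods named above) and are supplied as a plain list aligned with the predicted probabilities. The function names `percentile` and `calibration_metrics`, the linear-interpolation percentile rule, and the input values are illustrative assumptions, not taken from the paper.

```python
from statistics import mean, median


def percentile(values, q):
    """Percentile (q in [0, 100]) using linear interpolation between order statistics.

    The interpolation rule is an assumption made for this sketch.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def calibration_metrics(predicted, smoothed):
    """Summarise |predicted - smoothed| by its mean (ICI), median (E50), and 90th percentile (E90)."""
    if len(predicted) != len(smoothed):
        raise ValueError("predicted and smoothed must have the same length")
    abs_diff = [abs(p - s) for p, s in zip(predicted, smoothed)]
    return {
        "ICI": mean(abs_diff),
        "E50": median(abs_diff),
        "E90": percentile(abs_diff, 90),
    }


# Hypothetical inputs: predicted probabilities and smoothed frequencies for five subjects.
predicted = [0.10, 0.20, 0.30, 0.40, 0.50]
smoothed = [0.12, 0.18, 0.35, 0.40, 0.60]
print(calibration_metrics(predicted, smoothed))
```

Because all three metrics are summaries of the same vector of absolute differences, they are inexpensive to report together once a smoothed calibration curve is available.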

Figures

**Figure 1**
Calibration plots when using restricted cubic splines (RCS) and different numbers of knots. For each of the three values of the number of knots (3, 4, or 5), there are three curves. The inner curve represents the mean calibration curve across the 1000 simulation replicates. The outer two curves represent the 2.5th and 97.5th percentiles of the calibration curves across the simulation replicates. The density function denotes a non‐parametric estimate of the distribution of predicted risk across the large super‐population (right axis) [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 2**
ICI/E50/E90 when using RCS and different numbers of knots. The squares represent the mean value of ICI/E50/E90 across the 1000 simulation replicates. The error bars represent the SD of ICI/E50/E90 across the 1000 simulation replicates [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 3**
Effect of degree of censoring on estimated calibration curves for different sample sizes and estimation methods. There are three curves for each of the seven degrees of censoring. The inner curve represents the mean calibration curve across the 1000 simulation replicates. The outer two curves represent the 2.5th and 97.5th percentiles of the calibration curves across the simulation replicates. The density function denotes a non‐parametric estimate of the distribution of predicted risk across the large super‐population (right axis) [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 4**
Effect of degree of censoring on estimated calibration curves for different sample sizes and estimation methods. There are three curves for each of the seven degrees of censoring. The inner curve represents the mean calibration curve across the 1000 simulation replicates. The outer two curves represent the 2.5th and 97.5th percentiles of the calibration curves across the simulation replicates. The density function denotes a non‐parametric estimate of the distribution of predicted risk across the large super‐population (right axis) [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 5**
Effect of degree of censoring on estimated calibration curves for different sample sizes and estimation methods. There are three curves for each of the seven degrees of censoring. The inner curve represents the mean calibration curve across the 1000 simulation replicates. The outer two curves represent the 2.5th and 97.5th percentiles of the calibration curves across the simulation replicates. The density function denotes a non‐parametric estimate of the distribution of predicted risk across the large super‐population (right axis) [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 6**
Effect of degree of censoring on estimated calibration curves for different sample sizes and estimation methods. There are three curves for each of the seven degrees of censoring. The inner curve represents the mean calibration curve across the 1000 simulation replicates. The outer two curves represent the 2.5th and 97.5th percentiles of the calibration curves across the simulation replicates. The density function denotes a non‐parametric estimate of the distribution of predicted risk across the large super‐population (right axis) [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 7**
Effect of degree of censoring on estimated calibration curves for different sample sizes and estimation methods. There are three curves for each of the seven degrees of censoring. The inner curve represents the mean calibration curve across the 1000 simulation replicates. The outer two curves represent the 2.5th and 97.5th percentiles of the calibration curves across the simulation replicates. The density function denotes a non‐parametric estimate of the distribution of predicted risk across the large super‐population (right axis) [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 8**
Effect of degree of censoring on estimated calibration curves for different sample sizes and estimation methods. There are three curves for each of the seven degrees of censoring. The inner curve represents the mean calibration curve across the 1000 simulation replicates. The outer two curves represent the 2.5th and 97.5th percentiles of the calibration curves across the simulation replicates. The density function denotes a non‐parametric estimate of the distribution of predicted risk across the large super‐population (right axis) [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 9**
Relationship between degree of censoring and estimation of ICI. There is one line for each combination of sample size and estimation method. The points represent the mean ICI across the 1000 simulation replicates [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 10**
Relationship between degree of censoring and estimation of E50. There is one line for each combination of sample size and estimation method. The points represent the mean E50 across the 1000 simulation replicates [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 11**
Relationship between degree of censoring and estimation of E90. There is one line for each combination of sample size and estimation method. The points represent the mean E90 across the 1000 simulation replicates [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 12**
Calibration plots when the true model included a quadratic term (N = 500). There are three curves for each of the two estimation methods (RCS and hazard regression). The inner curve represents the mean calibration curve across the 1000 simulation replicates. The outer two curves represent the 2.5th and 97.5th percentiles of the calibration curves across the simulation replicates. The green curve denotes the true calibration curve derived from the large super‐population. The density function denotes a non‐parametric estimate of the distribution of predicted risk across the large super‐population (right axis) [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 13**
Calibration plots when the true model included a quadratic term (N = 1000). There are three curves for each of the two estimation methods (RCS and hazard regression). The inner curve represents the mean calibration curve across the 1000 simulation replicates. The outer two curves represent the 2.5th and 97.5th percentiles of the calibration curves across the simulation replicates. The green curve denotes the true calibration curve derived from the large super‐population. The density function denotes a non‐parametric estimate of the distribution of predicted risk across the large super‐population (right axis) [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 14**
Calibration plots when the true model included a quadratic term (N = 10,000). There are three curves for each of the two estimation methods (RCS and hazard regression). The inner curve represents the mean calibration curve across the 1000 simulation replicates. The outer two curves represent the 2.5th and 97.5th percentiles of the calibration curves across the simulation replicates. The green curve denotes the true calibration curve derived from the large super‐population. The density function denotes a non‐parametric estimate of the distribution of predicted risk across the large super‐population (right axis) [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 15**
Calibration plots when the true model included an interaction term (N = 500). There are three curves for each of the two estimation methods (RCS and hazard regression). The inner curve represents the mean calibration curve across the 1000 simulation replicates. The outer two curves represent the 2.5th and 97.5th percentiles of the calibration curves across the simulation replicates. The green curve denotes the true calibration curve derived from the large super‐population. The density function denotes a non‐parametric estimate of the distribution of predicted risk across the large super‐population (right axis) [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 16**
Calibration plots when the true model included an interaction term (N = 1000). There are three curves for each of the two estimation methods (RCS and hazard regression). The inner curve represents the mean calibration curve across the 1000 simulation replicates. The outer two curves represent the 2.5th and 97.5th percentiles of the calibration curves across the simulation replicates. The green curve denotes the true calibration curve derived from the large super‐population. The density function denotes a non‐parametric estimate of the distribution of predicted risk across the large super‐population (right axis) [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 17**
Calibration plots when the true model included an interaction term (N = 10,000). There are three curves for each of the two estimation methods (RCS and hazard regression). The inner curve represents the mean calibration curve across the 1000 simulation replicates. The outer two curves represent the 2.5th and 97.5th percentiles of the calibration curves across the simulation replicates. The green curve denotes the true calibration curve derived from the large super‐population. The density function denotes a non‐parametric estimate of the distribution of predicted risk across the large super‐population (right axis) [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 18**
Calibration curves for the Cox proportional hazards model and the random survival forest when RCS was used to construct the calibration curves. There is one curve for each of the two models. The diagonal line denotes the line of perfect calibration. The density function denotes a non‐parametric estimate of the distribution of predicted risk across the sample (right axis) [Colour figure can be viewed at wileyonlinelibrary.com]

**Figure 19**
Calibration curves for the Cox proportional hazards model and the random survival forest when hazard regression was used to construct the calibration curves. There is one curve for each of the two models. The diagonal line denotes the line of perfect calibration. The density function denotes a non‐parametric estimate of the distribution of predicted risk across the sample (right axis) [Colour figure can be viewed at wileyonlinelibrary.com]
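
Several of the captions above describe a mean calibration curve with outer curves at the 2.5th and 97.5th percentiles across simulation replicates. The sketch below shows one way such a summary could be computed. It assumes each replicate's curve has been evaluated on a common grid of predicted risks and that the percentiles are taken pointwise at each grid value; the captions do not state this, so it is an assumption, as are the function names and the interpolation rule.

```python
from statistics import mean


def percentile(values, q):
    """Percentile (q in [0, 100]) using linear interpolation (an assumed rule)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarize_curves(curves, lower=2.5, upper=97.5):
    """Return (mean, lower percentile, upper percentile) at each grid point.

    `curves` is a list of replicate curves, each a list of values evaluated
    on the same grid of predicted risks.
    """
    if not curves:
        raise ValueError("curves must be non-empty")
    n_points = len(curves[0])
    if any(len(c) != n_points for c in curves):
        raise ValueError("all curves must share the same grid")
    summary = []
    for j in range(n_points):
        column = [c[j] for c in curves]  # values of all replicates at grid point j
        summary.append((mean(column), percentile(column, lower), percentile(column, upper)))
    return summary


# Hypothetical example: three replicate curves evaluated on a two-point grid.
replicates = [[0.1, 0.2], [0.2, 0.4], [0.3, 0.6]]
for centre, low, high in summarize_curves(replicates):
    print(round(centre, 3), round(low, 3), round(high, 3))
```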

Cited by

Investigation of end-stage kidney disease risk prediction in an ethnically diverse cohort of people with type 2 diabetes: use of kidney failure risk equation.
Goubar A, Mangelis A, Thomas S, Fountoulakis N, Collins J, Ayis S, Karalliedde J. BMJ Open Diabetes Res Care. 2024 Sep 13;12(4):e004282. doi: 10.1136/bmjdrc-2024-004282. PMID: 39277182. Free PMC article.
ECG-Based Deep Learning and Clinical Risk Factors to Predict Atrial Fibrillation.
Khurshid S, Friedman S, Reeder C, Di Achille P, Diamant N, Singh P, Harrington LX, Wang X, Al-Alusi MA, Sarma G, Foulkes AS, Ellinor PT, Anderson CD, Ho JE, Philippakis AA, Batra P, Lubitz SA. Circulation. 2022 Jan 11;145(2):122-133. doi: 10.1161/CIRCULATIONAHA.121.057480. Epub 2021 Nov 8. PMID: 34743566. Free PMC article.
Cohort design and natural language processing to reduce bias in electronic health records research.
Khurshid S, Reeder C, Harrington LX, Singh P, Sarma G, Friedman SF, Di Achille P, Diamant N, Cunningham JW, Turner AC, Lau ES, Haimovich JS, Al-Alusi MA, Wang X, Klarqvist MDR, Ashburner JM, Diedrich C, Ghadessi M, Mielke J, Eilken HM, McElhinney A, Derix A, Atlas SJ, Ellinor PT, Philippakis AA, Anderson CD, Ho JE, Batra P, Lubitz SA. NPJ Digit Med. 2022 Apr 8;5(1):47. doi: 10.1038/s41746-022-00590-0. PMID: 35396454. Free PMC article.
Externally validated digital decision support tool for time-to-osteoradionecrosis risk-stratification using right-censored multi-institutional observational cohorts.
Humbert-Vidan L, Kamel S, Wentzel A, Kaffey Z, Abdelaal M, Spier KB, West NA, Marai GE, Canahuate G, Zhang X, Chen MM, Wahid KA, Rigert J, Hosseinian S, Schaefer AJ, Brock KK, Chambers M, Otun AO, Aponte-Wesson R, Patel V, Hope A, Phan J, Garden AS, Frank SJ, Morrison WH, Spiotto MT, Rosenthal D, Lee A, He R, Naser MA, Watson E, Hutcheson KA, Mohamed ASR, Sandulache VC, van Dijk LV, Moreno AC, Urbano TG, Fuller CD, Lai SY; MD Anderson Head and Neck Cancer Symptom Working Group. Radiother Oncol. 2025 Jun;207:110890. doi: 10.1016/j.radonc.2025.110890. Epub 2025 Apr 11. PMID: 40222595. Free PMC article.
Characterization and validation of a prognostic model for the N6-methyladenosine-associated ferroptosis gene in colon adenocarcinoma.
Liu X, An J, Wang Q, Jin H. Transl Cancer Res. 2024 Aug 31;13(8):4389-4407. doi: 10.21037/tcr-24-88. Epub 2024 Aug 6. PMID: 39262465. Free PMC article.

Grants and funding

UL1 TR002243/TR/NCATS NIH HHS/United States
