2011 Feb 23;6(2):e16110.
doi: 10.1371/journal.pone.0016110.

Calibration belt for quality-of-care assessment based on dichotomous outcomes


Stefano Finazzi et al. PLoS One.

Abstract

Prognostic models applied in medicine must be validated on independent samples before their use can be recommended. The assessment of calibration, i.e., the model's ability to provide reliable predictions, is crucial in external validation studies. Besides having several shortcomings, statistical techniques such as the computation of the standardized mortality ratio (SMR) and its confidence intervals, the Hosmer-Lemeshow statistic, and the Cox calibration test are all non-informative with respect to calibration across risk classes. Accordingly, calibration plots reporting expected versus observed outcomes across risk subsets have been used for many years. Erroneously, the points in the plot (frequently representing deciles of risk) have been connected with lines, generating false calibration curves. Here we propose a methodology to create a confidence band for the calibration curve, based on a function that relates expected to observed probabilities across classes of risk. The calibration belt allows the ranges of risk where there is a significant deviation from ideal calibration to be identified, and the direction of the deviation to be indicated. This method thus offers a more analytical view in the assessment of quality of care, compared to other approaches.
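To make the abstract's starting point concrete, the sketch below shows the classic decile-based calibration plot the authors criticize (the points should not be joined with lines), along with the SMR. This is only an illustration on simulated data, not the authors' calibration-belt method; the function names, the grouping into ten classes, and the data are assumptions of this example.

```python
import random

def calibration_table(probs, outcomes, n_groups=10):
    """Sort cases by predicted risk, split them into n_groups classes, and
    return (mean expected, observed rate) per class -- the points a
    calibration plot displays. Joining these points with lines does NOT
    yield a calibration curve, which is the error the paper highlights."""
    pairs = sorted(zip(probs, outcomes))
    bounds = [round(i * len(pairs) / n_groups) for i in range(n_groups + 1)]
    table = []
    for lo, hi in zip(bounds, bounds[1:]):
        chunk = pairs[lo:hi]
        expected = sum(p for p, _ in chunk) / len(chunk)  # mean predicted risk
        observed = sum(y for _, y in chunk) / len(chunk)  # observed event rate
        table.append((expected, observed))
    return table

def smr(probs, outcomes):
    """Standardized mortality ratio: observed deaths / expected deaths."""
    return sum(outcomes) / sum(probs)

# Simulated, perfectly calibrated data: each outcome is drawn with its own
# predicted probability, so observed should track expected in every class.
random.seed(0)
probs = [random.random() for _ in range(2000)]
outcomes = [1 if random.random() < p else 0 for p in probs]

table = calibration_table(probs, outcomes)
print(f"SMR = {smr(probs, outcomes):.3f}")
for expected, observed in table:
    print(f"expected {expected:.3f}   observed {observed:.3f}")
```

On well-calibrated data the ten points lie close to the bisector and the SMR is close to 1; the paper's contribution is to replace the ad hoc line through such points with a proper confidence band around a fitted calibration function.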


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1
Figure 1. Calibration plots through representation of observed mortality versus expected mortality (bisector, dashed line).
Left panel: Data of 194 patients staying longer than 24 hours in a single Intensive Care Unit (ICU) taking part in GiViTI (Italian Group for the Evaluation of Interventions in Intensive Care Medicine) in 2008; expected mortality calculated with a prediction model developed by GiViTI in 2008. Right panel: Data of 2644 critically ill patients admitted to 103 ICUs in Italy from January to March 2007; expected mortality calculated with SAPS II.
Figure 2
Figure 2. Calibration functions (solid line) compared to the bisector (dashed line) for the two discussed examples.
The stopping criterion yielded [formula] for the left curve and [formula] for the right one. To avoid extrapolation, the curves have been plotted only in the range of mortality where data are present. Refer to the caption of Fig. 1 for information about the data sets.
Figure 3
Figure 3. Calibration belts for the two discussed examples at two confidence levels.
[Formula] (dark shaded area) and [formula] (light shaded area); [formula] for the first example (left panel), [formula] for the second (right panel); bisector (dashed line). As in Fig. 2, the calibration bands have been plotted only in the range of mortality where data are present. Refer to the caption of Fig. 1 for information about the data sets.
