2011 Feb 23;6(2):e16110.
doi: 10.1371/journal.pone.0016110.

Calibration belt for quality-of-care assessment based on dichotomous outcomes


Stefano Finazzi et al. PLoS One.

Abstract

Prognostic models applied in medicine must be validated on independent samples before their use can be recommended. The assessment of calibration, i.e., the model's ability to provide reliable predictions, is crucial in external validation studies. Besides having several shortcomings, statistical techniques such as the computation of the standardized mortality ratio (SMR) and its confidence intervals, the Hosmer-Lemeshow statistic, and the Cox calibration test are all non-informative with respect to calibration across risk classes. Accordingly, calibration plots reporting expected versus observed outcomes across risk subsets have been used for many years. Erroneously, the points in the plot (frequently representing deciles of risk) have been connected with lines, generating false calibration curves. Here we propose a methodology to create a confidence band for the calibration curve, based on a function that relates expected to observed probabilities across classes of risk. The calibration belt allows the ranges of risk where there is a significant deviation from ideal calibration to be identified, and the direction of the deviation to be indicated. This method thus offers a more analytical view in the assessment of quality of care, compared to other approaches.
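To make the abstract's starting point concrete, the sketch below shows the classic decile-based calibration plot the authors criticize (the points should not be joined with lines), along with the SMR. This is only an illustration on simulated data, not the authors' calibration-belt method; the function names, the grouping into ten classes, and the data are assumptions of this example.

```python
import random

def calibration_table(probs, outcomes, n_groups=10):
    """Sort cases by predicted risk, split them into n_groups classes, and
    return (mean expected, observed rate) per class -- the points a
    calibration plot displays. Joining these points with lines does NOT
    yield a calibration curve, which is the error the paper highlights."""
    pairs = sorted(zip(probs, outcomes))
    bounds = [round(i * len(pairs) / n_groups) for i in range(n_groups + 1)]
    table = []
    for lo, hi in zip(bounds, bounds[1:]):
        chunk = pairs[lo:hi]
        expected = sum(p for p, _ in chunk) / len(chunk)  # mean predicted risk
        observed = sum(y for _, y in chunk) / len(chunk)  # observed event rate
        table.append((expected, observed))
    return table

def smr(probs, outcomes):
    """Standardized mortality ratio: observed deaths / expected deaths."""
    return sum(outcomes) / sum(probs)

# Simulated, perfectly calibrated data: each outcome is drawn with its own
# predicted probability, so observed should track expected in every class.
random.seed(0)
probs = [random.random() for _ in range(2000)]
outcomes = [1 if random.random() < p else 0 for p in probs]

table = calibration_table(probs, outcomes)
print(f"SMR = {smr(probs, outcomes):.3f}")
for expected, observed in table:
    print(f"expected {expected:.3f}   observed {observed:.3f}")
```

On well-calibrated data the ten points lie close to the bisector and the SMR is close to 1; the paper's contribution is to replace the ad hoc line through such points with a proper confidence band around a fitted calibration function.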


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1
Figure 1. Calibration plots through representation of observed mortality versus expected mortality (bisector, dashed line).
Left panel: Data of 194 patients staying longer than 24 hours in a single Intensive Care Unit (ICU) taking part in GiViTI (Italian Group for the Evaluation of Interventions in Intensive Care Medicine) in 2008; expected mortality calculated with a prediction model developed by GiViTI in 2008. Right panel: Data of 2644 critically ill patients admitted to 103 ICUs in Italy from January to March 2007; expected mortality calculated with SAPS II.
Figure 2
Figure 2. Calibration functions (solid line) compared to the bisector (dashed line) for the two discussed examples.
The stopping criterion yielded [formula] for the left curve and [formula] for the right one. To avoid extrapolation, the curves have been plotted only in the range of mortality where data are present. Refer to the caption of Fig. 1 for information about the data sets.
Figure 3
Figure 3. Calibration belts for the two discussed examples at two confidence levels.
[Formula] (dark shaded area) and [formula] (light shaded area); [formula] for the first example (left panel), [formula] for the second (right panel); bisector (dashed line). As in Fig. 2, the calibration bands have been plotted only in the range of mortality where data are present. Refer to the caption of Fig. 1 for information about the data sets.
