Crit Care Med. 2023 Jun 1;51(6):775-786. doi: 10.1097/CCM.0000000000005837. Epub 2023 Mar 16.

External Validation and Comparison of a General Ward Deterioration Index Between Diversely Different Health Systems

Brandon C Cummings et al. Crit Care Med. 2023.

Abstract

Objectives: Implementing a predictive analytic model in a new clinical environment is fraught with challenges. Dataset shifts such as differences in clinical practice, new data acquisition devices, or changes in the electronic health record (EHR) implementation mean that the input data seen by a model can differ significantly from the data it was trained on. Validating models at multiple institutions is therefore critical. Here, using retrospective data, we demonstrate how Predicting Intensive Care Transfers and other UnfoReseen Events (PICTURE), a deterioration index developed at a single academic medical center, generalizes to a second institution with a significantly different patient population.

Design: PICTURE is a deterioration index designed for the general ward, which uses structured EHR data such as laboratory values and vital signs.

Setting: The general wards of two large hospitals, one an academic medical center and the other a community hospital.

Subjects: The model has previously been trained and validated on a cohort of 165,018 general ward encounters from a large academic medical center. Here, we apply this model to 11,083 encounters from a separate community hospital.

Interventions: None.

Measurements and main results: The hospitals were found to have significant differences in missingness rates (> 5% difference in 9/52 features), deterioration rate (4.5% vs 2.5%), and racial makeup (20% non-White vs 49% non-White). Despite these differences, PICTURE's performance was consistent: at the first hospital, the area under the receiver operating characteristic curve (AUROC) was 0.870 (95% CI, 0.861-0.878) and the area under the precision-recall curve (AUPRC) was 0.298 (95% CI, 0.275-0.320); at the second, AUROC was 0.875 (95% CI, 0.851-0.902) and AUPRC was 0.339 (95% CI, 0.281-0.398). AUPRC was standardized to a 2.5% event rate. PICTURE also outperformed both the Epic Deterioration Index and the National Early Warning Score at both institutions.
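
Standardizing AUPRC to a common event rate follows from Bayes' rule: at each operating point, precision can be recomputed from sensitivity and false positive rate under an assumed prevalence. A minimal Python sketch of this standard adjustment (not the authors' code; the function name, array inputs, and scikit-learn usage are assumptions):

import numpy as np
from sklearn.metrics import roc_curve, auc

def prevalence_adjusted_auprc(y_true, y_score, target_rate=0.025):
    # Recompute precision at every threshold under an assumed event rate pi:
    #   PPV = TPR * pi / (TPR * pi + FPR * (1 - pi))
    # so PR curves from hospitals with different deterioration rates
    # (here 4.5% vs 2.5%) can be compared on equal footing.
    fpr, tpr, _ = roc_curve(y_true, y_score)
    pi = target_rate
    denom = tpr * pi + fpr * (1 - pi)
    precision = np.divide(tpr * pi, denom,
                          out=np.ones_like(denom), where=denom > 0)
    return auc(tpr, precision)  # integrate precision over recall (= TPR)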

Conclusions: Important differences were observed between the two institutions, including data availability and demographic makeup. PICTURE was able to identify general ward patients at risk of deterioration at both hospitals with consistent performance (AUROC and AUPRC) and compared favorably to existing metrics.

Conflict of interest statement

Mr. Cummings’s, Mr. Blackmer’s, Dr. Farzaneh’s, Ms. Cao’s, and Dr. Ansari’s institutions received funding from the Michigan Institute for Data Science and Airstrip Technologies. Mr. Cummings, Mr. Blackmer, Dr. Farzaneh, Ms. Cao, and Drs. Gillies, Medlin, Ward, and Ansari have disclosed that multiple patents have been filed for this work and invention disclosures have been submitted to the Office of Technology Transfer, University of Michigan, Ann Arbor, and that Airstrip Technologies has a license option for Predicting Intensive Care Transfers and other UnfoReseen Events from the University of Michigan. Mr. Motyka disclosed that he is currently employed at Strata Oncology (Precision Oncology). Dr. Gillies disclosed that he is employed at Regeneron Pharmaceuticals. Dr. Admon’s institution received funding from the National Heart, Lung, and Blood Institute. Drs. Admon and Sjoding received support for article research from the National Institutes of Health (NIH). Dr. Singh’s institution received funding from Blue Cross Blue Shield of Michigan and Teva Pharmaceuticals; he received funding from Flatiron Health. Dr. Sjoding’s institution received funding from the NIH. The remaining authors have disclosed that they do not have any potential conflicts of interest.

Figures

Figure 1.
Standardized mean difference (Cohen’s d) between the two institutions at the encounter level. A, For each feature, the median value of each hospital encounter was first calculated to avoid biasing the calculations toward patients with more frequently drawn laboratory values, which may indicate sicker patients. The mean difference of these encounter-level statistics was taken (Michigan Medicine [MM]–Hurley Medical Center [HMC]) and normalized to the pooled SD. B, Difference in encounter-level missingness rate between institutions (MM–HMC).
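
As a rough illustration of the Figure 1 computation, Cohen's d on per-encounter medians might look like the following Python sketch (the DataFrame layout and column names are assumptions, not the authors' pipeline):

import numpy as np
import pandas as pd  # df_mm, df_hmc are long-format DataFrames of observations

def encounter_level_cohens_d(df_mm, df_hmc, feature):
    # One median per encounter, so patients with frequently drawn labs
    # (often sicker) do not dominate the statistic.
    a = df_mm.groupby("encounter_id")[feature].median().dropna()
    b = df_hmc.groupby("encounter_id")[feature].median().dropna()
    pooled_sd = np.sqrt(((len(a) - 1) * a.var() + (len(b) - 1) * b.var())
                        / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / pooled_sd  # MM minus HMC, per the caption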
Figure 2.
Lead time simulation. Area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC) were evaluated for Predicting Intensive Care Transfers and other UnfoReseen Events (PICTURE) and Epic Deterioration Index (EDI) by calculating the maximum prediction score prior to x hr before the deterioration event, with x ranging from 0.5 to 24 hr. Twenty-four hr was selected as the limit since, during model training, only observations less than 24 hr in advance of the deterioration were labeled as positive. AUPRC is again adjusted to the event rate of 2.5% to match the second hospital (Hurley Medical Center [HMC]). Error bars representing 95% CIs are reported using the 1,000-replicate bootstrap described previously. A, AUROC at hospital 1 (Michigan Medicine [MM]) for PICTURE and EDI scores. B, AUROC at hospital 2 (HMC). C, AUPRC at MM. D, AUPRC at HMC.
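
The lead-time evaluation can be sketched as follows; the long-format score table and its column names are assumptions, not the authors' pipeline:

import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def lead_time_auroc(scores, lead_hours):
    # `scores`: one row per observation with columns encounter_id, score,
    # label (1 if the encounter deteriorated), and hours_to_event
    # (NaN for encounters without a deterioration event).
    mask = (scores["hours_to_event"].isna()
            | (scores["hours_to_event"] >= lead_hours))
    agg = (scores[mask]
           .groupby("encounter_id")
           .agg(max_score=("score", "max"), label=("label", "max")))
    return roc_auc_score(agg["label"], agg["max_score"])

# Sweep the cutoff from 0.5 to 24 hr, as in the figure:
# curve = {x: lead_time_auroc(scores, x) for x in np.arange(0.5, 24.5, 0.5)}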
Figure 3.
Positive predictive value (PPV), sensitivity, and specificity change with alert threshold. PPV, sensitivity, and specificity are plotted at varying thresholds. A candidate threshold was selected at a sensitivity of 0.5 using data from hospital 1 (A, dark dashed line), and then applied to data from hospital 2 (B, dark dashed line). Note that PPV at hospital 1 is adjusted to reflect the event rate at the second institution. A second candidate alert threshold (light dotted line) was chosen using the same procedure on data from hospital 2 to indicate the possible desirability of choosing separate thresholds to better fit clinical care in the different environments. HMC = Hurley Medical Center, MM = Michigan Medicine.
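
A hedged sketch of the threshold procedure in Figure 3: pick the cutoff reaching 0.5 sensitivity at hospital 1, then report PPV transported to hospital 2's event rate (function and variable names are illustrative):

import numpy as np
from sklearn.metrics import roc_curve

def threshold_at_sensitivity(y_true, y_score, target_sens=0.5):
    # roc_curve returns TPR in non-decreasing order, so the first index
    # reaching the target sensitivity can be found by binary search.
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    idx = int(np.searchsorted(tpr, target_sens))
    return thresholds[idx], tpr[idx], fpr[idx]

def adjusted_ppv(sens, fpr, event_rate=0.025):
    # PPV at an external event rate, via Bayes' rule.
    return sens * event_rate / (sens * event_rate + fpr * (1 - event_rate))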
