Crit Care Med. 2018 Jul;46(7):1125-1132.
doi: 10.1097/CCM.0000000000003148.

Inclusion of Unstructured Clinical Text Improves Early Prediction of Death or Prolonged ICU Stay

Gary E Weissman et al. Crit Care Med. 2018 Jul.

Abstract

Objectives: Early prediction of undesired outcomes among newly hospitalized patients could improve patient triage and prompt conversations about patients' goals of care. We evaluated the performance of logistic regression, gradient boosting machine, random forest, and elastic net regression models, with and without unstructured clinical text data, to predict a binary composite outcome of in-hospital death or ICU length of stay greater than or equal to 7 days using data from the first 48 hours of hospitalization.

Design: Retrospective cohort study with split sampling for model training and testing.

Setting: A single urban academic hospital.

Patients: All hospitalized patients who required ICU care at the Beth Israel Deaconess Medical Center in Boston, MA, from 2001 to 2012.

Interventions: None.

Measurements and main results: Among 25,947 eligible hospital admissions, we observed 5,504 (21.2%) in which patients died or had an ICU length of stay greater than or equal to 7 days. The gradient boosting machine model had the highest discrimination both without (area under the receiver operating characteristic curve, 0.83; 95% CI, 0.81-0.84) and with (area under the receiver operating characteristic curve, 0.89; 95% CI, 0.88-0.90) text-derived variables. Both gradient boosting machines and random forests outperformed logistic regression without text data (p < 0.001), whereas all models outperformed logistic regression with text data (p < 0.02). The inclusion of text data increased the discrimination of all four model types (p < 0.001). Among those models using text data, the increasing presence of the terms "intubated" and "poor prognosis" was positively associated with mortality and ICU length of stay, whereas the term "extubated" was inversely associated with them.
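
Discrimination is reported above as the area under the receiver operating characteristic curve. As a minimal sketch of what that statistic measures, the Python below computes it as the fraction of (positive, negative) pairs in which the positive case receives the higher predicted risk, counting ties as one half. The function name `auroc` and the toy labels and scores are illustrative assumptions, not the study's code or data; the final line only re-derives the 21.2% outcome rate from the counts given in the abstract.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: iterable of 0/1 outcomes (1 = death or ICU stay >= 7 days).
    scores: predicted risks, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
toy_auc = auroc(toy_labels, toy_scores)

# Outcome rate from the counts reported in the abstract.
outcome_rate = 5504 / 25947
```

This pairwise form is quadratic in the number of cases, which is acceptable for a small illustration; a rank-based implementation would be the usual choice for a cohort of this size.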

Conclusions: Variables extracted from unstructured clinical text from the first 48 hours of hospital admission using natural language processing techniques significantly improved the abilities of logistic regression and other machine learning models to predict which patients died or had long ICU stays. Learning health systems may adapt such models using open-source approaches to capture local variation in care patterns.
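
The abstract does not describe how the text-derived variables were constructed. As a rough sketch of one simple possibility, the Python below counts whole-word occurrences of the three terms named in the results within a single clinical note. The helper `term_counts`, the term list, and the sample note are hypothetical and are not taken from the study.

```python
import re

# Terms highlighted in the abstract's results; the list itself is an assumption.
TERMS = ("intubated", "extubated", "poor prognosis")


def term_counts(note, terms=TERMS):
    """Return a dict mapping each term to its whole-word count in the note."""
    # Lowercase and collapse whitespace so multi-word terms match across
    # line breaks and repeated spaces.
    text = re.sub(r"\s+", " ", note.lower())
    counts = {}
    for term in terms:
        # Word boundaries keep "reintubated" from counting as "intubated".
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text))
    return counts


# Invented note text, for illustration only.
sample_note = (
    "Pt intubated on arrival. Remains intubated; "
    "family aware of poor  prognosis."
)
sample_features = term_counts(sample_note)
```

Counts such as these could be appended to structured variables as additional model inputs; whether the study used raw counts, weighted frequencies, or another representation is not stated in the abstract.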

Conflict of interest statement

The remaining authors have disclosed that they do not have any conflicts of interest.

Figures

Figure 1. Study cohort and exclusions.
Figure 2. Receiver operating characteristic curves of models using only structured (A) and both structured and unstructured (B) data sources. Abbreviations: EN = elastic net, GB = gradient boosting machines, LR = logistic regression, RF = random forest.
Figure 3. Calibration plots of models using only structured (A) and both structured and unstructured (B) data sources. Abbreviations: EN = elastic net, GB = gradient boosting machines, LR = logistic regression, RF = random forest.
