Crit Care Med. 2018 Jul;46(7):1125-1132.

doi: 10.1097/CCM.0000000000003148.

Inclusion of Unstructured Clinical Text Improves Early Prediction of Death or Prolonged ICU Stay

Gary E Weissman^{1,2,3}, Rebecca A Hubbard⁴, Lyle H Ungar⁵, Michael O Harhay^{2,4}, Casey S Greene^{6,7,8}, Blanca E Himes^{4,8}, Scott D Halpern^{1,2,3,4}

Affiliations

¹ Division of Pulmonary, Allergy, and Critical Care, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.
² Palliative and Advanced Illness Research Center, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.
³ Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, PA.
⁴ Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.
⁵ Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA.
⁶ Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA.
⁷ Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.
⁸ Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.

PMID: 29629986
PMCID: PMC6005735
DOI: 10.1097/CCM.0000000000003148

Gary E Weissman et al. Crit Care Med. 2018 Jul.

Abstract

Objectives: Early prediction of undesired outcomes among newly hospitalized patients could improve patient triage and prompt conversations about patients' goals of care. We evaluated the performance of logistic regression, gradient boosting machine, random forest, and elastic net regression models, with and without unstructured clinical text data, to predict a binary composite outcome of in-hospital death or ICU length of stay greater than or equal to 7 days using data from the first 48 hours of hospitalization.
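The composite outcome can be written as a simple labeling rule. The Python sketch below assumes a hypothetical `Admission` record with `died_in_hospital` and `icu_los_days` fields; these names are illustrative and do not come from the study's code. Predictors are restricted to the first 48 hours, while the label depends on the whole hospitalization.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    # Hypothetical record layout; field names are illustrative only.
    admission_id: str
    died_in_hospital: bool
    icu_los_days: float  # total ICU length of stay in days


def composite_outcome(adm: Admission, los_threshold_days: float = 7.0) -> int:
    """Return 1 for in-hospital death or ICU stay >= threshold, else 0."""
    return int(adm.died_in_hospital or adm.icu_los_days >= los_threshold_days)


if __name__ == "__main__":
    cohort = [
        Admission("a1", died_in_hospital=True, icu_los_days=2.0),
        Admission("a2", died_in_hospital=False, icu_los_days=7.0),  # boundary counts
        Admission("a3", died_in_hospital=False, icu_los_days=3.5),
    ]
    print([composite_outcome(a) for a in cohort])  # [1, 1, 0]
    # Outcome prevalence reported in the abstract: 5,504 of 25,947 admissions.
    print(f"{5504 / 25947:.1%}")  # 21.2%
```

The `>=` comparison matches the "greater than or equal to 7 days" wording, so an ICU stay of exactly 7 days is labeled positive.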

Design: Retrospective cohort study with split sampling for model training and testing.
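The abstract does not report the split proportion or whether repeat admissions of the same patient were kept together. A minimal sketch, assuming a hypothetical 70/30 split grouped by patient identifier (both are assumptions, and `split_sample` is a hypothetical helper), shows one way to keep a patient's admissions from appearing in both the training and testing sets.

```python
import random
from typing import Callable, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_sample(
    records: Sequence[T],
    group_key: Callable[[T], Hashable],
    test_fraction: float = 0.3,  # assumed proportion, not from the study
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Split records into train/test sets so that no group appears in both."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    # Sort unique groups first so the result depends only on the seed.
    groups = sorted({group_key(r) for r in records}, key=repr)
    random.Random(seed).shuffle(groups)
    n_test = round(len(groups) * test_fraction)
    test_groups = set(groups[:n_test])
    train = [r for r in records if group_key(r) not in test_groups]
    test = [r for r in records if group_key(r) in test_groups]
    return train, test


if __name__ == "__main__":
    # (patient_id, admission_id) pairs; patient p1 has two admissions.
    records = [("p1", "a1"), ("p1", "a2"), ("p2", "a3"), ("p3", "a4")]
    train, test = split_sample(records, group_key=lambda r: r[0])
    print(train, test)
```

Passing a `group_key` that returns a unique admission identifier reduces this to a plain random split by admission.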

Setting: A single urban academic hospital.

Patients: All hospitalized patients who required ICU care at the Beth Israel Deaconess Medical Center in Boston, MA, from 2001 to 2012.

Interventions: None.

Measurements and main results: Among 25,947 eligible hospital admissions, we observed 5,504 (21.2%) in which patients died or had an ICU length of stay greater than or equal to 7 days. The gradient boosting machine model had the highest discrimination both without (area under the receiver operating characteristic curve, 0.83; 95% CI, 0.81-0.84) and with (area under the receiver operating characteristic curve, 0.89; 95% CI, 0.88-0.90) text-derived variables. Without text data, both gradient boosting machines and random forests outperformed logistic regression (p < 0.001); with text data, all other models outperformed logistic regression (p < 0.02). The inclusion of text data increased the discrimination of all four model types (p < 0.001). Among models using text data, increasing frequency of the terms "intubated" and "poor prognosis" was positively associated with in-hospital death or prolonged ICU stay, whereas the term "extubated" was inversely associated with these outcomes.
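Discrimination here is summarized by the area under the ROC curve. As a sketch, the AUC equals the probability that a randomly chosen admission with the outcome receives a higher score than one without it (ties count one half), which can be computed from ranks (the Mann-Whitney statistic). The percentile bootstrap interval is a swapped-in illustration: the abstract does not state how its 95% CIs or between-model p-values were obtained, and both function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney rank statistic, with average ranks for ties."""
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    n = len(scores)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = sum(1 for y in y_true if y == 0)
    if n_pos + n_neg != n or n_pos == 0 or n_neg == 0:
        raise ValueError("labels must be 0/1 with both classes present")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(
    y_true: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the AUC."""
    roc_auc(y_true, scores)  # validate inputs once up front
    rng = random.Random(seed)
    n = len(scores)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y, s))  # 0.75
```

Comparing two correlated AUCs on the same test set usually calls for a paired method (for example, DeLong's test or a paired bootstrap); which one the authors used is not stated here.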

Conclusions: Variables extracted with natural language processing techniques from unstructured clinical text recorded in the first 48 hours of hospital admission significantly improved the ability of logistic regression and other machine learning models to predict which patients would die or have prolonged ICU stays. Learning health systems may adapt such models using open-source approaches to capture local variation in care patterns.
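The abstract does not describe the exact NLP pipeline. A minimal bag-of-n-grams sketch, assuming notes arrive as hypothetical `(timestamp, text)` pairs, counts unigrams and bigrams such as "intubated" and "poor prognosis" in notes charted within 48 hours of admission. It is not the authors' method: it ignores negation (for example, "not intubated" still counts "intubated") and lets bigrams cross sentence boundaries.

```python
import re
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Tuple

TOKEN_RE = re.compile(r"[a-z]+")


def ngram_counts(text: str, max_n: int = 2) -> Counter:
    """Count lowercase word n-grams (n = 1..max_n) in one note."""
    tokens = TOKEN_RE.findall(text.lower())
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i : i + n])] += 1
    return counts


def early_text_features(
    notes: Iterable[Tuple[datetime, str]],
    admit_time: datetime,
    window_hours: float = 48.0,
    max_n: int = 2,
) -> Counter:
    """Sum n-gram counts over notes charted within the first window_hours."""
    cutoff = admit_time + timedelta(hours=window_hours)
    total: Counter = Counter()
    for charted_at, text in notes:
        if admit_time <= charted_at < cutoff:
            total.update(ngram_counts(text, max_n))
    return total


if __name__ == "__main__":
    admit = datetime(2012, 1, 1, 8, 0)
    notes = [
        (admit + timedelta(hours=6), "Pt intubated overnight. Poor prognosis per family."),
        (admit + timedelta(hours=30), "Extubated this AM."),
        (admit + timedelta(hours=60), "Intubated again."),  # outside 48 h, ignored
    ]
    feats = early_text_features(notes, admit)
    print(feats["intubated"], feats["poor prognosis"], feats["extubated"])  # 1 1 1
```

Each admission's `Counter` can then be turned into a fixed-length feature vector over a chosen vocabulary before being passed to any of the four model types.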

Conflicts of Interest

The remaining authors have disclosed that they do not have any conflicts of interest.

Figures

**Figure 1**
Study cohort and exclusions.

**Figure 2**
Receiver operating characteristic curve of models using only structured (A) and both structured and unstructured (B) data sources. Abbreviations: EN = elastic net, GB = gradient boosting machines, LR = logistic regression, RF = random forest.

**Figure 3**
Calibration plot of models using only structured (A) and both structured and unstructured (B) data sources. Abbreviations: EN = elastic net, GB = gradient boosting machines, LR = logistic regression, RF = random forest.
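A calibration plot compares predicted risk with the observed outcome frequency. The captions do not state how predictions were binned; the sketch below assumes equal-count bins (deciles by default) and returns, for each bin, the mean predicted probability, the observed event rate, and the bin size. `calibration_bins` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def calibration_bins(
    y_true: Sequence[int], probs: Sequence[float], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Equal-count bins of predicted risk: (mean predicted, observed rate, size)."""
    if len(y_true) != len(probs):
        raise ValueError("y_true and probs must have the same length")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    pairs = sorted(zip(probs, y_true))  # order admissions by predicted risk
    n = len(pairs)
    bins: List[Tuple[float, float, int]] = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins : (b + 1) * n // n_bins]
        if not chunk:
            continue  # fewer admissions than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        bins.append((mean_pred, observed, len(chunk)))
    return bins


if __name__ == "__main__":
    for mean_pred, observed, size in calibration_bins(
        [0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9], n_bins=2
    ):
        print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={size}")
```

Plotting `observed` against `mean_pred` gives the calibration curve; points on the diagonal indicate that predicted risks match observed rates.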

Comment in

Toward the "Plateau of Productivity": Enhancing the Value of Machine Learning in Critical Care.
Liu VX. Crit Care Med. 2018 Jul;46(7):1196-1197. doi: 10.1097/CCM.0000000000003170. PMID: 29912104. Free PMC article.
