J Am Med Inform Assoc. 2022 Jan 29;29(3):559-575. doi: 10.1093/jamia/ocab236.

Sepsis prediction, early detection, and identification using clinical text for machine learning: a systematic review

Melissa Y Yan¹, Lise Tuset Gustad²,³, Øystein Nytrø¹

Affiliations

¹ Department of Computer Science, Faculty of Information Technology and Electrical Engineering, Norwegian University of Science and Technology, Trondheim, Norway.
² Department of Circulation and Medical Imaging, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway.
³ Department of Medicine, Levanger Hospital, Clinic of Medicine and Rehabilitation, Nord-Trøndelag Hospital Trust, Levanger, Norway.

PMID: 34897469
PMCID: PMC8800516
DOI: 10.1093/jamia/ocab236

Melissa Y Yan et al. J Am Med Inform Assoc. 2022.

Abstract

Objective: To determine the effects of using unstructured clinical text in machine learning (ML) for prediction, early detection, and identification of sepsis.

Materials and methods: PubMed, Scopus, ACM DL, dblp, and IEEE Xplore databases were searched. Articles utilizing clinical text for ML or natural language processing (NLP) to detect, identify, recognize, diagnose, or predict the onset, development, progress, or prognosis of systemic inflammatory response syndrome, sepsis, severe sepsis, or septic shock were included. Sepsis definition, dataset, types of data, ML models, NLP techniques, and evaluation metrics were extracted.

Results: The clinical text used in models includes narrative notes written by nurses, physicians, and specialists in varying situations. This text is often combined with common structured data such as demographics, vital signs, laboratory data, and medications. Comparison of ML methods by area under the receiver operating characteristic curve (AUC) showed that using both text and structured data predicts sepsis earlier and more accurately than using structured data alone. No meta-analysis was performed because measurements were not comparable among the 9 included studies.

Discussion: Studies focused on sepsis identification or early detection before onset; no studies used patient histories beyond the current episode of care to predict sepsis. The sepsis definition used affects reporting methods, outcomes, and results. Many methods rely on continuous vital sign measurements in intensive care, which makes them difficult to transfer to general ward units.

Conclusions: Approaches were heterogeneous, but studies showed that utilizing both unstructured text and structured data in ML can improve identification and early detection of sepsis.

Keywords: electronic health records; machine learning; natural language processing; sepsis; systematic review.

Figures

**Figure 1.**
PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart for study selection.

**Figure 2.**
Overview of data from a patient timeline used to create models. There is typically a delay between a patient's actual state and the documentation recorded in the electronic health record. Green represents patient states as sepsis develops. Yellow represents observations made by clinicians. Documentation includes ICU vital signs^a in pink, narrative notes in blue, and ICD codes in orange. ICU vital sign^a documentation can be instantaneous, narrative notes can be written after observations are made, and ICD codes are typically registered after a patient is discharged. PIVC: peripheral intravenous catheter. ^aVital signs include temperature, pulse, blood pressure, respiratory rate, oxygen saturation, and level of consciousness and awareness.

**Figure 3.**
Different types of windows were used to obtain longitudinal data. Each gray box represents a single window, which can vary in duration (length of time) depending on the study. One window with the whole encounter means the study used a single window containing data for the whole encounter, from admission until discharge. One window before onset signifies data from a single window covering a period of time before sepsis, severe sepsis, or septic shock onset. Sliding windows are consecutive windows that run up to sepsis, severe sepsis, or septic shock onset; they can be non-overlapping or overlapping. With non-overlapping sliding windows, data within one window of a fixed duration do not appear in the next window. In contrast, with overlapping sliding windows, windows of a fixed duration overlap, so data within one window will be partially in the next window.
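
The two sliding-window schemes described in this caption can be illustrated with a minimal Python sketch. It is not taken from any of the included studies: the function name `sliding_windows`, its parameters, and the observation times (hours since admission) are hypothetical, chosen only to show how a step equal to the window duration yields non-overlapping windows while a smaller step yields overlapping ones.

```python
from typing import List, Sequence, Tuple

Window = Tuple[float, float, List[float]]


def sliding_windows(
    timestamps: Sequence[float],
    onset: float,
    duration: float,
    step: float,
) -> List[Window]:
    """Return (start, end, observations) for consecutive windows before onset.

    step == duration gives non-overlapping windows;
    step < duration gives overlapping windows.
    All times are hours since admission (an assumption of this sketch).
    """
    if duration <= 0 or step <= 0:
        raise ValueError("duration and step must be positive")
    windows: List[Window] = []
    start = 0.0
    # Only keep windows that end at or before the onset time.
    while start + duration <= onset:
        end = start + duration
        inside = [t for t in timestamps if start <= t < end]
        windows.append((start, end, inside))
        start += step
    return windows


# Hypothetical observation times, not data from any study.
obs_hours = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
non_overlapping = sliding_windows(obs_hours, onset=6.0, duration=2.0, step=2.0)
overlapping = sliding_windows(obs_hours, onset=6.0, duration=2.0, step=1.0)

for start, end, inside in overlapping:
    print(f"[{start:.0f} h, {end:.0f} h): {inside}")
```

In the overlapping case each observation other than the first and last falls into two windows, which is the partial sharing of data between neighboring windows that the caption describes.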

**Figure 4.**
The unit of analysis used to train machine learning models for the included studies was either (1) a single note, (2) a set of many notes, or (3) keywords. In general, text was preprocessed and represented as features interpretable by a computer, then structured data were added, and the data were used to fit machine learning models.
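
As a rough illustration of the pipeline in this caption, the following Python sketch represents each note as bag-of-words counts and appends structured values to form one feature vector per note. It is a simplified assumption, not the method of any included study: the helper names (`tokenize`, `build_vocabulary`, `featurize`), the example notes, and the vital sign values are all invented, and the model-fitting step is left out.

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize(note: str) -> List[str]:
    """Lowercase the note and keep alphabetic tokens only (a simple preprocessing choice)."""
    return re.findall(r"[a-z]+", note.lower())


def build_vocabulary(notes: List[str]) -> List[str]:
    """Collect the sorted set of tokens seen across all notes."""
    return sorted({tok for note in notes for tok in tokenize(note)})


def featurize(
    note: str,
    structured: Dict[str, float],
    vocabulary: List[str],
    structured_keys: List[str],
) -> List[float]:
    """Bag-of-words counts for the note, followed by the structured values."""
    counts = Counter(tokenize(note))
    text_part = [float(counts[word]) for word in vocabulary]
    structured_part = [float(structured[key]) for key in structured_keys]
    return text_part + structured_part


# Hypothetical notes and vital signs, invented for this sketch.
notes = ["Patient febrile, suspected infection", "No fever, patient stable"]
vitals = [
    {"heart_rate": 118.0, "temperature_c": 38.9},
    {"heart_rate": 72.0, "temperature_c": 36.8},
]
structured_keys = ["heart_rate", "temperature_c"]

vocabulary = build_vocabulary(notes)
X = [featurize(n, v, vocabulary, structured_keys) for n, v in zip(notes, vitals)]
print(vocabulary)
print(X[0])
```

Each row of `X` could then be passed, together with an outcome label, to whichever classifier a study uses. Here the unit of analysis is a single note; a set of notes could be handled by concatenating their text before tokenizing.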

**Figure 5.**
Overview of area under the curve (AUC) values for identification or early detection of infection, sepsis, septic shock, and severe sepsis using different data types (structured data and text, structured data only, and text only).^∗ Each figure contains the study and year, machine learning model,^a and natural language processing technique^b. (A) AUC values for infection identification. Horng et al 2017: SVM (BoW) has 2 AUC values; 0.86 when using chief complaints and nursing notes and 0.83 when using only chief complaints. (B) AUC values for early sepsis detection. Amrollahi et al AUC values are from detecting 4 h before sepsis onset, and Qin et al AUC values are the average from detecting 0 to 6 h before sepsis onset. (C) AUC values for early septic shock detection. Hammoud et al AUC values are from detecting 30.64 h before septic shock onset, and Liu et al AUC values are from detecting 6.0 to 7.3 h before septic shock onset. (D) AUC values for early sepsis, severe sepsis, or septic shock detection and sepsis identification in Goh et al. Different symbols separate data types. (E) AUC values for early septic shock detection for Culliton et al using results from the test set. (F) AUC values for early septic shock detection for Culliton et al using results from 3-fold validation. ^∗Disclaimer: AUC values should not be directly compared between studies and different figures for infection, sepsis, severe sepsis, and septic shock. Additionally, the lines connecting points do not indicate AUC values changing over time (Figure 5D and 5F); lines only separate the different methods visually. ^aMachine learning models: dag: dagging (partition data into disjoint subgroups); GBT: gradient boosted trees; GRU: gated recurrent unit; LSTM: long short-term memory; NB: Naïve Bayes; RF: random forest; SVM: support vector machines. 
^bNatural language processing techniques: BoW: Bag-of-words; ClinicalBERT: Clinical Bidirectional Encoder Representations from Transformers; ClinicalBERT-m: ClinicalBERT from merging all textual features to get embeddings; ClinicalBERT-sf: fine-tuned ClinicalBERT from concatenating individual embeddings of each textual feature; CM: Amazon Comprehend Medical service for named entity recognition; GloVe: Global Vectors for Word Representation; LDA: Latent Dirichlet Allocation; tf-idf: term frequency-inverse document frequency.
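
For readers unfamiliar with the metric, the AUC values in this figure can be understood through the following minimal Python sketch. It uses the standard rank interpretation of the area under the ROC curve: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc`, the labels, and the scores are hypothetical toy values and do not come from any of the reviewed studies.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Toy labels (1 = sepsis, 0 = no sepsis) and made-up model scores.
labels = [1, 1, 0, 0, 1, 0]
structured_only = [0.9, 0.4, 0.5, 0.2, 0.6, 0.3]
structured_plus_text = [0.9, 0.7, 0.5, 0.2, 0.8, 0.3]

auc_structured = roc_auc(labels, structured_only)
auc_combined = roc_auc(labels, structured_plus_text)
print(round(auc_structured, 3), round(auc_combined, 3))
```

The pairwise loop is quadratic in the number of cases, which is fine for a toy example; library implementations use sorting instead. As the disclaimer in the caption notes, AUC values computed on different cohorts, outcomes, and prediction horizons should not be compared directly.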
