Sci Rep. 2021 Feb 18;11(1):4200.
doi: 10.1038/s41598-021-83784-y.

Early risk assessment for COVID-19 patients from emergency department data using machine learning


Frank S Heldt et al. Sci Rep.

Abstract

Since its emergence in late 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused a pandemic with more than 55 million reported cases and 1.3 million estimated deaths worldwide. While the epidemiological and clinical characteristics of COVID-19 have been reported, the risk factors underlying the transition from mild to severe disease remain poorly understood. In this retrospective study, we analysed data from 879 confirmed SARS-CoV-2-positive patients admitted to a two-site NHS Trust hospital in London, England, between January 1st and May 26th, 2020, with most cases occurring in March and April. We extracted anonymised demographic data, physiological clinical variables and laboratory results from electronic healthcare records (EHR) and applied multivariate logistic regression, random forest and extreme gradient boosted trees. To evaluate the potential for early risk assessment, we used data available during patients' initial presentation at the emergency department (ED) to predict deterioration to one of three clinical endpoints during the remainder of the hospital stay: admission to intensive care, need for invasive mechanical ventilation and in-hospital mortality. From the trained models, we extracted the clinical features most informative in determining these patient trajectories. Under our inclusion criteria, we identified 129 of 879 (15%) patients who required intensive care, 62 of 878 (7%) patients who needed mechanical ventilation, and 193 of 619 (31%) cases of in-hospital mortality. Our models learned successfully from early clinical data and predicted the clinical endpoints with good discrimination, the best model achieving areas under the receiver operating characteristic curve (AUC-ROC) of 0.76 to 0.87 (F1 scores of 0.42 to 0.60). Younger patient age was associated with an increased risk of receiving intensive care and ventilation, but a lower risk of mortality.
Clinical indicators of a patient's oxygen supply and selected laboratory results, such as blood lactate and creatinine levels, were most predictive of COVID-19 patient trajectories. Among COVID-19 patients, machine learning can aid in the early identification of those with a poor prognosis, using EHR data collected during a patient's first presentation at the ED. Patient age and measures of oxygenation status during the ED stay are primary indicators of poor patient outcomes.
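The AUC-ROC and F1 scores reported above measure different things. AUC-ROC is threshold-free: it equals the probability that a randomly chosen patient who reached the endpoint receives a higher risk score than one who did not. F1 depends on a chosen decision threshold and on class balance, which likely contributes to F1 (0.42 to 0.60) sitting well below AUC-ROC for endpoints with 7% to 31% prevalence. The minimal Python sketch below computes both from scratch; the function names, the 0.5 threshold and the toy scores are illustrative assumptions, not the study's settings.

```python
from typing import Sequence


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case scores higher than a random negative one.

    Ties count as half a correct ordering (Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Harmonic mean of precision and recall after thresholding the risk scores."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: four patients, two of whom reached the endpoint.
y = [0, 0, 1, 1]
risk = [0.1, 0.4, 0.35, 0.8]
print(auc_roc(y, risk))   # 0.75
print(f1_score(y, risk))  # 0.6666666666666666
```

Moving the threshold changes F1 but leaves AUC-ROC untouched, which is why both numbers are worth reporting.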


Conflict of interest statement

FSH, MPV, SP, MC, LML, AM and RTK have a patent pending, “Methods for predicting patient deterioration”, based on this work. FSH, SP, MC, LML, FA, SJ, RD, NL, RAF, AH, RL, LM, LT and RTK are employees of Sensyne Health plc (part-time in the case of LM and LT). MPV, AM, RAP, AB and JE are employees of Chelsea and Westminster Hospital NHS Foundation Trust. LT reported receiving additional fees from the National Institute for Health Research and the Stroke Association Grants (RP-PG-1214-20003; IS-BRC-1215-20008; RP-PG-0614-20005; TSA BHF 2017/01), and LM is further funded by the National Institute for Health Research Grant (IS-BRC-1215-20008). LM and LT are further supported by the NIHR Oxford Biomedical Research Centre.

Figures

Figure 1
Patient pathways and outcome prediction. (A) Patient transitions between hospital departments are shown as bands proportional in size to patient numbers. Departments are indicated by rectangles (ED, emergency department; Ward, regular hospital ward; AICU, adult intensive care unit). Patients who remain in hospital, are discharged or die in hospital are indicated on the right. (B) Patient outcome prediction models use clinical data recorded during a patient's ED stay to predict clinical endpoints during the remainder of the in-hospital stay.
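Panel B states the key design constraint: only data recorded during the ED stay may be used as model input. A minimal sketch of that filtering step, assuming observations arrive as hypothetical (variable, timestamp, value) tuples and that the most recent pre-departure value is kept for each variable. The abstract does not say how repeated measurements were summarised, so this aggregation and the toy values are assumptions.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Hypothetical record format: (variable name, time recorded, value)
Observation = Tuple[str, datetime, float]


def ed_features(observations: Iterable[Observation], ed_departure: datetime) -> Dict[str, float]:
    """Return the latest value of each variable recorded at or before ED departure."""
    latest: Dict[str, Tuple[datetime, float]] = {}
    for name, recorded_at, value in observations:
        if recorded_at > ed_departure:
            continue  # measured after the ED stay, so unavailable for early prediction
        if name not in latest or recorded_at > latest[name][0]:
            latest[name] = (recorded_at, value)
    return {name: value for name, (_, value) in latest.items()}


departure = datetime(2020, 4, 1, 14, 0)
obs = [
    ("spo2", datetime(2020, 4, 1, 10, 0), 94.0),
    ("spo2", datetime(2020, 4, 1, 12, 30), 91.0),
    ("lactate", datetime(2020, 4, 1, 11, 0), 1.8),
    ("spo2", datetime(2020, 4, 2, 9, 0), 85.0),  # recorded after ED departure: excluded
]
print(ed_features(obs, departure))  # {'spo2': 91.0, 'lactate': 1.8}
```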
Figure 2
Prediction performance for AICU admission. Model performances for the logistic regression, random forest and XGBoost models are shown as ROC (A) and precision-recall curves (B). AUC is provided in brackets. Solid lines and shaded areas indicate the mean and standard deviation across three cross-validation folds, respectively. Dashed lines indicate random classifiers.
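Performance is summarised as the mean and standard deviation over three cross-validation folds. The sketch below assumes stratified folds (each fold keeps roughly the same proportion of positive cases) and uses the sample standard deviation; the caption confirms neither detail, and the function names are hypothetical. Averaging whole ROC curves, as in the figure, additionally requires interpolating each fold's curve onto a common false-positive-rate grid, which is omitted here.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def stratified_folds(y: Sequence[int], k: int = 3, seed: int = 0) -> List[List[int]]:
    """Split example indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        indices = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)  # deal this class out round-robin
    return folds


def mean_and_sd(per_fold: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of a per-fold metric such as AUC-ROC."""
    return statistics.mean(per_fold), statistics.stdev(per_fold)


labels = [0] * 9 + [1] * 3
for fold in stratified_folds(labels):
    print(sorted(fold), "positives:", sum(labels[i] for i in fold))
print([round(v, 3) for v in mean_and_sd([0.80, 0.85, 0.90])])  # [0.85, 0.05]
```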
Figure 3
Feature importance for AICU admission. (A–C) Permutation feature importance for the logistic regression (A), random forest (B) and XGBoost (C) models. Only the top 15 features are shown. Asterisks mark features with importance scores significantly different from zero across three cross-validation folds, with t-test p-value thresholds of 5% (∗) and 1% (∗∗). (D–F) Accumulated local effects plots for the logistic regression (D), random forest (E) and XGBoost (F) models. The top two features according to permutation feature importance are shown for each model. Vertical bars at the bottom indicate feature values observed in the data set.
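Permutation feature importance scores a feature by how much a performance metric drops when that feature's values are shuffled, which breaks its relationship with the outcome while preserving its distribution. The model-agnostic sketch below assumes it is applied to each fold's held-out data; `predict`, `metric`, the toy model and the accuracy metric are hypothetical stand-ins (in practice `metric` could be the AUC-ROC of predicted probabilities).

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def permutation_importance(
    predict: Callable[[Sequence[Row]], List[float]],
    X: Sequence[Row],
    y: Sequence[int],
    metric: Callable[[Sequence[int], Sequence[float]], float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in `metric` when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances: List[float] = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - metric(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(y_true: Sequence[int], probs: Sequence[float]) -> float:
    return sum((p >= 0.5) == (t == 1) for t, p in zip(y_true, probs)) / len(y_true)


def toy_model(rows: Sequence[Row]) -> List[float]:
    # Uses only feature 0, so feature 1 should receive zero importance.
    return [1.0 if r[0] > 0.5 else 0.0 for r in rows]


X = [[0.1, 5.0], [0.9, 3.0], [0.2, 1.0], [0.8, 7.0]]
y = [0, 1, 0, 1]
print(permutation_importance(toy_model, X, y, accuracy))  # second entry is 0.0
```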
Figure 4
Prediction performance for mechanical ventilation. Model performances for the logistic regression, random forest and XGBoost models are shown as ROC (A) and precision-recall curves (B). AUC is provided in brackets. Solid lines and shaded areas indicate the mean and standard deviation across three cross-validation folds, respectively. Dashed lines indicate random classifiers.
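Precision-recall curves are informative for rare endpoints: with 62 of 878 patients ventilated, a random classifier's precision is only about 7%, which is where its flat precision-recall line sits. The sketch below builds the curve one distinct score threshold at a time and computes average precision as the recall-weighted sum of precisions, the step-wise definition used by scikit-learn's `average_precision_score`. Whether the figure's precision-recall AUC uses this definition or trapezoidal interpolation is not stated, so treat this as one plausible choice.

```python
from typing import List, Sequence, Tuple


def precision_recall_curve(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(recall, precision) points, one per distinct score threshold, highest threshold first."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    curve: List[Tuple[float, float]] = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Admit every case tied at this score together, so ties form one point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        curve.append((tp / n_pos, tp / (tp + fp)))
    return curve


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Sum of precision weighted by the recall gained at each threshold."""
    ap = 0.0
    previous_recall = 0.0
    for recall, precision in precision_recall_curve(y_true, scores):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


y = [0, 0, 1, 1]
risk = [0.1, 0.4, 0.35, 0.8]
print(precision_recall_curve(y, risk))
print(round(average_precision(y, risk), 4))  # 0.8333
```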
Figure 5
Feature importance for mechanical ventilation. (A–C) Permutation feature importance for the random forest (A), logistic regression (B) and XGBoost (C) models. Only the top 15 features are shown. Asterisks mark features with importance scores significantly different from zero across three cross-validation folds, with t-test p-value thresholds of 5% (∗) and 1% (∗∗). (D–F) Accumulated local effects plots for the logistic regression (D), random forest (E) and XGBoost (F) models. The top two features according to permutation feature importance are shown for each model. Vertical bars at the bottom indicate feature values observed in the data set.
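Accumulated local effects (ALE) plots show how a model's prediction changes with one feature while avoiding the unrealistic feature combinations that partial dependence plots can create when clinical variables are correlated. A first-order ALE sketch for a numeric feature, following the usual recipe: split the feature at quantiles, average the prediction difference across each bin's edges for the patients in that bin, accumulate, then centre. `predict` and the linear toy model are hypothetical, and the bin count and centring scheme are assumptions rather than the study's settings.

```python
from typing import Callable, List, Sequence, Tuple

Row = List[float]


def ale_1d(
    predict: Callable[[Sequence[Row]], List[float]],
    X: Sequence[Row],
    j: int,
    n_bins: int = 5,
) -> Tuple[List[float], List[float]]:
    """First-order ALE of numeric feature j: returns (bin edges, centred ALE at each edge)."""
    values = sorted(row[j] for row in X)
    n = len(values)
    # Quantile bin edges; the set removes duplicates so every bin has positive width.
    edges = sorted({values[round(q * (n - 1) / n_bins)] for q in range(n_bins + 1)})
    if len(edges) < 2:
        raise ValueError("feature j is constant")

    local_effects: List[float] = []
    counts: List[int] = []
    for k in range(len(edges) - 1):
        lo, hi = edges[k], edges[k + 1]
        members = [r for r in X if lo < r[j] <= hi or (k == 0 and r[j] == lo)]
        counts.append(len(members))
        if not members:
            local_effects.append(0.0)
            continue
        # Move only feature j to each bin edge, keeping the other features as observed.
        at_hi = [list(r[:j]) + [hi] + list(r[j + 1:]) for r in members]
        at_lo = [list(r[:j]) + [lo] + list(r[j + 1:]) for r in members]
        diffs = [a - b for a, b in zip(predict(at_hi), predict(at_lo))]
        local_effects.append(sum(diffs) / len(diffs))

    # Accumulate the local effects from the lowest edge upwards.
    ale = [0.0]
    for effect in local_effects:
        ale.append(ale[-1] + effect)

    # Centre so the count-weighted average effect over the data is zero.
    mean_effect = sum(c * (ale[k] + ale[k + 1]) / 2 for k, c in enumerate(counts)) / sum(counts)
    return edges, [a - mean_effect for a in ale]


def linear_model(rows: Sequence[Row]) -> List[float]:
    return [2.0 * r[0] + r[1] for r in rows]


X = [[float(i), float(i % 3)] for i in range(11)]
edges, effects = ale_1d(linear_model, X, j=0)
print(edges)  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print([round(b - a, 6) for a, b in zip(effects, effects[1:])])  # [4.0, 4.0, 4.0, 4.0, 4.0]
```

For a linear model the ALE curve recovers the slope exactly (2 per unit, so 4 per bin of width 2), which makes it a convenient sanity check.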
Figure 6
Prediction performance for mortality. Model performances for the logistic regression, random forest and XGBoost models are shown as ROC (A) and precision-recall curves (B). AUC is provided in brackets. Solid lines and shaded areas indicate the mean and standard deviation across three cross-validation folds, respectively. Dashed lines indicate random classifiers.
Figure 7
Feature importance for mortality. (A–C) Permutation feature importance for the logistic regression (A), random forest (B) and XGBoost (C) models. Only the top 15 features are shown. Asterisks mark features with importance scores significantly different from zero across three cross-validation folds, with t-test p-value thresholds of 5% (∗) and 1% (∗∗). (D–F) Accumulated local effects plots for the logistic regression (D), random forest (E) and XGBoost (F) models. The top two features according to permutation feature importance are shown for each model. Vertical bars at the bottom indicate feature values observed in the data set.
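The asterisks mark features whose importance differs significantly from zero across the three folds. Assuming this is a two-sided one-sample t-test on the three per-fold importance scores (the caption does not specify the variant), the statistic has 2 degrees of freedom. The t-distribution with 2 degrees of freedom has a closed-form CDF, so the p-value can be computed with the standard library alone. The function names and example scores are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def importance_t_test(fold_scores: Sequence[float]) -> Tuple[float, float]:
    """Two-sided one-sample t-test of H0: mean importance == 0, for exactly three folds.

    With n = 3 the statistic has 2 degrees of freedom, whose CDF has the closed form
    F(t) = 1/2 + t / (2 * sqrt(t**2 + 2)); hence p = 1 - |t| / sqrt(t**2 + 2).
    """
    if len(fold_scores) != 3:
        raise ValueError("closed-form p-value below assumes exactly three folds")
    mean = statistics.mean(fold_scores)
    sd = statistics.stdev(fold_scores)
    if sd == 0:
        raise ValueError("importance identical in every fold; t is undefined")
    t = mean / (sd / math.sqrt(len(fold_scores)))
    p = 1.0 - abs(t) / math.sqrt(t * t + 2.0)
    return t, p


def significance_stars(p: float) -> str:
    """Map a p-value onto the figure's markers: ** below 1%, * below 5%."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


t, p = importance_t_test([0.02, 0.03, 0.04])
print(round(t, 3), round(p, 4), significance_stars(p))  # 5.196 0.0351 *
```

With only three folds the critical |t| values are large (about 4.30 at 5% and 9.92 at 1%), so only features whose importance is very consistent across folds earn an asterisk.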

