Lancet Digit Health. 2022 Jun;4(6):e415-e425.
doi: 10.1016/S2589-7500(22)00049-8. Epub 2022 Apr 21.

Recurrent neural network models (CovRNN) for predicting outcomes of patients with COVID-19 on admission to hospital: model development and validation using electronic health record data

Laila Rasmy et al. Lancet Digit Health. 2022 Jun.

Abstract

Background: Predicting outcomes of patients with COVID-19 at an early stage is crucial for optimised clinical care and resource management, especially during a pandemic. Although multiple machine learning models have been proposed to address this issue, their requirements for extensive data preprocessing and feature engineering have meant that they have not been validated or implemented outside of their original study sites. Therefore, we aimed to develop accurate and transferable predictive models of outcomes on hospital admission for patients with COVID-19.

Methods: In this study, we developed recurrent neural network-based models (CovRNN) to predict the outcomes of patients with COVID-19 by use of available electronic health record data on admission to hospital, without the need for specific feature selection or missing-data imputation. CovRNN was designed to predict three outcomes: in-hospital mortality, need for mechanical ventilation, and prolonged hospital stay (>7 days). For in-hospital mortality and mechanical ventilation, CovRNN produced time-to-event risk scores (survival prediction; evaluated by the concordance index) and all-time risk scores (binary prediction; area under the receiver operating characteristic curve [AUROC] was the main metric); we trained only a binary classification model for prolonged hospital stay. For binary classification tasks, we compared CovRNN against traditional machine learning algorithms: logistic regression and light gradient boosting machine. Our models were trained and validated on the heterogeneous, deidentified data of 247 960 patients with COVID-19 from 87 US health-care systems, derived from the Cerner Real-World COVID-19 Q3 Dataset up to September 2020. We held out the data of 4175 patients from two hospitals for external validation. The remaining 243 785 patients from the 85 health systems were grouped into training (n=170 626), validation (n=24 378), and multi-hospital test (n=48 781) sets. Model performance was evaluated in the multi-hospital test set. The transferability of CovRNN was externally validated by use of deidentified data from 36 140 patients derived from the US-based Optum deidentified COVID-19 electronic health record dataset (version 1015; from January 2007 to October 15, 2020). Exact dates of data extraction were masked by the databases to ensure patient data safety.
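The concordance index named above as the evaluation metric for the time-to-event models can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation; the function and variable names are our own, and censoring is handled in the usual way (only pairs anchored by an observed event are comparable):

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs whose predicted risk
    ordering agrees with their observed event ordering.
    A pair (i, j) is comparable when patient i has an observed
    event (events[i] == 1) and times[i] < times[j]."""
    concordant, ties, comparable = 0, 0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1  # higher risk, earlier event: correct order
                elif risk_scores[i] == risk_scores[j]:
                    ties += 1  # tied predictions count half
    return (concordant + 0.5 * ties) / comparable
```

A model that assigns strictly higher risk scores to patients with earlier events scores 1·0; random scores hover around 0·5.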

Findings: CovRNN binary models achieved AUROCs of 93·0% (95% CI 92·6-93·4) for the prediction of in-hospital mortality, 92·9% (92·6-93·2) for the prediction of mechanical ventilation, and 86·5% (86·2-86·9) for the prediction of a prolonged hospital stay, outperforming the light gradient boosting machine and logistic regression algorithms. External validation confirmed AUROCs in similar ranges (91·3-97·0% for in-hospital mortality prediction, 91·5-96·0% for the prediction of mechanical ventilation, and 81·0-88·3% for the prediction of prolonged hospital stay). For survival prediction, CovRNN achieved a concordance index of 86·0% (95% CI 85·1-86·9) for in-hospital mortality and 92·6% (92·2-93·0) for mechanical ventilation.
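An AUROC with a 95% CI of the kind reported above can be computed with a rank-based AUROC and a percentile bootstrap, sketched here in plain Python. These are our own illustrative helpers, not the study's evaluation code, and the abstract does not state which CI method the authors used:

```python
import random

def auroc(labels, scores):
    """Rank-based AUROC: probability that a randomly chosen positive
    outranks a randomly chosen negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI: resample patients with replacement,
    recompute AUROC each time, and take the alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [labels[i] for i in idx]
        sb = [scores[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples missing a class
            stats.append(auroc(yb, sb))
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi
```

On cohorts the size of the multi-hospital test set (n=48 781), such bootstrap intervals become narrow, consistent with the sub-percentage-point CIs reported.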

Interpretation: Trained on a large, heterogeneous, real-world dataset, our CovRNN models showed high prediction accuracy and transferability through consistently good performances on multiple external datasets. Our results show the feasibility of a COVID-19 predictive model that delivers high accuracy without the need for complex feature engineering.

Funding: Cancer Prevention and Research Institute of Texas.

Conflict of interest statement

Declaration of interests: We declare no competing interests.

Figures

Figure 1: CovRNN prediction tasks. Visit i represents the index visit; visit i-1 represents the visit before the index visit.

Figure 2: Model development and external validation datasets. CRWD=Cerner Real-World COVID-19 Q3 Dataset. OPTUM=Optum deidentified COVID-19 electronic health record dataset.

Figure 3: Kaplan-Meier curves in the stratified survival analysis. In-hospital mortality (A) and mechanical ventilation (B) in the multi-hospital test set of the Cerner Real-World COVID-19 Q3 Dataset. In-hospital mortality (C) and mechanical ventilation (D) in the test set of the Optum deidentified COVID-19 electronic health record dataset. Patients are stratified according to their predicted survival score over time, in days since admission. Shaded areas indicate 95% CIs calculated on the logarithmic scale from the SEs of the Kaplan-Meier estimator, with the centre values corresponding to the Kaplan-Meier estimate.

Figure 4: Subgroup analysis using the CRWD multi-hospital test set. (A) Age group. (B) Comorbidity. (C) US census region. (D) Race. AUROC=area under the receiver operating characteristic curve. CRWD=Cerner Real-World COVID-19 Q3 Dataset.

Figure 5: Calibration plots for the CRWD validation set, CRWD multi-hospital test set, and OPTUM test set. (A) In-hospital mortality. (B) Mechanical ventilation. (C) Prolonged hospital stay. CRWD=Cerner Real-World COVID-19 Q3 Dataset. OPTUM=Optum deidentified COVID-19 electronic health record dataset.
