Digit Health. 2022 Nov 1;8:20552076221133692.
doi: 10.1177/20552076221133692. eCollection 2022 Jan-Dec.

Homogeneous ensemble models for predicting infection levels and mortality of COVID-19 patients: Evidence from China


Jiafeng Wang et al. Digit Health. 2022.

Abstract

Background: The persistence of the COVID-19 pandemic over several years has put heavy pressure on healthcare services worldwide. This article aims to establish models that predict the infection levels and mortality of COVID-19 patients in China.

Methods: Machine learning and deep learning models were built from the clinical features of COVID-19 patients. The models with the highest area under the receiver operating characteristic curve (AUC) scores were selected to construct two homogeneous ensemble models, one for predicting infection levels and one for predicting mortality. First-hand clinical data of 760 patients were collected from Zhongnan Hospital of Wuhan University between 3 January and 8 March 2020, and the data were preprocessed by cleaning, imputation, and normalization.
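The abstract names the preprocessing steps (cleaning, imputation, normalization) but not the specific methods used. The following is a minimal sketch under stated assumptions: cleaning is taken to mean dropping records whose outcome label is missing, imputation uses column means, and normalization is min-max scaling to [0, 1]. These choices and all function names are hypothetical illustrations, not the authors' pipeline.

```python
from statistics import mean
from typing import Optional

Row = list[Optional[float]]


def clean(rows: list[Row], labels: list[Optional[int]]) -> tuple[list[Row], list[int]]:
    """Assumed cleaning rule: drop any record whose outcome label is missing."""
    kept = [(r, y) for r, y in zip(rows, labels) if y is not None]
    return [r for r, _ in kept], [y for _, y in kept]


def impute_mean(rows: list[Row]) -> list[list[float]]:
    """Assumed imputation: replace each missing value with its column mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(mean(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def min_max_normalize(rows: list[list[float]]) -> list[list[float]]:
    """Assumed normalization: scale each column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0 for j, v in enumerate(r)]
        for r in rows
    ]


def preprocess(rows: list[Row], labels: list[Optional[int]]) -> tuple[list[list[float]], list[int]]:
    """Hypothetical pipeline: clean, then impute, then normalize."""
    rows, clean_labels = clean(rows, labels)
    return min_max_normalize(impute_mean(rows)), clean_labels


# Toy data: the third record has no outcome label and is dropped.
X, y = preprocess([[1.0, None], [3.0, 4.0], [None, 8.0], [5.0, 2.0]], [0, 1, None, 1])
print(X, y)  # [[0.0, 0.5], [0.5, 1.0], [1.0, 0.0]] [0, 1, 1]
```

In a real study, the imputation means and normalization bounds would be computed on the training split only and then applied unchanged to held-out data, so that no information from the test set leaks into training.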

Results: Our models achieve an AUC of 0.7059 and a weighted-average recall of 0.7248 in predicting infection level, and an AUC of 0.8436 and a weighted-average recall of 0.8486 in predicting mortality. This study also identifies two sets of essential clinical features: (1) C-reactive protein (CRP) or high-sensitivity C-reactive protein (hs-CRP), and (2) chest tightness, age, and pleural effusion.
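The two reported metrics, and the AUC-based model selection described in the Methods, can be made concrete with a small sketch. It assumes a binary outcome (for example, survival versus death) and computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which equals the area under the ROC curve. How the authors computed AUC for the multi-level infection outcome is not stated in the abstract. Weighted-average recall is the per-class recall weighted by each class's share of the true labels. The helper `select_top_models` is hypothetical.

```python
from collections import Counter


def binary_auc(y_true: list[int], scores: list[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5.

    Assumes labels 0 (negative) and 1 (positive). Pairwise O(n_pos * n_neg) loop.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def weighted_avg_recall(y_true: list[int], y_pred: list[int]) -> float:
    """Per-class recall, weighted by each class's support (its count in y_true)."""
    support = Counter(y_true)
    total = len(y_true)
    result = 0.0
    for cls, n_cls in support.items():
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        result += (n_cls / total) * (true_pos / n_cls)
    return result


def select_top_models(y_true: list[int], candidate_scores: dict[str, list[float]], k: int) -> list[str]:
    """Hypothetical helper: names of the k candidates with the highest validation AUC."""
    ranked = sorted(
        candidate_scores,
        key=lambda name: binary_auc(y_true, candidate_scores[name]),
        reverse=True,
    )
    return ranked[:k]


# Four validation patients, two of whom are positive.
print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

When the weights are the true-class supports, weighted-average recall reduces algebraically to overall accuracy: each class term becomes that class's true-positive count divided by the total number of samples.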

Conclusions: Two homogeneous ensemble models are proposed to predict the infection levels and mortality of COVID-19 patients in China, and new findings on clinical features that benefit the machine learning models are reported. An evaluation on a real-world dataset collected from 3 January to 8 March 2020, comparing the models with state-of-the-art prediction models, demonstrates their effectiveness.

Keywords: COVID-19; Ensemble model; electronic health records; machine learning; prediction models.


Conflict of interest statement

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1.
Workflow for data preprocessing and model development for predicting COVID-19 infection levels and mortality.
Figure 2.
The architecture of the ensemble model with soft voting.
Figure 3.
Comparison of hard voting and soft voting. Hard voting returns a negative result because more than half of the base learners return negative predictions. Soft voting, however, returns a positive result because the average predicted probability across all base learners is greater than 0.5. Soft voting can therefore compensate for base learners that fail to classify a sample correctly.
Figure 4.
Mean values of continuous features (upper and middle) and proportions of binary features (lower) by infection-level outcome.
Figure 5.
Mean values of continuous features (upper and middle) and proportions of binary features (lower) by mortality outcome.
Figure 6.
Average area under the receiver operating characteristic curve (AUC) scores of all models for each number of manually selected features, in predicting infection levels (upper) and mortality (lower).
Figure 7.
Average area under the receiver operating characteristic curve (AUC) scores of all models for each number of features selected by the models, in predicting infection levels (upper) and mortality (lower).
Figure 8.
Features selected by the homogeneous ensemble models, with their weights in predicting infection levels (left) and mortality (right).
Figure 9.
Confusion matrices for infection-level prediction (left) and mortality prediction (right).
Figure 10.
The area under the receiver operating characteristic curve (AUC) scores and weighted-average recall of the models predicting infection levels (upper) and mortality (lower) using only the features selected by the homogeneous ensemble model.
Figure 11.
The area under the receiver operating characteristic curve (AUC) scores and weighted-average recall of the models in predicting infection levels (left) and mortality (right), with features selected automatically by the homogeneous ensemble models versus selected manually.
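The Figure 3 caption contrasts hard and soft voting. A minimal sketch of the two rules follows, assuming each base learner outputs a positive-class probability and using the 0.5 decision threshold mentioned in the caption. The function names and the strict-majority rule for hard voting are illustrative choices, not taken from the paper.

```python
from statistics import fmean


def hard_vote(probs: list[float], threshold: float = 0.5) -> int:
    """Each base learner casts a 0/1 vote; predict 1 only if a strict majority votes positive."""
    positive_votes = sum(1 for p in probs if p > threshold)
    return 1 if positive_votes > len(probs) / 2 else 0


def soft_vote(probs: list[float], threshold: float = 0.5) -> int:
    """Average the base learners' positive-class probabilities, then threshold the mean."""
    return 1 if fmean(probs) > threshold else 0


# Hypothetical probabilities from three base learners for one patient:
# two lean negative, one is confidently positive (mean is about 0.6).
probs = [0.45, 0.40, 0.95]
print(hard_vote(probs), soft_vote(probs))  # 0 1
```

This reproduces the situation in the caption: the majority of votes is negative, but the confident positive learner pulls the average probability above 0.5. In a homogeneous ensemble the base learners share a single learning algorithm; the voting rule itself does not depend on which algorithm that is.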
