Digit Health. 2022 Nov 1:8:20552076221133692.

doi: 10.1177/20552076221133692. eCollection 2022 Jan-Dec.

Homogeneous ensemble models for predicting infection levels and mortality of COVID-19 patients: Evidence from China

Jiafeng Wang¹, Xianlong Zhou²,³, Zhitian Hou⁴, Xiaoya Xu⁵, Yueyue Zhao⁶,⁷, Shanshan Chen⁶,⁷, Jun Zhang⁸, Lina Shao⁹, Rong Yan⁶, Mingshan Wang⁷, Minghua Ge¹, Tianyong Hao⁴, Yuexing Tu¹⁰, Haijun Huang⁶

Affiliations

¹ Department of Head, Neck and Thyroid Surgery, Zhejiang Provincial People's Hospital and People's Hospital Affiliated to Hangzhou Medical College, Hangzhou, China.
² Emergency Center, Zhongnan Hospital of Wuhan University, Wuhan, China.
³ Hubei Clinical Research Center for Emergency and Resuscitation, Zhongnan Hospital of Wuhan University, Wuhan, China.
⁴ School of Computer Science, South China Normal University, Guangzhou, China.
⁵ School of Business Administration, Guangdong University of Finance & Economics, Guangzhou, China.
⁶ Department of Infectious Disease, Zhejiang Provincial People's Hospital and People's Hospital Affiliated to Hangzhou Medical College, Hangzhou, China.
⁷ Graduate School of Clinical Medicine, Bengbu Medical College, Bengbu, China.
⁸ Department of Orthopaedic Surgery, Zhejiang Provincial People's Hospital and People's Hospital Affiliated to Hangzhou Medical College, Hangzhou, China.
⁹ Department of Nephrology, Zhejiang Provincial People's Hospital and People's Hospital Affiliated to Hangzhou Medical College, Hangzhou, China.
¹⁰ Department of Intensive Care Unit, Zhejiang Provincial People's Hospital and People's Hospital Affiliated to Hangzhou Medical College, Hangzhou, China.

PMID: 36339905
PMCID: PMC9630904
DOI: 10.1177/20552076221133692

Jiafeng Wang et al. Digit Health. 2022.

Abstract

Background: The persistence of the COVID-19 pandemic has put heavy pressure on healthcare services worldwide for several years. This article aims to establish models to predict infection levels and mortality of COVID-19 patients in China.

Methods: Machine learning and deep learning models are built on the clinical features of COVID-19 patients. The best models are selected by area under the receiver operating characteristic curve (AUC) scores to construct two homogeneous ensemble models, one for predicting infection levels and one for predicting mortality. First-hand clinical data from 760 patients were collected at Zhongnan Hospital of Wuhan University between 3 January and 8 March 2020. The data are preprocessed with cleaning, imputation, and normalization.
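
The AUC-based selection step described in the Methods can be illustrated with a minimal sketch. It assumes binary labels and held-out predicted probabilities for each candidate model; the model names, the scores, and the helper names `auc_score` and `select_top_models` are hypothetical and are not taken from the paper, which does not state here how many models were kept or how multi-class infection levels were scored.

```python
def auc_score(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


def select_top_models(scores_by_model, y_true, k):
    """Return the names of the k candidates with the highest AUC."""
    ranked = sorted(
        scores_by_model,
        key=lambda name: auc_score(y_true, scores_by_model[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical validation labels and predicted probabilities (illustration only).
y_true = [0, 0, 1, 1, 0, 1]
candidate_scores = {
    "model_a": [0.10, 0.40, 0.35, 0.80, 0.20, 0.70],
    "model_b": [0.60, 0.50, 0.40, 0.30, 0.70, 0.20],
    "model_c": [0.20, 0.30, 0.60, 0.90, 0.10, 0.80],
}
best = select_top_models(candidate_scores, y_true, k=2)
print(best)
```

The selected candidates would then serve as base learners of an ensemble; the pairwise AUC computation is quadratic and is meant only for small illustrative inputs.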

Results: Our models obtain AUC = 0.7059 and Recall (weighted avg) = 0.7248 in predicting infection level, and AUC = 0.8436 and Recall (weighted avg) = 0.8486 in predicting mortality. This study also identifies two sets of essential clinical features. One is C-reactive protein (CRP) or high-sensitivity C-reactive protein (hs-CRP); the other is chest tightness, age, and pleural effusion.
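
For readers unfamiliar with the weighted-average recall reported above, the following sketch shows how the metric is usually computed: per-class recall weighted by each class's share of the samples. The labels and predictions are made up for illustration, and the function names are hypothetical; this is not the authors' evaluation code.

```python
from collections import Counter


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights equal to class support / total."""
    support = Counter(y_true)                                   # samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    total = len(y_true)
    return sum((support[c] / total) * (hits[c] / support[c]) for c in support)


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical three-class labels (illustration only).
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 0, 2]
print(weighted_recall(y_true, y_pred), accuracy(y_true, y_pred))
```

Because the support weights cancel the per-class denominators, support-weighted recall coincides with overall accuracy, which is worth keeping in mind when reading it alongside AUC.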

Conclusions: Two homogeneous ensemble models are proposed to predict infection levels and mortality of COVID-19 patients in China. New findings on clinical features that benefit the machine learning models are reported. Evaluation on a real dataset collected from 3 January to 8 March 2020 demonstrates the effectiveness of the models in comparison with state-of-the-art prediction models.

Keywords: COVID-19; Ensemble model; electronic health records; machine learning; prediction models.

Conflict of interest statement

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

**Figure 1.**
Workflow for data preprocessing and model development for predicting COVID-19 infection levels and mortality.

**Figure 2.**
Architecture of the ensemble model with soft voting.

**Figure 3.**
Comparison of hard voting and soft voting. The result of hard voting is negative because more than half of the base learners return negative results. The result of soft voting, however, is positive because the average output of all base learners is greater than 0.5. Soft voting can therefore compensate for base learners that fail to classify a sample correctly.

**Figure 4.**
The mean value of continuous features (upper and middle) and the proportion of binary features (lower) on infection levels outcome.

**Figure 5.**
The mean value of continuous features (upper and middle) and the proportion of binary features (lower) on mortality outcome.

**Figure 6.**
The area under the receiver operating characteristic curve (AUC) average scores of all models for each subset of number of manually selected features in predicting infection levels (upper) and predicting mortality (lower).

**Figure 7.**
The area under the receiver operating characteristic curve (AUC) average scores of all models for each subset of number of features selected by models in predicting infection levels (upper) and predicting mortality (lower).

**Figure 8.**
Features selected by the homogeneous ensemble models, with feature weights for predicting infection levels (left) and mortality (right).

**Figure 9.**
Confusion matrices of infection levels prediction (left) and mortality prediction (right).

**Figure 10.**
The area under the receiver operating characteristic curve (AUC) scores and weighted average recall of the models predicting infection levels (upper) and mortality (lower) using only the features selected by the homogeneous ensemble model.

**Figure 11.**
The area under the receiver operating characteristic curve (AUC) scores and weighted average recall of the models predicting infection levels (left) and mortality (right) with features selected automatically by the homogeneous ensemble models and manually.
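
The contrast between hard and soft voting described in the Figure 3 caption can be sketched in a few lines. The three probabilities below are invented so that the two rules disagree, and the function names and the 0.5 threshold on individual learners are assumptions for illustration rather than details confirmed by the paper.

```python
def hard_vote(probs, threshold=0.5):
    """Majority vote: each base learner casts a 0/1 vote from its own probability."""
    votes = [p > threshold for p in probs]
    return int(sum(votes) > len(votes) / 2)


def soft_vote(probs, threshold=0.5):
    """Soft vote: average the predicted probabilities, then threshold once."""
    return int(sum(probs) / len(probs) > threshold)


# Hypothetical positive-class probabilities from three base learners.
probs = [0.45, 0.45, 0.90]
print(hard_vote(probs))  # two of three learners vote negative
print(soft_vote(probs))  # the mean probability is 0.6, above 0.5
```

Here two learners are only slightly below the threshold while the third is confident, so averaging lets the confident learner outweigh the two narrow negative votes.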
