2021 May 12:9:626697.
doi: 10.3389/fpubh.2021.626697. eCollection 2021.

Machine Learning Based Clinical Decision Support System for Early COVID-19 Mortality Prediction

Akshaya Karthikeyan et al. Front Public Health.

Abstract

Coronavirus disease 2019 (COVID-19), caused by the virus SARS-CoV-2, is an acute respiratory disease that has been classified as a pandemic by the World Health Organization (WHO). The sudden spike in the number of infections and the high mortality rates have put immense pressure on public healthcare systems. Hence, it is crucial to identify the key factors for mortality prediction in order to optimize patient treatment strategy. Routine blood test results are more widely available for mortality prediction than other forms of data such as X-rays, CT scans, and ultrasounds. This study proposes machine learning (ML) methods based on blood test data to predict COVID-19 mortality risk. A combination of five features (neutrophils, lymphocytes, lactate dehydrogenase (LDH), high-sensitivity C-reactive protein (hs-CRP), and age) predicts mortality with 96% accuracy. Various ML models (neural networks, logistic regression, XGBoost, random forests, SVM, and decision trees) were trained and their performance compared to determine the model that achieves consistently high accuracy across the days that span the disease. The best-performing method, which uses XGBoost feature importance and neural network classification, predicts mortality with an accuracy of 90% as early as 16 days before the outcome. Robust testing with three cases based on days to outcome confirms the strong predictive performance and practicality of the proposed model. A detailed analysis and identification of trends was performed using these key biomarkers to provide useful insights for intuitive application. This study provides solutions that would help accelerate the decision-making process in healthcare systems for focused medical treatments in an accurate, early, and reliable manner.

Keywords: biomarkers; coronavirus disease 2019; machine learning; mortality; prognosis.
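
The abstract reports accuracy, and the figure captions also report F1 score and AUC. As a minimal illustration of how these three metrics can be computed for a binary mortality classifier, the sketch below uses only the Python standard library. It is not the authors' code: the function names, the label convention (1 = died, 0 = survived), the 0.5 decision threshold, and the toy values are all assumptions made for this example.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the
    number of samples, which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the study): 1 = died, 0 = survived.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]          # assumed predicted risk
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("accuracy:", accuracy(y_true, y_pred))
print("F1 score:", f1_score(y_true, y_pred))
print("AUC:", auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, while AUC is computed from the raw scores, which is one reason to report all three together.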

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1. Distribution of the two classes in the train and test sets after splitting.
Figure 2. Flowchart depicting the model development pipeline used in this study.
Figure 3. Architecture of the neural network implemented for feature selection, where n represents the number of features to be analyzed.
Figure 4. Comparison of the performance of different machine learning algorithms assessed using different metrics. The vertical lines denote the standard deviations. (A) Accuracy. (B) AUC and F1 score.
Figure 5. The performance of neural net on the test data using case 1: Number of days to outcome less than or equal to n. (A) The class-wise distribution of the cumulated data-points (≤ nth day) for all samples in the imputed test set. (B) Accuracy of the model evaluated for different days to outcome. (C) F1-score and AUC of the model evaluated for different days to outcome.
Figure 6. The performance of neural net on the test data using case 2: Number of days to outcome greater than or equal to n. (A) The class-wise distribution of the cumulated data-points (≤ nth day) for all samples in the imputed test set. (B) Accuracy of the model evaluated for different days to outcome. (C) F1-score and AUC of the model evaluated for different days to outcome.
Figure 7. The performance of neural net on the test data using case 3: Number of days to outcome equal to n. (A) The class-wise distribution of the cumulated data-points (≤ nth day) for all samples in the imputed test set. (B) Accuracy of the model evaluated for different days to outcome. (C) F1-score and AUC of the model evaluated for different days to outcome.
Figure 8. Box and whisker plot showing the variations of four selected features with respect to the days to outcome. (A) hs-CRP, (B) neutrophils (%), (C) lymphocyte (%), (D) lactate dehydrogenase.
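
The captions for Figures 5 to 7 define three test cases by comparing each sample's days to outcome with a cutoff n (less than or equal to, greater than or equal to, and equal to). One way such subsets could be selected is sketched below. This is an illustration under assumptions, not the authors' implementation: the `Sample` record, its field names, the `select_case` helper, and the toy values are hypothetical.

```python
import operator
from typing import Callable, Dict, List, NamedTuple


class Sample(NamedTuple):
    """One blood-test record (hypothetical structure for illustration)."""
    patient_id: str
    days_to_outcome: int
    died: bool


# Case number -> comparison applied as compare(days_to_outcome, n).
CASE_COMPARATORS: Dict[int, Callable[[int, int], bool]] = {
    1: operator.le,  # case 1: days to outcome <= n
    2: operator.ge,  # case 2: days to outcome >= n
    3: operator.eq,  # case 3: days to outcome == n
}


def select_case(samples: List[Sample], n: int, case: int) -> List[Sample]:
    """Return the samples that fall into the given case for cutoff n."""
    compare = CASE_COMPARATORS[case]
    return [s for s in samples if compare(s.days_to_outcome, n)]


# Toy records (made up, not taken from the study's data).
toy = [
    Sample("p1", 0, True),
    Sample("p2", 3, False),
    Sample("p3", 7, False),
    Sample("p4", 16, True),
]

for case in (1, 2, 3):
    ids = [s.patient_id for s in select_case(toy, n=3, case=case)]
    print(f"case {case}, n=3:", ids)
```

A model's metrics would then be computed separately on each subset for each value of n, which is how a per-day performance curve like those in panels (B) and (C) could be assembled.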
