Biomed Signal Process Control. 2023 Jul;84:104818.
doi: 10.1016/j.bspc.2023.104818. Epub 2023 Mar 9.

Comprehensive analysis of clinical data for COVID-19 outcome estimation with machine learning models

Daniel I Morís et al. Biomed Signal Process Control. 2023 Jul.

Abstract

COVID-19 is a global threat to healthcare systems due to the rapid spread of the pathogen that causes it. In such a situation, clinicians must make important decisions in an environment where medical resources can be insufficient. Computer-aided diagnosis systems can be very useful here, not only to support clinical decisions but also to perform relevant analyses that help clinicians better understand the disease and the factors that identify high-risk patients. For those purposes, in this work, we use several machine learning algorithms to estimate the outcome of COVID-19 patients given their clinical information. In particular, we perform 2 different studies: the first estimates whether the patient is at low or high risk of death, whereas the second estimates whether the patient needs hospitalization. The analyses show the most relevant features for each studied scenario, as well as the classification performance of the considered machine learning models. The XGBoost algorithm estimates the need for hospitalization with an AUC-ROC of 0.8415 ± 0.0217 and the risk of death with an AUC-ROC of 0.7992 ± 0.0104. The results demonstrate the potential of the proposal to identify the patients who need a greater amount of medical resources because they are at higher risk. This provides healthcare services with a tool to better manage their resources.
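To make the reported metric concrete, the following is a minimal sketch of how an AUC-ROC value and a "mean ± standard deviation" summary over repeated evaluations could be computed. It is not the authors' code: the function names `auc_roc` and `summarize` are hypothetical, the example labels and scores are made up for illustration, and the use of the sample standard deviation is an assumption, since the abstract does not say how the spread was calculated.

```python
from statistics import mean, stdev


def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC-ROC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(auc_values):
    """Return (mean, sample standard deviation) over repeated runs.
    Assumption: the source does not state which deviation it reports."""
    return mean(auc_values), stdev(auc_values)


# Illustrative data only: 1 = positive class (e.g. hospitalized), 0 = negative.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
example_auc = auc_roc(example_labels, example_scores)
```

The pairwise formulation is quadratic in the number of samples, which is acceptable for a sketch; a rank-based computation would be preferable on large cohorts.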

Keywords: COVID-19; Classification; Clinical data; Feature selection; Machine learning.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract
Fig. 1. Overall description of the pipeline of the proposed methodology.
Fig. 2. Description of the method used to address the class imbalance in the original dataset.
Fig. 3. Representative graphical results obtained in experiment I. (a) Representative confusion matrix obtained with the most appropriate classifier for this experiment (XGBoost), trained with all the features. (b) ROC curves of the classifiers used in this experiment, trained with the whole set of features.
Fig. 4. Ranking of features according to the score given by each feature selection method for experiment I (estimation of Non-Hospitalization/Hospitalization). The x axes are displayed in logarithmic scale to make the differences among the scores easier to see. Features with a negligible score were removed from the chart. (a) Fisher Scoring. (b) Mutual Information. (c) Variance-based Ranking.
Fig. 5. Evolution of the AUC-ROC values for the problem of classifying Non-Hospitalized/Hospitalized (experiment I) as a function of the number of features, using the XGBoost algorithm.
Fig. 6. Representative graphical results of experiment II. (a) Representative confusion matrix of the most appropriate classifier for this experiment (decision tree), trained with the whole set of features. (b) ROC curves of the classifiers used in this experiment, trained with all the features.
Fig. 7. Ranking of features according to the score given by each feature selection method for experiment II (estimation of Survival/Death). The x axes are shown in logarithmic scale to make the differences among the scores easier to see. Features with a negligible score were removed from the chart. (a) Fisher Scoring. (b) Mutual Information. (c) Variance-based Ranking.
Fig. 8. Evolution of the AUC-ROC values for the problem of classifying Survival/Death (experiment II) as a function of the number of features, using the XGBoost algorithm.
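
The figure captions mention feature rankings produced by Fisher Scoring. As a rough illustration of what such a ranking involves, the sketch below computes a commonly used form of the Fisher score (between-class scatter divided by within-class scatter) for each feature and sorts the features by it. This page does not give the exact formulation used in the paper, so the formula, the function names `fisher_score` and `rank_features`, and the toy feature names and values are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Fisher score of one feature: class-size-weighted squared distance of
    each class mean from the overall mean, divided by the class-size-weighted
    within-class variance. Assumed formulation, not taken from the paper."""
    overall = mean(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    # A feature with zero within-class variance is treated as maximally separating.
    return between / within if within > 0 else float("inf")


def rank_features(columns, labels):
    """Return (feature name, score) pairs sorted from most to least discriminative."""
    scores = {name: fisher_score(vals, labels) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two features, four patients, binary outcome.
toy_columns = {"feature_a": [1, 2, 9, 10], "feature_b": [1, 9, 2, 10]}
toy_labels = [0, 0, 1, 1]
toy_ranking = rank_features(toy_columns, toy_labels)
```

In this toy example `feature_a` separates the two classes cleanly and ranks first, while `feature_b` overlaps heavily across classes and receives a much lower score.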
