Comprehensive analysis of clinical data for COVID-19 outcome estimation with machine learning models

Daniel I Morís¹,², Joaquim de Moura¹,², Pedro J Marcos³, Enrique Míguez Rey⁴, Jorge Novo¹,², Marcos Ortega¹,²

Affiliations

¹ Centro de Investigación CITIC, Universidade da Coruña, Campus de Elviña, s/n, 15071 A Coruña, Spain.
² Grupo VARPA, Instituto de Investigación Biomédica de A Coruña (INIBIC), Universidade da Coruña, Xubias de Arriba, 84, 15006 A Coruña, Spain.
³ Dirección Asistencial y Servicio de Neumología, Complejo Hospitalario Universitario de A Coruña (CHUAC), Instituto de Investigación Biomédica de A Coruña (INIBIC), Universidade da Coruña, Sergas, 15006 A Coruña, Spain.
⁴ Grupo de Investigación en Virología Clínica, Sección de Enfermedades Infecciosas, Servicio de Medicina Interna, Instituto de Investigación Biomédica de A Coruña (INIBIC), Área Sanitaria A Coruña y CEE (ASCC), SERGAS, 15006 A Coruña, Spain.

PMID: 36915863
PMCID: PMC9995330
DOI: 10.1016/j.bspc.2023.104818

Daniel I Morís et al. Biomed Signal Process Control. 2023 Jul;84:104818. doi: 10.1016/j.bspc.2023.104818. Epub 2023 Mar 9.

Abstract

COVID-19 is a global threat to healthcare systems due to the rapid spread of the pathogen that causes it. In this situation, clinicians must make important decisions in an environment where medical resources may be insufficient. Computer-aided diagnosis systems can be very useful here, not only to support clinical decisions but also to perform relevant analyses that help clinicians better understand the disease and the factors that identify high-risk patients. For these purposes, in this work, we use several machine learning algorithms to estimate the outcome of COVID-19 patients from their clinical information. In particular, we perform two different studies: the first estimates whether the patient is at low or high risk of death, whereas the second estimates whether the patient needs hospitalization. The results identify the most relevant features for each studied scenario and report the classification performance of the considered machine learning models. In particular, the XGBoost algorithm estimates the need for hospitalization of a patient with an AUC-ROC of $0.8415 \pm 0.0217$ and the risk of death with an AUC-ROC of $0.7992 \pm 0.0104$. These results demonstrate the great potential of the proposal to identify the patients who need more medical resources because they are at higher risk, providing healthcare services with a tool to better manage their resources.

Keywords: COVID-19; Classification; Clinical data; Feature selection; Machine learning.
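The abstract reports performance as AUC-ROC mean ± standard deviation. As a minimal illustration of what that number summarizes, the sketch below computes AUC-ROC in pure Python from its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one) and formats a list of per-fold values. The function names, the 0/1 label encoding, and the use of the sample standard deviation are assumptions for illustration; the source does not state how the spread was computed, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
import statistics
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive (label 1) scores
    higher than a random negative (label 0). Ties count as half a win,
    which matches the Mann-Whitney U formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_std(fold_aucs: Sequence[float]) -> str:
    """Format per-fold AUC-ROC values as 'mean ± sample standard deviation'
    (assumption: the sample, not population, standard deviation)."""
    return f"{statistics.mean(fold_aucs):.4f} ± {statistics.stdev(fold_aucs):.4f}"


# Toy example with made-up scores, not data from the paper.
print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(mean_std([0.80, 0.90]))  # 0.8500 ± 0.0707
```

The quadratic pairwise loop keeps the definition visible; for large cohorts a rank-based or library implementation is preferable.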

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

**Fig. 1**
Overall description of the pipeline of the proposed methodology.

**Fig. 2**
Description of the method used to address the class imbalance in the original dataset.
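Fig. 2 describes how the class imbalance in the original dataset was handled, but the caption does not name the technique. Purely as an assumption, the sketch below shows random undersampling of the majority class, one common option; it is not necessarily the authors' method. The function `random_undersample` and its `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Sequence[float]


def random_undersample(
    rows: Sequence[Row], labels: Sequence[int], seed: int = 0
) -> Tuple[List[Row], List[int]]:
    """Hypothetical helper: randomly discard samples of the larger classes
    until every class has as many samples as the smallest one."""
    by_class: Dict[int, List[Row]] = defaultdict(list)
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    target = min(len(members) for members in by_class.values())
    rng = random.Random(seed)  # fixed seed for a reproducible subsample
    out_rows: List[Row] = []
    out_labels: List[int] = []
    for y in sorted(by_class):
        # Sample `target` rows without replacement from this class.
        for row in rng.sample(by_class[y], target):
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


# Toy example: 8 negatives and 2 positives become 2 of each.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_undersample(X, y)
print(y_bal)  # [0, 0, 1, 1]
```

If resampling of this kind is used, it is normally applied to the training data only, so that the evaluation data keep the real class proportions.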

**Fig. 3**
Representative graphical results obtained in experiment I (estimation of Non-Hospitalization/Hospitalization). (a) Representative confusion matrix obtained with the most appropriate classifier for this experiment (XGBoost), trained with all the features. (b) ROC curves of the classifiers used in this experiment, trained with the whole set of features.
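The confusion matrices in Fig. 3(a) and Fig. 6(a) tabulate predicted versus actual classes. A minimal sketch of how such counts arise from classifier scores, assuming a binary label where 1 is the positive class (for example, Hospitalization) and a decision threshold of 0.5, which the source does not specify. The function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, int]:
    """Count TP, FP, TN and FN after thresholding the scores (1 = positive class)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["fn" if y == 1 else "tn"] += 1
    return counts


def sensitivity_specificity(counts: Dict[str, int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = counts["tp"] / (counts["tp"] + counts["fn"])
    specificity = counts["tn"] / (counts["tn"] + counts["fp"])
    return sensitivity, specificity


# Toy example with made-up scores.
cm = confusion_counts([0, 0, 1, 1], [0.1, 0.6, 0.4, 0.9])
print(cm)  # {'tp': 1, 'fp': 1, 'tn': 1, 'fn': 1}
print(sensitivity_specificity(cm))  # (0.5, 0.5)
```

Moving the threshold trades sensitivity against specificity; the ROC curve in panel (b) summarizes that trade-off over all thresholds.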

**Fig. 4**
Ranking of features according to the score given by each feature selection method for experiment I (estimation of Non-Hospitalization/Hospitalization). The $x$ axes are displayed on a logarithmic scale to make the differences among the scores easier to see. Features with a negligible score were removed from the chart. (a) Fisher Scoring. (b) Mutual Information. (c) Variance-based Ranking.
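Fig. 4 and Fig. 7 rank features with Fisher Scoring and Variance-based Ranking, among others. One common formulation of the Fisher score of feature $j$ is $F_j = \sum_k n_k(\mu_{k,j} - \mu_j)^2 / \sum_k n_k \sigma_{k,j}^2$, where $n_k$, $\mu_{k,j}$ and $\sigma_{k,j}^2$ are the size, mean and variance of class $k$ and $\mu_j$ is the overall mean; variance-based ranking simply sorts features by their variance, ignoring the labels. The sketch below implements both in pure Python, assuming all features are numeric. Whether the authors used exactly this formulation, or normalized features first (which matters a great deal for variance ranking), is not stated here.

```python
import statistics
from typing import List, Sequence


def fisher_scores(rows: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Fisher score per feature: between-class scatter over within-class scatter."""
    classes = sorted(set(labels))
    scores = []
    for j in range(len(rows[0])):
        overall_mean = statistics.fmean(row[j] for row in rows)
        between = within = 0.0
        for c in classes:
            values = [row[j] for row, y in zip(rows, labels) if y == c]
            between += len(values) * (statistics.fmean(values) - overall_mean) ** 2
            within += len(values) * statistics.pvariance(values)
        if within > 0:
            scores.append(between / within)
        else:
            # No spread inside any class: perfectly separating if the class means differ.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def variance_scores(rows: Sequence[Sequence[float]]) -> List[float]:
    """Population variance of each feature, ignoring the labels."""
    return [statistics.pvariance([row[j] for row in rows]) for j in range(len(rows[0]))]


def ranking(scores: Sequence[float]) -> List[int]:
    """Feature indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 5.0], [1.0, 1.0], [10.0, 5.0], [11.0, 1.0]]
y = [0, 0, 1, 1]
print(fisher_scores(X, y))    # [100.0, 0.0]
print(variance_scores(X))     # [25.25, 4.0]
print(ranking(fisher_scores(X, y)))  # [0, 1]
```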

**Fig. 5**
Evolution of the AUC-ROC for the Non-Hospitalized/Hospitalized classification problem (experiment I) as a function of the number of features, using the XGBoost algorithm.
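Fig. 5 and Fig. 8 track how the XGBoost AUC-ROC changes as more of the top-ranked features are used. The loop behind such a curve can be sketched as follows. `evaluate` is a hypothetical callback standing in for training and scoring the classifier on the selected columns, since the source does not give the training details here; the toy evaluator below returns made-up values and trains nothing.

```python
from typing import Callable, List, Sequence, Tuple


def auc_vs_num_features(
    ranked_features: Sequence[int],
    evaluate: Callable[[List[int]], float],
) -> List[Tuple[int, float]]:
    """Evaluate the top-k ranked features for k = 1, ..., len(ranked_features).

    `evaluate` (hypothetical) receives a list of feature indices, should train
    the classifier on those columns only, and return its AUC-ROC.
    """
    curve = []
    for k in range(1, len(ranked_features) + 1):
        curve.append((k, evaluate(list(ranked_features[:k]))))
    return curve


def smallest_best_k(curve: Sequence[Tuple[int, float]]) -> int:
    """Smallest number of features that reaches the maximum AUC-ROC on the curve."""
    best = max(auc for _, auc in curve)
    return min(k for k, auc in curve if auc == best)


# Stand-in evaluator with made-up AUCs (no real model is trained here).
fake_auc = {1: 0.70, 2: 0.80, 3: 0.80}
curve = auc_vs_num_features([4, 0, 2], lambda subset: fake_auc[len(subset)])
print(curve)  # [(1, 0.7), (2, 0.8), (3, 0.8)]
print(smallest_best_k(curve))  # 2
```

With noisy cross-validated AUCs, comparing against the maximum with a small tolerance, rather than exact equality, is usually more robust.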

**Fig. 6**
Notable graphical results of experiment II (estimation of Survival/Death). (a) Representative confusion matrix of the most appropriate classifier for this experiment (decision tree), trained with the whole set of features. (b) ROC curves of the classifiers used in this experiment, trained with all the features.
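The ROC curves in Fig. 6(b) and Fig. 3(b) plot the true positive rate against the false positive rate as the decision threshold varies. A minimal pure-Python sketch, assuming 0/1 labels and that higher scores indicate the positive class (for example, Death); the trapezoidal area under the resulting points gives the AUC-ROC. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by lowering the decision threshold through
    every distinct score, starting at the origin (nothing predicted positive)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("an ROC curve needs both classes")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y != 1)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example with made-up scores.
pts = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(pts)  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_auc(pts))  # 0.75
```

Because tied scores are grouped into a single threshold, the trapezoidal area equals the pairwise "positive outscores negative" probability, with ties counted as half.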

**Fig. 7**
Ranking of features according to the score given by each feature selection method for experiment II (estimation of Survival/Death). The $x$ axes are shown on a logarithmic scale to make the differences among the scores easier to see. Features with a negligible score were removed from the chart. (a) Fisher Scoring. (b) Mutual Information. (c) Variance-based Ranking.
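Panel (b) of Fig. 7 (and of Fig. 4) ranks features by their mutual information with the outcome, $I(X;Y) = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)}$. The sketch below estimates it from empirical frequencies, assuming a discrete feature such as a yes/no clinical indicator. Continuous clinical variables would need discretization or a dedicated estimator (scikit-learn's `mutual_info_classif`, for instance, uses a nearest-neighbour estimator for continuous features), and the source does not say which estimator was used. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Empirical mutual information I(X;Y), in nats, between two discrete variables."""
    if len(feature) != len(labels) or not feature:
        raise ValueError("feature and labels must be non-empty and of equal length")
    n = len(feature)
    joint = Counter(zip(feature, labels))
    count_x = Counter(feature)
    count_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # Only observed (x, y) pairs contribute; unobserved pairs have p(x, y) = 0.
        mi += p_xy * math.log(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return mi


# Toy data: a feature identical to the label vs. one independent of it.
outcome = [0, 0, 1, 1]
print(mutual_information([0, 0, 1, 1], outcome))  # ln 2, about 0.693
print(mutual_information([0, 1, 0, 1], outcome))  # 0.0
```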

**Fig. 8**
Evolution of the AUC-ROC for the Survival/Death classification problem (experiment II) as a function of the number of features, using the XGBoost algorithm.
