Front Public Health. 2022 Jun 30;10:912099.
doi: 10.3389/fpubh.2022.912099. eCollection 2022.

Individual Factors Associated With COVID-19 Infection: A Machine Learning Study

Tania Ramírez-Del Real et al. Front Public Health. 2022.

Abstract

The rapid, exponential increase in COVID-19 infections and their catastrophic effects on patients' health have required the development of tools that support health systems in the quick and efficient diagnosis and prognosis of this disease. In this context, the present study aims to identify potential factors associated with COVID-19 infection by applying machine learning techniques. Random forest, chi-squared, XGBoost, and rpart were used for feature selection, and ROSE and SMOTE were used as resampling methods to address class imbalance. Machine learning and deep learning algorithms such as support vector machines, C4.5, random forest, rpart, and deep neural networks were then explored during the train/test phase to select the best prediction model. The dataset used in this study contains clinical data, anthropometric measurements, and other health parameters related to smoking habits, alcohol consumption, quality of sleep, physical activity, and health status during the confinement imposed by the COVID-19 pandemic. The results showed that XGBoost selected the features most strongly associated with COVID-19 infection, and that random forest gave the best predictive model, with a balanced accuracy of 90.41% using SMOTE as the resampling technique. The best-performing model provides a tool to help prevent SARS-CoV-2 infection, since it identifies the variables that carry the highest risk, some of which are, to a certain extent, controllable.
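
The abstract names SMOTE and balanced accuracy without describing them. As an illustration only, the following is a minimal pure-Python sketch of the standard definitions: SMOTE creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours, and balanced accuracy is the mean of the per-class recalls. The function names, parameters, and toy data are assumptions made for this example; this is not the authors' code, and the study itself does not state which implementation it used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Simplified sketch of SMOTE; assumes at least two minority samples.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical toy data: two features, three minority-class samples.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
new_points = smote(minority, n_new=5, k=2)

# A classifier that always predicts the majority class (label 0).
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
print(balanced_accuracy(y_true, y_pred))  # 0.5, although plain accuracy is 0.8
```

The last lines show why balanced accuracy is the natural metric under class imbalance: a model that ignores the minority class still reaches 80% plain accuracy on this toy data, but only 50% balanced accuracy.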

Keywords: COVID-19; feature selection; imbalanced data; machine learning; predictive model.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1. Prediction model.
Figure 2. The correlation coefficient of the continuous variables of the dataset.
