SN Comput Sci. 2020;1(4):206.
doi: 10.1007/s42979-020-00216-w. Epub 2020 Jun 21.

Predictive Data Mining Models for Novel Coronavirus (COVID-19) Infected Patients' Recovery

L J Muhammad et al. SN Comput Sci. 2020.

Abstract

The novel coronavirus (COVID-19 or 2019-nCoV) pandemic has neither a clinically proven vaccine nor drugs; however, patients are recovering with the aid of antibiotic medications, anti-viral drugs, and chloroquine, as well as vitamin C supplementation. It is now evident that the world needs a speedy solution to contain the further spread of COVID-19 with the aid of non-clinical approaches such as data mining, augmented intelligence, and other artificial intelligence techniques, so as to mitigate the huge burden on the healthcare system while providing the best possible means for the diagnosis and prognosis of patients. In this study, data mining models were developed to predict the recovery of COVID-19 infected patients using an epidemiological dataset of COVID-19 patients in South Korea. The decision tree, support vector machine, naive Bayes, logistic regression, random forest, and K-nearest neighbor algorithms were applied directly to the dataset using the Python programming language to develop the models. The model predicted a minimum and maximum number of days for COVID-19 patients to recover from the virus, the age group of patients at high risk of not recovering, those who are likely to recover, and those who are likely to recover quickly. The results show that the model developed with the decision tree algorithm is the most efficient at predicting the possibility of recovery of infected patients, with an overall accuracy of 99.85%, making it the best of the models developed, ahead of those built with the support vector machine, naive Bayes, logistic regression, random forest, and K-nearest neighbor algorithms.

Keywords: COVID-19; Coronavirus; Data mining; Decision tree; Pandemic; Patients’ recovery.
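
The comparison described in the abstract, six classifiers fitted to the same data and ranked by accuracy, could be sketched roughly as follows. This is only an illustration and not the authors' code: it assumes scikit-learn is available, uses a synthetic dataset from `make_classification` in place of the South Korean patient data, and the function name `compare_classifiers`, the 70/30 split, and all hyperparameters are assumptions made for the example. The scores it prints say nothing about the accuracies reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, test_size=0.3, seed=0):
    """Fit six classifiers on one train/test split; return test accuracy per model.

    Hypothetical helper: the split ratio and hyperparameters are assumptions,
    not values taken from the paper.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    models = {
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "svm": SVC(),
        "naive_bayes": GaussianNB(),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "knn": KNeighborsClassifier(n_neighbors=5),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # score() returns mean accuracy on the held-out test split
        scores[name] = model.score(X_test, y_test)
    return scores


# Synthetic stand-in for the patient records (assumption: binary outcome,
# four numeric features); the real dataset is not reproduced here.
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
scores = compare_classifiers(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {acc:.3f}")
```

Evaluating every model on the same held-out split keeps the accuracies comparable; with real data, categorical attributes such as sex or infection case would first need to be encoded numerically.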

Conflict of interest statement

Conflict of interest: Authors have declared that no conflict of interest exists.

Figures

Fig. 1. Frequency of sex attribute
Fig. 2. Frequency of age attribute
Fig. 3. Frequency of infection_case attribute
Fig. 4. Frequency of no_days attribute
Fig. 5. Frequency of state attribute
Fig. 6. Decision Tree model for COVID-19 infected patients’ recovery
Fig. 7. Performance evaluation results of the models
