IEEE Access. 2021 Jan 11;9:10263-10281.

doi: 10.1109/ACCESS.2021.3050852. eCollection 2021.

A Novel Bayesian Optimization-Based Machine Learning Framework for COVID-19 Detection From Inpatient Facility Data

Md Abdul Awal¹, Mehedi Masud², Md Shahadat Hossain³, Abdullah Al-Mamun Bulbul¹, S M Hasan Mahmud⁴, Anupam Kumar Bairagi⁵

Affiliations

¹ Electronics and Communication Engineering Discipline, Khulna University, Khulna 9208, Bangladesh.
² Department of Computer Science, College of Computers and Information Technology, Taif University, Taif 21944, Saudi Arabia.
³ Department of Quantitative Sciences, International University of Business Agriculture and Technology, Dhaka 1230, Bangladesh.
⁴ School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.
⁵ Computer Science and Engineering Discipline, Khulna University, Khulna 9208, Bangladesh.

PMID: 34786301
PMCID: PMC8545233
DOI: 10.1109/ACCESS.2021.3050852

Md Abdul Awal et al. IEEE Access. 2021.

Abstract

COVID-19, the disease caused by the SARS-CoV-2 virus, has caused a pandemic worldwide. The virus can take considerable time to reach a detectable level, and during this period it may be transmitted to other people. Containing the outbreak therefore requires quick identification of COVID-19 patients. We have designed and optimized a machine learning-based framework that uses inpatient facility data to provide a user-friendly, cost-effective, and time-efficient solution to this pandemic. The proposed framework uses Bayesian optimization to tune the hyperparameters of the classifier and the ADAptive SYNthetic (ADASYN) algorithm to balance the COVID and non-COVID classes of the dataset. Although we apply the proposed technique to nine state-of-the-art classifiers to show its efficacy, it can be applied to many other classifiers and classification problems. In this study, eXtreme Gradient Boosting (XGB) achieves the highest kappa index, 97.00%. Compared with the same framework without ADASYN, the proposed approach yields an improvement in the kappa index of 96.94%. In addition, Bayesian optimization is compared with grid search and random search to demonstrate its efficiency. Furthermore, the most dominant features are identified using SHapley Additive exPlanations (SHAP) analysis. The results are also compared with other related works. The proposed method can identify COVID patients in less time than conventional techniques. Finally, two potential applications, namely a clinically operable decision tree and a decision support system, are demonstrated to support clinical staff and build a recommender system.
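The abstract reports results as a kappa index, presumably Cohen's kappa, which corrects raw accuracy for the agreement expected by chance. A minimal sketch of the binary case, computed from the four confusion-matrix counts, is shown below; the function name and argument order are illustrative assumptions, not taken from the paper.

```python
def cohen_kappa(tp: int, fn: int, fp: int, tn: int) -> float:
    """Cohen's kappa for a binary confusion matrix (positive class = COVID)."""
    n = tp + fn + fp + tn
    # Observed agreement: fraction of subjects on the diagonal.
    p_observed = (tp + tn) / n
    # Chance agreement: for each class, (actual share) x (predicted share), summed.
    p_chance = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)
```

For example, 45 true positives, 5 false negatives, 5 false positives, and 45 true negatives give an observed agreement of 0.90 and a chance agreement of 0.50, so kappa = (0.90 - 0.50) / (1 - 0.50) = 0.80.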

Keywords: ADASYN; Bayesian optimization; COVID-19; classification; inpatient's facility data.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.


Figures

**FIGURE 1.**
Characteristics of the sample.

**FIGURE 2.**
Fill rate for all variables.

**FIGURE 3.**
The overall workflow of the classification of COVID-19. The first phase collects the raw data, followed by pre-processing, in which the raw data are imputed and scaled and, most importantly, the imbalanced classes are balanced using the ADASYN algorithm. Secondly, the pre-processed data are split into training and test sets, which are used by different classifiers to measure classification performance. In the next step, Bayesian optimization is used to tune the hyperparameters of the classifiers. The optimized classification model is then tested and evaluated with several performance measures and statistical tools (accuracy, precision, confusion matrix, ROC curve, 10-fold cross-validation, ANOVA, and multiple-comparison test). Finally, the most important features are identified using SHAP analysis.
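The balancing step in this workflow uses ADASYN, which generates more synthetic minority samples around minority points that are harder to learn, i.e., those with many majority-class neighbors. The pure-Python sketch below follows the usual ADASYN steps under assumed settings (`k` neighbors, balance level `beta`, Euclidean distance, a fixed seed); it is illustrative only, and in practice a library implementation such as `imblearn.over_sampling.ADASYN` from imbalanced-learn would normally be used.

```python
import math
import random


def _nearest(i, candidates, X, k):
    """Indices of the k candidates closest to X[i] (Euclidean), excluding i itself."""
    others = [j for j in candidates if j != i]
    return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Return synthetic minority-class rows generated with the ADASYN scheme."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(minority_idx)
    n_maj = len(y) - n_min
    # G: total number of synthetic rows to create (beta = 1.0 aims for full balance).
    G = round((n_maj - n_min) * beta)
    if G <= 0 or n_min < 2:
        return []

    # Step 1: difficulty ratio r_i = share of majority rows among the k nearest neighbors.
    ratios = []
    for i in minority_idx:
        neigh = _nearest(i, range(len(X)), X, k)
        ratios.append(sum(y[j] != minority for j in neigh) / len(neigh))
    total = sum(ratios)
    if total == 0:
        return []  # no minority row has majority neighbors, so there is nothing to weight

    # Step 2: harder rows (higher r_i) receive proportionally more synthetic samples.
    counts = [round(r / total * G) for r in ratios]

    # Step 3: interpolate between each minority row and randomly chosen minority neighbors.
    synthetic = []
    for i, g in zip(minority_idx, counts):
        neigh = _nearest(i, minority_idx, X, k)
        for _ in range(g):
            j = rng.choice(neigh)
            gap = rng.random()
            synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic
```

The caller would append the returned rows to `X` with the minority label; because each new row lies on a segment between two minority rows, the synthetic data stay inside the region spanned by the minority class.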

**FIGURE 5.**
ROC curve without ADASYN. Here the optimized model was trained on the original, imbalanced dataset.

**FIGURE 6.**
ROC curve for COVID on the original test data only, for each model. Each optimized model was trained on the balanced dataset and then applied to the original test dataset.

**FIGURE 7.**
Confusion matrix of the balanced model applied to (a) the ADASYN-balanced COVID test dataset and (b) the original COVID test dataset only. In Figure 7(a), the two diagonal cells show the patients correctly classified by the trained model: 3150 COVID and 3233 non-COVID patients, corresponding to 48.7% and 49.9% of all test patients, respectively. Likewise, 24 COVID and 67 non-COVID patients were incorrectly classified, corresponding to 0.4% and 1.0% of all test patients. Overall, 99.2% of COVID patients were correctly and 0.8% incorrectly classified, while 98.0% of non-COVID patients were correctly and 2.0% incorrectly classified. Among the predictions, 97.9% of COVID predictions and 99.3% of non-COVID predictions were correct, while 2.1% and 0.7%, respectively, were incorrect. Figure 7(b) can be interpreted in the same way.
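The percentages in the Figure 7(a) caption can be reproduced from the four counts. The sketch assumes that the 24 misclassified COVID patients were predicted as non-COVID and the 67 misclassified non-COVID patients were predicted as COVID, which is the assignment consistent with the reported class-wise rates of 99.2% and 98.0%; the function name is illustrative.

```python
def summarize(tp, fn, fp, tn):
    """Percentages shown in a 2x2 confusion-matrix plot (positive class = COVID).

    tp: COVID predicted as COVID      fn: COVID predicted as non-COVID
    fp: non-COVID predicted as COVID  tn: non-COVID predicted as non-COVID
    """
    n = tp + fn + fp + tn

    def pct(part, whole):
        return round(100 * part / whole, 1)

    return {
        # Each cell as a share of all test patients.
        "cells": {"tp": pct(tp, n), "tn": pct(tn, n), "fn": pct(fn, n), "fp": pct(fp, n)},
        # Per actual class: recall (sensitivity for COVID, specificity for non-COVID).
        "recall": {"covid": pct(tp, tp + fn), "non_covid": pct(tn, tn + fp)},
        # Per predicted class: precision (predictive value of each prediction).
        "precision": {"covid": pct(tp, tp + fp), "non_covid": pct(tn, tn + fn)},
    }


stats = summarize(tp=3150, fn=24, fp=67, tn=3233)
```

With these counts, the recall values come out as 99.2% (COVID) and 98.0% (non-COVID) and the precision values as 97.9% and 99.3%, matching the caption.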

**FIGURE 8.**
Box plot for (a) the COVID dataset and (b) the multiple-comparison test. Panel (b) comes from a graphical user interface tool with which one can test the statistical significance of differences between any classifiers. Only the effect of XGB is shown here; the effects of the other classifiers can be interpreted in the same way.
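Figure 3 lists ANOVA and a multiple-comparison test among the evaluation tools. The statistic such an analysis starts from is the one-way ANOVA F ratio, sketched below for groups of scores such as per-fold cross-validation scores of several classifiers. That grouping is an assumption; the p-value (which needs the F distribution, available for instance through `scipy.stats.f_oneway`) and the post-hoc pairwise comparisons are not shown.

```python
def one_way_anova_f(*groups):
    """F = [SS_between / (k - 1)] / [SS_within / (N - k)] for k groups with N scores in total."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of each score around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F means the differences between classifiers are large relative to the fold-to-fold variation within each classifier, which is what the multiple-comparison step then breaks down pair by pair.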

**FIGURE 9.**
Recall rate vs. decision boundary curve for (a) COVID positive and (b) COVID negative.
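Assuming the "decision boundary" here is the probability threshold at or above which a subject is labeled COVID positive, each curve can be produced by sweeping that threshold and recomputing the recall of each class. A minimal sketch with hypothetical inputs (`probs`, `labels`, `thresholds`):

```python
def recall_vs_threshold(probs, labels, thresholds):
    """For each threshold t, label a subject positive when prob >= t and return
    (t, recall of the positive class, recall of the negative class)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    curve = []
    for t in thresholds:
        tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= t)
        tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < t)
        curve.append((t, tp / n_pos, tn / n_neg))
    return curve
```

Raising the threshold can only lower the recall of COVID-positive subjects and raise the recall of COVID-negative subjects, which is the trade-off panels (a) and (b) display.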

**FIGURE 10.**
Bootstrap ROC curve of the COVID dataset using XGB with 95% CI.
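The caption does not state the resampling details. A common approach, assumed here, is the percentile bootstrap: resample subjects with replacement, recompute the AUC on each resample, and take the 2.5th and 97.5th percentiles as the 95% CI. The sketch summarizes the area under the curve rather than drawing the pointwise band around the ROC curve; `n_boot`, the seed, and the function names are illustrative.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Return (AUC on the full data, lower CI bound, upper CI bound)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # the AUC needs both classes in the resample
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(scores, labels), lower, upper
```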

**FIGURE 11.**
Feature importance plot using SHAP for XGB.

**FIGURE 12.**
The SHAP variable importance plot of training data using XGB.
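SHAP attributes a prediction to the input features using Shapley values from cooperative game theory. For a tree ensemble such as XGB, the shap library computes them efficiently (for example with `shap.TreeExplainer`); the brute-force sketch below instead enumerates every feature coalition, which is exponential in the number of features and only practical for toy models. It also simplifies "absent" features to a single baseline row rather than averaging over background data. Bar-style importance plots commonly rank features by mean absolute SHAP value, computed here by `mean_abs_importance`; the model and baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature for model(x), where features outside
    a coalition are replaced by the corresponding baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def mean_abs_importance(model, rows, baseline):
    """Global importance: mean absolute Shapley value of each feature over rows."""
    per_row = [shapley_values(model, row, baseline) for row in rows]
    return [sum(abs(p[f]) for p in per_row) / len(per_row) for f in range(len(baseline))]
```

For a linear model the Shapley value of each feature reduces to its coefficient times the feature's deviation from the baseline, and for an interaction such as `z[0] * z[1]` the joint effect is split equally between the two features.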

**FIGURE 13.**
Comparative optimization techniques applied to the XGB model.
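Bayesian optimization, which the abstract compares with grid search and random search, fits a probabilistic surrogate to the scores observed so far and chooses the next hyperparameter value by maximizing an acquisition function. The sketch below uses a Gaussian-process surrogate with a fixed RBF kernel and expected improvement over a single hyperparameter scaled to [0, 1]; the kernel, length scale, candidate grid, and budget are hypothetical choices, not the authors' configuration.

```python
import math
import random

LENGTH_SCALE = 0.2  # hypothetical RBF length scale, fixed rather than fitted
JITTER = 1e-6       # small diagonal term that keeps the kernel matrix invertible


def rbf(a, b):
    return math.exp(-((a - b) ** 2) / (2 * LENGTH_SCALE ** 2))


def cholesky(m):
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(m[i][i] - s, 1e-12))
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def solve_lower(low, b):  # solves low @ x = b
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def solve_upper(low, b):  # solves low.T @ x = b
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def fit_gp(xs, ys):
    """GP on mean-centered scores; returns a function x -> (posterior mean, posterior std)."""
    mu = sum(ys) / len(ys)
    K = [[rbf(a, b) + (JITTER if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    low = cholesky(K)
    alpha = solve_upper(low, solve_lower(low, [y - mu for y in ys]))

    def predict(x):
        k = [rbf(a, x) for a in xs]
        mean = mu + sum(ki * ai for ki, ai in zip(k, alpha))
        v = solve_lower(low, k)
        var = max(1.0 - sum(t * t for t in v), 1e-12)
        return mean, math.sqrt(var)

    return predict


def expected_improvement(mean, std, best, xi=0.01):
    z = (mean - best - xi) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best - xi) * cdf + std * pdf


def bayesian_optimize(objective, n_init=3, n_iter=12, seed=0):
    """Maximize objective(x) for x in [0, 1]; returns (best_x, best_score)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_init)]
    ys = [objective(x) for x in xs]
    grid = [i / 200 for i in range(201)]  # candidate hyperparameter values
    for _ in range(n_iter):
        predict = fit_gp(xs, ys)
        best = max(ys)
        x_next = max(grid, key=lambda c: expected_improvement(*predict(c), best))
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = max(range(len(ys)), key=lambda i: ys[i])
    return xs[i_best], ys[i_best]
```

In a setting like the paper's, `objective` would wrap a cross-validated score (such as kappa) of the classifier for a given hyperparameter value. Tuning several hyperparameters at once would need a multivariate kernel and a better acquisition optimizer than a fixed grid.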

**FIGURE 14.**
Box plot of Bayesian optimization and Harris Hawks optimization.

**FIGURE 15.**
A decision rule using four key features and their thresholds in absolute value.

**FIGURE 16.**
Probabilistic output of the decision support system (DSS). In the upper panel, 0 represents a COVID-negative subject and 1 a COVID-positive subject. The lower panel shows the predicted probability that the subject has COVID, with the red dotted line marking the threshold. A subject whose probability exceeds this threshold is classified as COVID positive, whereas a subject whose probability is below the threshold of 0.5 is classified as COVID negative. Either way, the output expresses the chance that a person has COVID.


Cited by

An Improved Machine-Learning Approach for COVID-19 Prediction Using Harris Hawks Optimization and Feature Analysis Using SHAP.
Debjit K, Islam MS, Rahman MA, Pinki FT, Nath RD, Al-Ahmadi S, Hossain MS, Mumenin KM, Awal MA. Diagnostics (Basel). 2022 Apr 19;12(5):1023. doi: 10.3390/diagnostics12051023. PMID: 35626179.
Detection of COVID-19 Patients Using Machine Learning Techniques: A Nationwide Chilean Study.
Ormeño P, Márquez G, Guerrero-Nancuante C, Taramasco C. Int J Environ Res Public Health. 2022 Jun 30;19(13):8058. doi: 10.3390/ijerph19138058. PMID: 35805713.
Early-Stage Detection of Ovarian Cancer Based on Clinical Data Using Machine Learning Approaches.
Ahamad MM, Aktar S, Uddin MJ, Rahman T, Alyami SA, Al-Ashhab S, Akhdar HF, Azad A, Moni MA. J Pers Med. 2022 Jul 25;12(8):1211. doi: 10.3390/jpm12081211. PMID: 35893305.
Early Prediction of Diabetes Using an Ensemble of Machine Learning Models.
Dutta A, Hasan MK, Ahmad M, Awal MA, Islam MA, Masud M, Meshref H. Int J Environ Res Public Health. 2022 Sep 28;19(19):12378. doi: 10.3390/ijerph191912378. PMID: 36231678.
A Modified Aquila-Based Optimized XGBoost Framework for Detecting Probable Seizure Status in Neonates.
Mumenin KM, Biswas P, Khan MA, Alammary AS, Nahid AA. Sensors (Basel). 2023 Aug 9;23(16):7037. doi: 10.3390/s23167037. PMID: 37631573.
