Front Public Health. 2022 Jun 21:10:874455.

doi: 10.3389/fpubh.2022.874455. eCollection 2022.

An Explainable AI Approach for the Rapid Diagnosis of COVID-19 Using Ensemble Learning Algorithms

Houwu Gong¹,², Miye Wang³,⁴, Hanxue Zhang¹, Md Fazla Elahe¹, Min Jin¹

Affiliations

¹ Department of Software Engineering, College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.
² Academy of Military Sciences, Beijing, China.
³ Engineering Research Center of Medical Information Technology, Ministry of Education, West China Hospital, Chengdu, China.
⁴ Information Center, West China Hospital, Chengdu, China.

PMID: 35801239
PMCID: PMC9253566
DOI: 10.3389/fpubh.2022.874455

Houwu Gong et al. Front Public Health. 2022.

Abstract

Background: Artificial intelligence-based disease prediction models have a greater potential to screen COVID-19 patients than conventional methods. However, their application has been restricted because of their underlying black-box nature.

Objective: To address this issue, an explainable artificial intelligence (XAI) approach was developed to screen patients for COVID-19.

Methods: Data from a retrospective study of 1,737 participants (759 COVID-19 patients and 978 controls) admitted to San Raphael Hospital (OSR) from February to May 2020 were used to construct a diagnostic model. Ultimately, 32 key blood test indices from 1,374 participants were used to screen patients for COVID-19. Four ensemble learning algorithms were used: random forest (RF), adaptive boosting (AdaBoost), gradient boosting decision tree (GBDT), and extreme gradient boosting (XGBoost). Feature importance from the clinical perspective and visualized interpretations were illustrated using local interpretable model-agnostic explanations (LIME) plots.
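
The abstract names the four ensemble algorithms without describing their internals. As an illustration of the boosting idea behind one of them (AdaBoost), the following is a minimal pure-Python sketch that combines weighted decision stumps. It is not the authors' implementation; the function names and the toy single-feature data are hypothetical and chosen only for demonstration.

```python
import math

def stump_predict(stump, row):
    """Predict +1 or -1: `polarity` when row[feature] > threshold, else -polarity."""
    _, feature, threshold, polarity = stump
    return polarity if row[feature] > threshold else -polarity

def fit_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                candidate = (0.0, feature, threshold, polarity)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(candidate, row) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best

def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost; labels must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    model = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # weight of this stump
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): one feature, label +1 when the value is high.
X_toy = [[1.0], [2.0], [3.0], [4.0]]
y_toy = [-1, -1, 1, 1]
toy_model = adaboost_fit(X_toy, y_toy, n_rounds=3)
print([adaboost_predict(toy_model, row) for row in X_toy])
```

In practice, a maintained library implementation would normally be preferred over a hand-written loop like this; the sketch only shows how each round reweights the samples that the previous stump got wrong.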

Results: The GBDT model [area under the curve (AUC): 0.864; 95% confidence interval (CI) 0.821-0.907] outperformed the RF model (AUC: 0.857; 95% CI 0.813-0.902), AdaBoost model (AUC: 0.854; 95% CI 0.810-0.899), and XGBoost model (AUC: 0.849; 95% CI 0.803-0.894) in distinguishing patients with COVID-19 from those without. The cumulative feature importance values of lactate dehydrogenase (LDH), white blood cell (WBC) count, and eosinophil (EOT) count were 0.145, 0.130, and 0.128, respectively.
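
The results report an AUC with a 95% confidence interval for each model. The abstract does not state how the intervals were obtained; the sketch below assumes a percentile bootstrap, which is one common choice, and computes the AUC as the fraction of positive/negative pairs that are ranked correctly. The function names and the four-sample toy input are hypothetical.

```python
import random

def auc_score(labels, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue                                  # skip resamples missing a class
        stats.append(auc_score(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy input (hypothetical): 1 = COVID-19 positive, 0 = control.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc_score(toy_labels, toy_scores))   # 0.75: three of four pairs ranked correctly
print(bootstrap_ci(toy_labels, toy_scores, n_boot=200))
```

The pairwise loop is quadratic in the sample size, which is acceptable for a cohort of this scale but would usually be replaced by a rank-based formula for larger datasets.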

Conclusions: Ensemble machine learning (ML) approaches, mainly GBDT, together with LIME plots, are efficient for screening patients with COVID-19 and might serve as a potential tool in the auxiliary diagnosis of COVID-19. Patients with a higher white blood cell (WBC) count, higher lactate dehydrogenase (LDH) level, or higher eosinophil (EOT) count were more likely to have COVID-19.

Keywords: COVID-19; artificial intelligence; disease prediction; ensemble learning; explainable.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**Figure 1**
Correlation coefficient matrix heatmap of all 29 variables. The obtained numerical matrix is visually displayed through a heatmap. Orange indicates a positive correlation, and green indicates a negative correlation. Color depth indicates the value of the coefficient, with deeper colors indicating stronger correlations. Specifically, redder colors indicate correlation coefficients closer to 1, and greener colors indicate coefficients closer to −1.
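
The numerical matrix behind such a heatmap can be sketched as follows. The caption does not say which coefficient was used; this example assumes Pearson correlation, and the column values are toy numbers for illustration, not patient data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")        # undefined for a constant column
    return cov / (sd_x * sd_y)

def correlation_matrix(columns):
    """Map each pair of column names to its correlation coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Toy columns (hypothetical values).
toy = {"LDH": [1, 2, 3, 4], "WBC": [2, 4, 6, 8], "EOT": [4, 3, 2, 1]}
matrix = correlation_matrix(toy)
print(matrix["LDH"]["WBC"], matrix["LDH"]["EOT"])
```

Coefficients near 1 or −1 correspond to the deepest colors in the heatmap, and the diagonal is always 1 because each variable is perfectly correlated with itself.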

**Figure 2**
Receiver operating characteristic (ROC) curves for the machine learning models in screening COVID-19.

**Figure 3**
Calibration curve for the internal validation set. The calibration curve was plotted using the bucket method (continuous data discretization) to observe whether the prediction probability of the classification model is close to the empirical probability (that is, the real probability). Ideally, the calibration curve lies along the diagonal (i.e., the prediction probability is equal to the empirical probability).
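
The bucket method described in this caption can be sketched as below: predicted probabilities are grouped into equal-width bins, and within each bin the mean predicted probability is compared with the observed fraction of positives. The bin count, function name, and toy inputs are assumptions for illustration; the caption does not specify them.

```python
def calibration_curve(labels, probs, n_bins=10):
    """Return (mean predicted probability, observed positive fraction, count)
    for each non-empty equal-width bin over [0, 1]."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        k = min(int(p * n_bins), n_bins - 1)   # p == 1.0 falls into the last bin
        bins[k].append((y, p))
    curve = []
    for members in bins:
        if not members:
            continue                            # skip empty bins
        mean_pred = sum(p for _, p in members) / len(members)
        frac_pos = sum(y for y, _ in members) / len(members)
        curve.append((mean_pred, frac_pos, len(members)))
    return curve

# Toy input (hypothetical): labels are 0/1, probs are model outputs.
toy_labels = [0, 0, 1, 1, 1, 0]
toy_probs = [0.1, 0.15, 0.8, 0.9, 0.85, 0.2]
for mean_pred, frac_pos, count in calibration_curve(toy_labels, toy_probs, n_bins=2):
    print(round(mean_pred, 3), frac_pos, count)
```

A well-calibrated model yields points where the first two values of each tuple are close, which is what lying along the diagonal means.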

**Figure 4**
Influence of input features on the outcome of the XGBoost model. The top three features are LDH, WBC, and EOT, which indicates that they have important auxiliary diagnostic significance for COVID-19. The model found that patients with a higher WBC count, higher LDH level, or higher EOT count were more likely to have COVID-19. This finding might assist physicians in making their decisions.

**Figure 5**
Influence of nine variables on the outcome of the XGBoost model. Because PCR ≤ 9.30 and CA > 2.29 were the most significant features, the classification of this sample was confirmed as positive.

**Figure 6**
Simplified decision tree model based on the top three features.
