Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2022 Jun 21:10:874455.
doi: 10.3389/fpubh.2022.874455. eCollection 2022.

An Explainable AI Approach for the Rapid Diagnosis of COVID-19 Using Ensemble Learning Algorithms

Affiliations

An Explainable AI Approach for the Rapid Diagnosis of COVID-19 Using Ensemble Learning Algorithms

Houwu Gong et al. Front Public Health. .

Abstract

Background: Artificial intelligence-based disease prediction models have a greater potential to screen COVID-19 patients than conventional methods. However, their application has been restricted because of their underlying black-box nature.

Objective: To addressed this issue, an explainable artificial intelligence (XAI) approach was developed to screen patients for COVID-19.

Methods: A retrospective study consisting of 1,737 participants (759 COVID-19 patients and 978 controls) admitted to San Raphael Hospital (OSR) from February to May 2020 was used to construct a diagnosis model. Finally, 32 key blood test indices from 1,374 participants were used for screening patients for COVID-19. Four ensemble learning algorithms were used: random forest (RF), adaptive boosting (AdaBoost), gradient boosting decision tree (GBDT), and extreme gradient boosting (XGBoost). Feature importance from the perspective of the clinical domain and visualized interpretations were illustrated by using local interpretable model-agnostic explanations (LIME) plots.

Results: The GBDT model [area under the curve (AUC): 86.4%; 95% confidence interval (CI) 0.821-0.907] outperformed the RF model (AUC: 85.7%; 95% CI 0.813-0.902), AdaBoost model (AUC: 85.4%; 95% CI 0.810-0.899), and XGBoost model (AUC: 84.9%; 95% CI 0.803-0.894) in distinguishing patients with COVID-19 from those without. The cumulative feature importance of lactate dehydrogenase, white blood cells, and eosinophil counts was 0.145, 0.130, and 0.128, respectively.

Conclusions: Ensemble machining learning (ML) approaches, mainly GBDT and LIME plots, are efficient for screening patients with COVID-19 and might serve as a potential tool in the auxiliary diagnosis of COVID-19. Patients with higher WBC count, higher LDH level, or higher EOT count, were more likely to have COVID-19.

Keywords: COVID-19; artificial intelligence; disease prediction; ensemble learning; explainable.

PubMed Disclaimer

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Figure 1
Correlation coefficient matrix heatmap of all 29 variables. The obtained numerical matrix is visually displayed through a heatmap. Orange indicates a positive correlation, and green indicates a negative correlation. Color depth indicates the value of the coefficient, with deeper colors indicating stronger correlations. Specifically, redder colors indicate correlation coefficients closer to 1, and greener colors indicate coefficients closer to −1.
Figure 2
Figure 2
Receiver operating characteristic (ROC) curves for the machine learning models in screening COVID-19.
Figure 3
Figure 3
Calibration curve for the internal validation set. The calibration curve was plotted using the bucket method (continuous data discretization) to observe whether the prediction probability of the classification model is close to the empirical probability (that is, the real probability). Ideally, the calibration curve lies along the diagonal (i.e., the prediction probability is equal to the empirical probability).
Figure 4
Figure 4
Influence of input features on the outcome of the XGBoost model. The top three features are LDH, WBC, and EOT. It indicates that they have important auxiliary diagnostic significance for COVID-19. The model found that patients with higher WBC count, higher LDH level, or higher EOT count, were more likely to have COVID-19. It might assist physicians to make their decisions.
Figure 5
Figure 5
Influence of nine variables on the outcome of the XGBoost model. Because PCR ≤ 9.30 and CA >2.29 were the most significant features, the classification of this sample was confirmed as positive.
Figure 6
Figure 6
Simplified decision tree model based on the top three features.

Similar articles

Cited by

References

    1. Nuzzo Jennifer B, Gostin Lawrence O. COVID-19 and lessons to improve preparedness for the next pandemic-reply. JAMA. (2022) 327:1823. 10.1001/jama.2022.4169 - DOI - PubMed
    1. Khan M, Khan H, Khan S. Epidemiological and clinical characteristics of coronavirus disease (COVID-19) cases at a screening clinic during the early outbreak period: a single-centre study. J Med Microbiol. (2020) 69:1114–23. 10.1099/jmm.0.001231 - DOI - PMC - PubMed
    1. Vogels CBF, Brito AF, Wyllie AL, Fauver JR, Ott IM, Kalinich CC. Analytical sensitivity and efficiency comparisons of SARS-CoV-2 RT–qPCR primer–probe sets. Nat Microbiol. (2020) 5:1299–305. 10.1038/s41564-020-0761-6 - DOI - PMC - PubMed
    1. Rózański M, Walczak-Drzewiecka A, Witaszewska J, Wójcik E, Guziński A, Zimoń B. RT-qPCR-based tests for SARS-CoV-2 detection in pooled saliva samples for massive population screening to monitor epidemics. Sci Rep. (2022) 12:8082. 10.1038/s41598-022-12179-4 - DOI - PMC - PubMed
    1. Wynants L, Van Calster B, Collins GS, Riley RD, Heinze G, Schuit E. Prediction models for diagnosis and prognosis of COVID-19: systematic review and critical appraisal. BMJ. (2020) 369:m1328. 10.1136/bmj.m1328 - DOI - PMC - PubMed

Publication types