Benchmarking of Machine Learning classifiers on plasma proteomic for COVID-19 severity prediction through interpretable artificial intelligence

Artif Intell Med. 2023 Mar;137:102490. doi: 10.1016/j.artmed.2023.102490. Epub 2023 Jan 18.

Review

Stella Dimitsaki¹, George I Gavriilidis², Vlasios K Dimitriadis², Pantelis Natsiavas²

Affiliations

¹ Institute of Applied Biosciences, Centre for Research & Technology Hellas, Thermi, Thessaloniki, Greece. Electronic address: sdimitsaki@certh.gr.
² Institute of Applied Biosciences, Centre for Research & Technology Hellas, Thermi, Thessaloniki, Greece.

PMID: 36868685
PMCID: PMC9846931
DOI: 10.1016/j.artmed.2023.102490

Abstract

The SARS-CoV-2 pandemic highlighted the need for software tools that could facilitate patient triage regarding potential disease severity or even death. In this article, an ensemble of Machine Learning (ML) algorithms is evaluated in terms of predicting the severity of patients' condition using plasma proteomics and clinical data as input. An overview of AI-based technical developments to support COVID-19 patient management is presented, outlining the relevant landscape. Based on this review, an ensemble of ML algorithms that analyze clinical and biological data (i.e., plasma proteomics) of COVID-19 patients is designed and deployed to evaluate the potential use of AI for early COVID-19 patient triage. The proposed pipeline is evaluated using three publicly available datasets for training and testing. Three ML "tasks" are defined, and several algorithms are tested through a hyperparameter tuning method to identify the highest-performance models. As overfitting is one of the typical pitfalls for such approaches (mainly due to the size of the training/validation datasets), a variety of evaluation metrics are used to mitigate this risk. In the evaluation procedure, recall scores ranged from 0.6 to 0.74 and F1-scores from 0.62 to 0.75. The best performance is observed with the Multi-Layer Perceptron (MLP) and Support Vector Machine (SVM) algorithms. Additionally, input data (proteomics and clinical data) were ranked based on corresponding Shapley additive explanation (SHAP) values and evaluated for their prognostic capacity and immuno-biological credence. This "interpretable" approach revealed that our ML models could discern critical COVID-19 cases predominantly based on patients' age and plasma proteins related to B cell dysfunction, hyper-activation of inflammatory pathways like Toll-like receptors, and hypo-activation of developmental and immune pathways like SCF/c-Kit signaling.
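The tuning step described above (several candidate settings scored by F1 and recall) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption for illustration: the data are synthetic, a toy k-nearest-neighbour classifier stands in for the MLP and SVM models, and the parameter grid (`k`, `p`) is hypothetical. It is not the authors' implementation, which is available in their repository.

```python
import itertools
import random


def f1_and_recall(y_true, y_pred):
    """Return (F1, recall) for binary labels where 1 marks the severe class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return f1, recall


def knn_predict(train_X, train_y, x, k, p):
    """Toy stand-in classifier: majority vote of the k nearest training rows."""
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0


def grid_search(X, y, grid, n_folds=5):
    """Score every parameter combination by mean cross-validated F1."""
    folds = [list(range(i, len(X), n_folds)) for i in range(n_folds)]
    results = {}
    for k, p in itertools.product(grid["k"], grid["p"]):
        scores = []
        for held_out in folds:
            held = set(held_out)
            train_idx = [i for i in range(len(X)) if i not in held]
            tr_X = [X[i] for i in train_idx]
            tr_y = [y[i] for i in train_idx]
            preds = [knn_predict(tr_X, tr_y, X[j], k, p) for j in held_out]
            f1, _ = f1_and_recall([y[j] for j in held_out], preds)
            scores.append(f1)
        results[(k, p)] = sum(scores) / n_folds
    best = max(results, key=results.get)
    return best, results


# Synthetic two-class data (assumption): 60 samples, 5 features, class means 0 and 1.
rng = random.Random(0)
y = [i % 2 for i in range(60)]
X = [[rng.gauss(1.0 * label, 1.0) for _ in range(5)] for label in y]

best_params, cv_scores = grid_search(X, y, {"k": [1, 3, 5], "p": [1, 2]})
print("best (k, p):", best_params, "mean CV F1:", round(cv_scores[best_params], 3))
```

Scoring each candidate on held-out folds rather than on the training rows is what keeps the selected configuration from simply being the one that memorises the data best.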
Finally, the computational workflow presented herein is corroborated in an independent dataset, confirming both MLP superiority and the implication of the abovementioned predictive biological pathways. Regarding limitations of the presented ML pipeline, the datasets used in this study contain fewer than 1000 observations and a significant number of input features, hence constituting a high-dimensional low-sample (HDLS) dataset which could be sensitive to overfitting. An advantage of the proposed pipeline is that it combines biological data (plasma proteomics) with clinical-phenotypic data. Thus, in principle, the presented approach could enable patient triage in a timely fashion if used on already trained models. However, larger datasets and further systematic validation are needed to confirm the potential clinical value of this approach. The code is available on GitHub: https://github.com/inab-certh/Predicting-COVID-19-severity-through-interpretable-AI-analysis-of-plasma-proteomics.
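Validation on an independent dataset, as described above, amounts to fitting on one cohort and scoring on another that was never seen during training. The sketch below shows only that protocol, under loud assumptions: both cohorts are synthetic, the `shift` parameter is a hypothetical site-specific offset, and a nearest-centroid classifier stands in for the trained MLP.

```python
import random


def make_cohort(n, shift, seed):
    """Synthetic cohort: label 1 = severe; `shift` mimics a site-specific offset."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(2.0 * label + shift, 1.0) for _ in range(4)] for label in y]
    return X, y


def fit_centroids(X, y):
    """Nearest-centroid stand-in model: one mean feature vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def recall(y_true, y_pred):
    """Fraction of truly severe cases (label 1) that were predicted severe."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(positives) / len(positives)


# Fit on the training cohort only; the external cohort is used for scoring alone.
train_X, train_y = make_cohort(80, shift=0.0, seed=1)
ext_X, ext_y = make_cohort(40, shift=0.5, seed=2)

model = fit_centroids(train_X, train_y)
internal = recall(train_y, [predict(model, x) for x in train_X])
external = recall(ext_y, [predict(model, x) for x in ext_X])
print(f"recall on training cohort: {internal:.2f}; on independent cohort: {external:.2f}")
```

With fewer than 1000 observations and many features, the gap between the two numbers is the quantity of interest: a model that only memorised its training cohort would score well on the first and poorly on the second.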

Keywords: Artificial intelligence; COVID-19; Forecasting; Machine Learning; Severity prediction.

Conflict of interest statement

Declaration of competing interest: None declared.

Figures

**Graphical abstract**

**Fig. 1**
Methodology (part A: The overall approach of the methodology, part B: The Machine Learning tasks).

**Fig. 2**
Development and evaluation of the final models. A. Data preprocessing. B. Feature contribution to the models' results.

**Fig. 3**
Task 1: F1-score from GridSearch hyperparameter tuning. A. MGH. B. YPS. C. ICL.

**Fig. 4**
Task 2: F1-score from GridSearch hyperparameter tuning for the model trained on the MGH dataset and tested on the YPS and ICL datasets. A. YPS. B. ICL.

**Fig. 5**
Task 2: F1-score from GridSearch hyperparameter tuning for the model trained on the YPS dataset and tested on the ICL and MGH datasets. A. ICL. B. MGH.

**Fig. 6**
Task 2: F1-score from GridSearch hyperparameter tuning for the model trained on the ICL dataset and tested on the YPS and MGH datasets. A. YPS. B. MGH.

**Fig. 7**
Task 3: F1-score from GridSearch hyperparameter tuning.

**Fig. 8**
MLP predicts COVID-19 severity using proteins regulating immune-inflammatory and developmental pathways in Task 1. GeneMania protein-protein interaction networks (PPI), pathway enrichment dotplots and heatmaps with protein-nodes for the most predictive plasma proteins (top-30) as features of MLP (Task 1) considering the MGH (A), YPS (B) and ICL (C) datasets respectively.

**Supplementary Fig. 2**
Task 1 performance from GridSearch hyperparameter tuning A. MGH dataset B. YPS dataset C. ICL dataset.

**Supplementary Fig. 3**
Task 2 performance from GridSearch hyperparameter tuning. A. The model is trained on the MGH dataset and tested on the ICL dataset. B. The model is trained on the MGH dataset and tested on the YPS dataset. C. The model is trained on the YPS dataset and tested on the ICL dataset.

**Supplementary Fig. 4**
Task 2 performance from GridSearch hyperparameter tuning. A. The model is trained on the YPS dataset and tested on the MGH dataset. B. The model is trained on the ICL dataset and tested on the YPS dataset. C. The model is trained on the ICL dataset and tested on the MGH dataset.

**Supplementary Fig. 5**
Task 3 performance from GridSearch hyperparameter tuning.

**Supplementary Fig. 6**
SHAP values of the ML models, showing the contribution of PCA components to the final classification. A. Task 1 MLP model trained on the ICL dataset. B. Task 2 SVM model trained on the ICL dataset and tested on MGH. C. Task 2 SVM model trained on the ICL dataset and tested on YPS. D. Task 1 MLP model trained on the MGH dataset. E. Task 2 MLP model trained on the MGH dataset and tested on ICL. F. Task 2 MLP model trained on the MGH dataset and tested on YPS.

**Supplementary Fig. 7**
SHAP values of the ML models, showing the contribution of PCA components to the final classification. A. Task 1 MLP model trained on the YPS dataset. B. Task 2 MLP model trained on the YPS dataset and tested on ICL. C. Task 2 MLP model trained on the YPS dataset and tested on MGH. D. Task 3 SVM model.

**Supplementary Fig. 10**
MLP predicts COVID-19 severity in the independent MC dataset, using similar proteins and pathways to the MGH, YPS and ICL datasets. GeneMania protein-protein interaction networks (PPI), pathway enrichment dotplots and heatmaps with protein-nodes for the most predictive plasma proteins (top-30) as features of MLP (Task 1), considering the validation dataset MC.

**Supplementary Fig. 11**
SHAP values of the ML models, showing the contribution of PCA components to the final classification. A. SHAP values of the MLP model in the evaluation dataset MC.

**Supplementary Fig. 12**
Evaluation task performance from GridSearch hyperparameter tuning.

**Supplementary Fig. 13**
The proposed pipeline of our study is evaluated through the MC dataset.
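The SHAP figures listed above rank inputs by their contribution to a prediction. As a rough illustration of the underlying quantity, the sketch below computes exact Shapley values by enumerating feature subsets for a three-feature toy model. The model `toy_model`, the feature names `protein_a` and `protein_b`, and the baseline values are all hypothetical; practical SHAP tooling approximates these values because exact enumeration grows exponentially with the number of features.

```python
import itertools
import math


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x`.

    Features outside a coalition are replaced by their `baseline` value.
    """
    n = len(x)

    def value(subset):
        return predict([x[i] if i in subset else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = (
                math.factorial(size)
                * math.factorial(n - size - 1)
                / math.factorial(n)
            )
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(features):
    """Hypothetical severity score with one interaction term (illustration only)."""
    age, protein_a, protein_b = features
    return 0.03 * age + 0.5 * protein_a + 0.2 * protein_a * protein_b


x = [70.0, 2.0, 1.5]          # one hypothetical patient
baseline = [50.0, 0.0, 0.0]   # hypothetical reference values

phi = shapley_values(toy_model, x, baseline)
names = ["age", "protein_a", "protein_b"]
ranking = sorted(zip(names, phi), key=lambda kv: -abs(kv[1]))
for name, contribution in ranking:
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the model output for the patient and for the baseline, which is the property that makes such a ranking interpretable as a decomposition of a single prediction.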

Cited by

APNet, an explainable sparse deep learning model to discover differentially active drivers of severe COVID-19.
Gavriilidis GI, Vasileiou V, Dimitsaki S, Karakatsoulis G, Giannakakis A, Pavlopoulos GA, Psomopoulos F. Bioinformatics. 2025 Mar 4;41(3):btaf063. doi: 10.1093/bioinformatics/btaf063. PMID: 39921901. Free PMC article.
Development of a novel machine learning model based on laboratory and imaging indices to predict acute cardiac injury in cancer patients with COVID-19 infection: a retrospective observational study.
Wan G, Wu X, Zhang X, Sun H, Yu X. J Cancer Res Clin Oncol. 2023 Dec;149(19):17039-17050. doi: 10.1007/s00432-023-05417-3. Epub 2023 Sep 25. PMID: 37747525. Free PMC article.
Risk Factors and Prediction of 28-Day-All Cause Mortality Among Critically Ill Patients with Acute Pancreatitis Using Machine Learning Techniques: A Retrospective Analysis of Multi-Institutions.
Cai W, Wu X, Chen Y, Chen J, Lin X. J Inflamm Res. 2024 Jul 11;17:4611-4623. doi: 10.2147/JIR.S463701. eCollection 2024. PMID: 39011419. Free PMC article.
Predicting Outcomes of Preterm Neonates Post Intraventricular Hemorrhage.
Vignolle GA, Bauerstätter P, Schönthaler S, Nöhammer C, Olischar M, Berger A, Kasprian G, Langs G, Vierlinger K, Goeral K. Int J Mol Sci. 2024 Sep 25;25(19):10304. doi: 10.3390/ijms251910304. PMID: 39408633. Free PMC article.
Plasma Proteins Associated with COVID-19 Severity in Puerto Rico.
Rosario-Rodríguez LJ, Cantres-Rosario YM, Carrasquillo-Carrión K, Rosa-Díaz A, Rodríguez-De Jesús AE, Rivera-Nieves V, Tosado-Rodríguez EL, Méndez LB, Roche-Lima A, Bertrán J, Meléndez LM. Int J Mol Sci. 2024 May 16;25(10):5426. doi: 10.3390/ijms25105426. PMID: 38791465. Free PMC article.
