Development and Validation of a Robust and Interpretable Early Triaging Support System for Patients Hospitalized With COVID-19: Predictive Algorithm Modeling and Interpretation Study

doi:10.2196/52134

Multicenter Study

J Med Internet Res. 2024 Jan 11;26:e52134.


Sangwon Baek^{1,2}, Yeon Joo Jeong³, Yun-Hyeon Kim⁴, Jin Young Kim⁵, Jin Hwan Kim⁶, Eun Young Kim⁷, Jae-Kwang Lim⁸, Jungok Kim⁹, Zero Kim^{1,10}, Kyunga Kim^#^{10,11,12}, Myung Jin Chung^#^{1,10,12,13}

Affiliations

¹ Medical AI Research Center, Samsung Medical Center, Seoul, Republic of Korea.
² Center for Data Science, New York University, New York, NY, United States.
³ Department of Radiology, Research Institute for Convergence of Biomedical Science and Technology, Pusan National University Yangsan Hospital, Yangsan, Republic of Korea.
⁴ Department of Radiology, Chonnam National University Hospital, Gwangju, Republic of Korea.
⁵ Department of Radiology, Keimyung University Dongsan Hospital, Daegu, Republic of Korea.
⁶ Department of Radiology, Chungnam National University Hospital, Daejeon, Republic of Korea.
⁷ Department of Radiology, Gachon University Gil Medical Center, Incheon, Republic of Korea.
⁸ Department of Radiology, School of Medicine, Kyungpook National University, Daegu, Republic of Korea.
⁹ Department of Infectious Diseases, Chungnam National University Sejong Hospital, Sejong, Republic of Korea.
¹⁰ Department of Data Convergence & Future Medicine, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea.
¹¹ Biomedical Statistics Center, Research Institute for Future Medicine, Samsung Medical Center, Seoul, Republic of Korea.
¹² Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, Republic of Korea.
¹³ Department of Radiology, Samsung Medical Center, Seoul, Republic of Korea.

^# Contributed equally.

PMID: 38206673
PMCID: PMC10811577
DOI: 10.2196/52134

Sangwon Baek et al. J Med Internet Res. 2024.

Abstract

Background: Robust and accurate prediction of severity for patients with COVID-19 is crucial for patient triaging decisions. Many proposed models were prone to either high bias risk or low-to-moderate discrimination. Some also suffered from a lack of clinical interpretability and were developed based on early pandemic period data. Hence, there has been a compelling need for advancements in prediction models for better clinical applicability.

Objective: The primary objective of this study was to develop and validate a machine learning-based Robust and Interpretable Early Triaging Support (RIETS) system that predicts severity progression (defined as any of the following events: intensive care unit admission, in-hospital death, need for mechanical ventilation, or need for extracorporeal membrane oxygenation) within 15 days of hospitalization, based on routinely available clinical and laboratory biomarkers.

Methods: We included data from 5945 hospitalized patients with COVID-19 from 19 hospitals in South Korea collected between January 2020 and August 2022. For model development and external validation, the whole data set was partitioned into 2 independent cohorts by stratified random cluster sampling according to hospital type (general and tertiary care) and geographical location (metropolitan and nonmetropolitan). Machine learning models were trained and internally validated through a cross-validation technique on the development cohort. They were externally validated using a bootstrapped sampling technique on the external validation cohort. The best-performing model was selected primarily based on the area under the receiver operating characteristic curve (AUROC), and its robustness was evaluated using bias risk assessment. For model interpretability, we used Shapley and patient clustering methods.
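The external validation step described above (AUROC estimated with bootstrapped sampling) can be illustrated with a minimal sketch. This is not the authors' code: the function names `auroc` and `bootstrap_auroc_ci`, the percentile method for the interval, the number of resamples, and the toy data are all assumptions made for illustration.

```python
import random

def auroc(labels, scores):
    """AUROC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap interval (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    point = auroc(labels, scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lo, hi

# Toy data (hypothetical): 1 = severe progression, 0 = no progression.
y = [0, 0, 1, 1, 0, 1, 1, 0]
p = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.60, 0.30]
print(bootstrap_auroc_ci(y, p, n_boot=200))
```

The pairwise AUROC here is quadratic in the number of patients, which is fine for a toy example; a rank-based computation would be preferable at the scale of the cohort described above.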

Results: Our final model, RIETS, is a deep neural network that uses 11 clinical and laboratory biomarkers readily available within the first day of hospitalization. The features predictive of severity included lactate dehydrogenase, age, absolute lymphocyte count, dyspnea, respiratory rate, diabetes mellitus, C-reactive protein, absolute neutrophil count, platelet count, white blood cell count, and saturation of peripheral oxygen. RIETS demonstrated excellent discrimination (AUROC=0.937; 95% CI 0.935-0.938) with good calibration (integrated calibration index=0.041), satisfied all the criteria of low bias risk in a risk assessment tool, and provided detailed interpretations of model parameters and patient clusters. In addition, RIETS showed potential for transportability across variant periods, with sustained predictive performance on Omicron cases (AUROC=0.903; 95% CI 0.897-0.910).
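For readers unfamiliar with the integrated calibration index (ICI) reported above, its usual definition is sketched here. The abstract does not state how the calibration curve was estimated, so the smoother $\hat{c}$ (often a loess fit of observed outcomes on predicted probabilities) is an assumption.

```latex
\mathrm{ICI} \;=\; \frac{1}{n}\sum_{i=1}^{n}\left|\,\hat{p}_i - \hat{c}(\hat{p}_i)\,\right|
```

Here $\hat{p}_i$ is the predicted probability for patient $i$, $\hat{c}(\hat{p}_i)$ is the smoothed observed event rate at that predicted probability, and $n$ is the number of patients. An ICI of 0 indicates perfect calibration; under this definition, an ICI of 0.041 means that predicted and observed probabilities differ by about 4.1 percentage points on average.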

Conclusions: RIETS was developed and validated to assist early triaging by promptly predicting the severity of hospitalized patients with COVID-19. Its high performance with low bias risk supports reliable prediction. The use of a nationwide multicenter cohort in model development and validation suggests generalizability. The use of routinely collected features may enable wide adaptability. Interpretations of model parameters and patient clusters can promote clinical applicability. Together, we anticipate that RIETS will facilitate the patient triaging workflow and efficient resource allocation when incorporated into routine clinical practice.

Keywords: COVID-19; Omicron; SARS-CoV-2; SHAP; Shapley; biomarker; biomarkers; clustering; coronavirus; deep learning; early triaging; emergency; hospital admission; hospital admissions; hospitalization; hospitalizations; hospitalize; interpretability; machine learning; neural network; neural networks; predict; prediction; prediction model; predictive; prognosis; prognostic; prognostics; severity; triage; triaging.

©Sangwon Baek, Yeon Joo Jeong, Yun-Hyeon Kim, Jin Young Kim, Jin Hwan Kim, Eun Young Kim, Jae-Kwang Lim, Jungok Kim, Zero Kim, Kyunga Kim, Myung Jin Chung. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 11.01.2024.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

**Figure 1**
Patient flowchart depicting the generation of development and validation cohorts among hospitalized patients with COVID-19 (n=5945). Stratified random cluster sampling was applied to segment the cohorts based on hospital type (general vs tertiary) and geographical location (metropolitan vs nonmetropolitan). RT-PCR: real-time polymerase chain reaction.

**Figure 2**
Machine learning–based pipeline for developing and validating the prognosis prediction model for COVID-19 severity. AUROC: area under receiver operating characteristic curve; DCA: decision curve analysis; DDRTree: discriminative dimensionality reduction via learning a tree; DNN: deep neural network; GBM: gradient boosting machine; MLR: multivariable logistic regression; RF: random forest; RF-MDIFI: random forest–based mean decrease in Gini impurity feature importance method; RF-PFI: random forest–based permutation feature importance method; RF-Shapley: random forest–based Shapley method; ROC: receiver operating characteristic curve; SHAP: Shapley additive explanations; SVM: support vector machine; XGB: extreme gradient boosting; XGB-BFI: extreme gradient boosting–based built-in feature importance method; XGB-PFI: extreme gradient boosting–based permutation feature importance method; XGB-Shapley: extreme gradient boosting–based Shapley method.

**Figure 3**
Patient clustering based on features in RIETS and characterization with dyspnea, DM, age, and severity. (A) DDRTree plot for severity probability. (B) DDRTree plot for patients with dyspnea, DM, and age ≥60 years. (C) DDRTree plot for patients with dyspnea, DM, and age <60 years. Points closer to dark red indicate a high severity probability, while points closer to light green indicate a low severity probability. DDRTree: discriminative dimensionality reduction via learning a tree; DM: diabetes mellitus; LLG: lower left group; LRG: lower right group; MRG: middle right group; RIETS: robust and interpretable early triaging support system; URG: upper right group.

**Figure 4**
Patient clustering based on features in RIETS and characterization with vital signs and laboratory results. A dark blue color indicates a high value and a light green color indicates a low value of each corresponding feature. ALC: absolute lymphocyte count; ANC: absolute neutrophil count; CRP: C-reactive protein; DDRTree: discriminative dimensionality reduction via learning a tree; LDH: lactate dehydrogenase; PLT: platelet count; RIETS: Robust and Interpretable Early Triaging Support; RR: respiratory rate; SPO2: saturation of peripheral oxygen; WBC: white blood cell.

**Figure 5**
Performance comparisons of RIETS to other prediction models in the external validation. (A) Receiver operating characteristic curves displaying discriminative performance. (B) Calibration plots showing the practical reliability of risk prediction. (C) Decision curve analysis plots demonstrating the net clinical utility when deployed in clinical practice. All dashed lines represent the references. Shaded areas represent the 95% CI bands. AUROC: area under receiver operating characteristic curve; GBM: gradient boosting machine; ICI: integrated calibration index; MLR: multivariable logistic regression; RIETS: Robust and Interpretable Early Triaging Support; RF: random forest; SVM: support vector machine; XGB: extreme gradient boosting.

**Figure 6**
Average impact of 11 selected features in RIETS on COVID-19 severity prediction. To compute SHAP values based on the RIETS model, we adapted KernelExplainer. Specific SHAP values are shown to the right of each feature name in the y-axis labels. ALC: absolute lymphocyte count; ANC: absolute neutrophil count; CRP: C-reactive protein; DM: diabetes mellitus; LDH: lactate dehydrogenase; PLT: platelet count; RIETS: Robust and Interpretable Early Triaging Support; RR: respiratory rate; SHAP: Shapley additive explanations; SPO2: saturation of peripheral oxygen; WBC: white blood cell.
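The "average impact" shown in Figure 6 is typically the mean absolute SHAP value per feature. The sketch below illustrates that idea with exact Shapley values computed by subset enumeration on a toy model; it is not the authors' modified KernelExplainer, which approximates these values by sampling. The function names, the toy linear model, and the use of a single baseline vector to stand in for "absent" features are all assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.
    Features outside the coalition take their baseline value (assumption)."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def mean_abs_shap(predict, rows, baseline):
    """Average absolute Shapley value per feature across instances."""
    totals = [0.0] * len(baseline)
    for x in rows:
        for i, v in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(v)
    return [t / len(rows) for t in totals]

# Hypothetical toy model with three features (not the RIETS network).
def toy_model(z):
    return 2.0 * z[0] + 3.0 * z[1] - 1.0 * z[2]

rows = [[1.0, 1.0, 1.0], [0.5, 2.0, 0.0]]
print(mean_abs_shap(toy_model, rows, baseline=[0.0, 0.0, 0.0]))
```

Exact enumeration evaluates the model on every feature subset, so it is practical only for a small number of features; sampling-based explainers exist to avoid that cost.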

Cited by

Machine Learning for the Prediction of Acute Kidney Injury in Critically Ill Patients With Coronary Heart Disease: Algorithm Development and Validation.
Li Y, Xiao M, Li Y, Lv L, Zhang S, Liu Y, Zhang J. JMIR Med Inform. 2025 May 28;13:e72349. doi: 10.2196/72349. PMID: 40383933. Free PMC article.
Deep Learning-based Time-to-event Analysis of Depression and Asthma using the All of Us Research Program.
Wang X, Ohno-Machado L, Gomez JL, Gu W, Sun R, Kim J. AMIA Annu Symp Proc. 2025 May 22;2024:1186-1195. eCollection 2024. PMID: 40417537. Free PMC article.
Machine Learning and Artificial Intelligence for Infectious Disease Surveillance, Diagnosis, and Prognosis.
Cheah BCJ, Vicente CR, Chan KR. Viruses. 2025 Jun 23;17(7):882. doi: 10.3390/v17070882. PMID: 40733500. Free PMC article. Review.
Predicting COVID-19 severity in pediatric patients using machine learning: a comparative analysis of algorithms and ensemble methods.
Pourakbari B, Mamishi S, Valian SK, Mahmoudi S, Sadeghi RH, Abdolsalehi MR, Khodabandeh M, Farahmand M. Sci Rep. 2025 Aug 8;15(1):29118. doi: 10.1038/s41598-025-15366-1. PMID: 40781476. Free PMC article.
