The Development and Validation of Simplified Machine Learning Algorithms to Predict Prognosis of Hospitalized Patients With COVID-19: Multicenter, Retrospective Study

doi:10.2196/31549

Multicenter Study

J Med Internet Res. 2022 Jan 21;24(1):e31549.

Fang He¹,², John H Page³, Kerry R Weinberg², Anirban Mishra²

Affiliations

¹ Amgen Inc, Center for Observational Research, South San Francisco, CA, United States.
² Amgen Inc, Digital Health & Innovation, Thousand Oaks, CA, United States.
³ Amgen Inc, Center for Observational Research, Thousand Oaks, CA, United States.

PMID: 34951865
PMCID: PMC8785956
DOI: 10.2196/31549

Fang He et al. J Med Internet Res. 2022.

Abstract

Background: The COVID-19 pandemic is unprecedented. In resource-constrained settings, predictive algorithms can help stratify disease severity and alert physicians to high-risk patients; however, only a few risk scores have been derived from a substantially large electronic health record (EHR) data set using simplified predictors as input.

Objective: The objectives of this study were to develop and validate simplified machine learning algorithms that predict COVID-19 adverse outcomes; to evaluate the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and calibration of the algorithms; and to derive clinically meaningful thresholds.

Methods: We developed and validated machine learning models in a cohort study using multicenter, patient-level, longitudinal EHRs from the Optum COVID-19 database, which provides anonymized, longitudinal EHRs from across the United States. The models were developed based on clinical characteristics to predict 28-day in-hospital mortality, intensive care unit (ICU) admission, respiratory failure, and mechanical ventilator use in the inpatient setting. Data from patients who were admitted from February 1, 2020, to September 7, 2020, were randomly sampled into development, validation, and test data sets; data collected from September 7, 2020, to November 15, 2020, were reserved as the postdevelopment prospective test data set.
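The sampling scheme described in the Methods can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `assign_split`, the 60/20/20 split fractions, and the choice to place admissions on September 7 itself in the earlier period are all assumptions made for illustration. Only the three dates come from the text.

```python
import random
from datetime import date

# Dates from the Methods text.
DEV_START = date(2020, 2, 1)
CUTOFF = date(2020, 9, 7)
STUDY_END = date(2020, 11, 15)


def assign_split(admit_date, rng, fractions=(0.6, 0.2, 0.2)):
    """Return the data set name for one admission, or None if outside the study window.

    `fractions` (development, validation, test) is an ASSUMED split; the
    abstract does not state the proportions.
    """
    if admit_date < DEV_START or admit_date > STUDY_END:
        return None
    # Assumption: admissions on the cutoff date belong to the earlier period.
    if admit_date > CUTOFF:
        return "prospective_test"
    r = rng.random()
    if r < fractions[0]:
        return "development"
    if r < fractions[0] + fractions[1]:
        return "validation"
    return "test"
```

Keeping the later admissions entirely out of the random split is what lets the second data set act as a temporal holdout: no patient from that period can influence model development.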

Results: Of the 3.7 million patients in the analysis, 585,867 patients were diagnosed with or tested positive for SARS-CoV-2, and 50,703 adult patients were hospitalized with COVID-19 between February 1 and November 15, 2020. In the study cohort (n=50,703), there were 6204 deaths, 9564 ICU admissions, 6478 patients who received mechanical ventilation or extracorporeal membrane oxygenation (ECMO), and 25,169 patients who developed acute respiratory distress syndrome or respiratory failure within 28 days of hospital admission. The algorithms demonstrated high accuracy (AUC 0.89, 95% CI 0.89-0.89 on the test data set [n=10,752]), consistent prediction through the second wave of the pandemic from September to November (AUC 0.85, 95% CI 0.85-0.86 on the postdevelopment prospective test data set [n=14,863]), and great clinical relevance and utility. In addition, a comprehensive set of 386 input covariates from baseline or at admission was included in the analysis; the end-to-end pipeline automates feature selection and model development. The parsimonious model with only 10 input predictors produced comparably accurate predictions; these 10 predictors (age, blood urea nitrogen, SpO₂, systolic and diastolic blood pressures, respiration rate, pulse, temperature, albumin, and major cognitive disorder excluding stroke) are commonly measured and concordant with recognized risk factors for COVID-19.
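For readers unfamiliar with how an AUC and its confidence interval are obtained, the following is a minimal sketch. The abstract does not say how the 95% CIs were computed; the percentile bootstrap used here is an assumption, and the function names `auc` and `bootstrap_ci` are hypothetical. The pairwise AUC is quadratic in sample size, so it suits small illustrations rather than data sets of the size reported above.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class, so the AUC is undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

With a test set above 10,000 patients, a bootstrap interval would be expected to be narrow, which is consistent with the tight intervals reported.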

Conclusions: The systematic approach and rigorous validation demonstrate consistent model performance even beyond the period of data collection, with satisfactory discriminatory power and great clinical utility. Overall, the study offers an accurate, validated, and reliable prediction model based on only 10 clinical features as a prognostic tool for stratifying patients with COVID-19 into intermediate-, high-, and very high-risk groups. This simple predictive tool is shared with the wider health care community so that it can serve as an early warning system to alert physicians to possible high-risk patients or as a resource triaging tool to optimize health care resources.
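The stratification into three risk groups, and the sensitivity and specificity at a chosen threshold, can be sketched as follows. The cutoff values below are hypothetical placeholders: the abstract does not report the thresholds the authors derived, and the function names are likewise assumptions.

```python
from bisect import bisect_right

# HYPOTHETICAL cutoffs for illustration only; not taken from the study.
CUTOFFS = (0.10, 0.30)
GROUPS = ("intermediate", "high", "very high")


def risk_group(prob, cutoffs=CUTOFFS):
    """Map a predicted probability to a risk group.

    A probability equal to a cutoff falls in the higher group.
    """
    return GROUPS[bisect_right(cutoffs, prob)]


def sensitivity_specificity(labels, probs, threshold):
    """Sensitivity and specificity when predictions >= threshold are flagged.

    Assumes both outcome classes are present in `labels`.
    """
    pairs = list(zip(labels, probs))
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    tn = sum(1 for y, p in pairs if y == 0 and p < threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)
```

Lowering a cutoff flags more patients, which raises sensitivity at the cost of specificity; a clinically meaningful threshold is one where that trade-off matches the available resources.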

Keywords: COVID-19; machine learning; predictive algorithm; prognostic model.

©Fang He, John H Page, Kerry R Weinberg, Anirban Mishra. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 21.01.2022.

Conflict of interest statement

Conflicts of Interest: FH, JHP, and AM are employees and stockholders of Amgen, Inc. KRW, an employee of League Inc, was formerly an employee of Amgen, Inc and owns stock in Amgen, Inc.

Figures

**Figure 1**
Patient attrition diagram. ^With relevant COVID-19 diagnosis codes or tested positive for SARS-CoV-2. *Nonexclusive criteria: overlapping was allowed.

**Figure 2**
Model development and validation framework including data sampling and corresponding sensitivity analyses.

**Figure 3**
Receiver operating characteristic (ROC) curves for the four prediction outcomes in the final analysis: (a) all-cause mortality; (b) respiratory failure including ARDS; (c) ICU admission; (d) invasive mechanical ventilation including ECMO. The full model is shown in black and the parsimonious model with 10 input variables in orange. Solid lines represent model performance on the test data set (n=10,752); dashed lines represent the postdevelopment prospective test data set (n=14,863). ARDS: acute respiratory distress syndrome. ECMO: extracorporeal membrane oxygenation.

**Figure 4**
Calibration curves (number of bins = 10) for the four prediction outcomes in the final analysis: (a) all-cause mortality; (b) respiratory failure including ARDS; (c) ICU admission; (d) invasive mechanical ventilation including ECMO. The full model is shown in black and the parsimonious model with 10 input variables in orange. Solid lines represent calibration on the test data set (n=10,752); dashed lines represent calibration on the postdevelopment prospective test data set (n=14,863). ARDS: acute respiratory distress syndrome. ECMO: extracorporeal membrane oxygenation.
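A 10-bin calibration curve of the kind shown in Figure 4 can be computed as in the sketch below. Equal-width bins on the predicted probability are an assumption; the caption gives only the number of bins, and the authors may have used quantile bins instead. The function name `calibration_curve` is hypothetical.

```python
def calibration_curve(labels, probs, n_bins=10):
    """Return (mean predicted risk, observed event rate, count) per non-empty bin.

    Uses equal-width bins on [0, 1]; this binning rule is an assumption.
    """
    bins = [[0.0, 0, 0] for _ in range(n_bins)]  # sum of probs, events, count
    for y, p in zip(labels, probs):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[b][0] += p
        bins[b][1] += y
        bins[b][2] += 1
    return [(sp / c, ev / c, c) for sp, ev, c in bins if c > 0]
```

A well-calibrated model yields points close to the diagonal, where the mean predicted risk in each bin matches the observed event rate.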

**Figure 5**
Decision curve analysis of standardized net benefit across different risk thresholds. The dotted line represents the scenario in which everyone is treated; the dashed line represents the scenario in which no one is treated.
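The quantities plotted in a decision curve can be sketched with the standard definitions: net benefit at threshold t is TP/n - (FP/n) * t/(1 - t), and standardized net benefit divides this by the outcome prevalence. Whether the authors used exactly this formulation is an assumption, and the function names are hypothetical.

```python
def standardized_net_benefit(labels, probs, threshold):
    """Standardized net benefit of treating patients with predicted risk >= threshold.

    Requires 0 < threshold < 1 and at least one event in `labels`.
    """
    n = len(labels)
    prevalence = sum(labels) / n
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    odds = threshold / (1 - threshold)  # weight of a false positive
    return (tp / n - (fp / n) * odds) / prevalence


def treat_all_snb(labels, threshold):
    """Standardized net benefit of the 'everyone is treated' reference strategy."""
    prevalence = sum(labels) / len(labels)
    odds = threshold / (1 - threshold)
    return (prevalence - (1 - prevalence) * odds) / prevalence


# The 'no one is treated' strategy has a net benefit of 0 at every threshold.
```

A model is useful at a given threshold when its curve lies above both reference strategies.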

Cited by

In-depth analysis of the risk factors for persistent severe acute respiratory syndrome coronavirus 2 infection and construction of predictive models: an exploratory research study.
Zhang J, Zhu W, Jiang P, Ma F, Li Y, Cao Y, Li J, Zhang Z, Zhang X, Zou W, Chen J. BMC Infect Dis. 2025 May 14;25(1):699. doi: 10.1186/s12879-025-11083-2. PMID: 40369416. Free PMC article.
Fib-4 score is able to predict intra-hospital mortality in 4 different SARS-COV2 waves.
Miele L, Dajko M, Savino MC, Capocchiano ND, Calvez V, Liguori A, Masciocchi C, Vetrone L, Mignini I, Schepis T, Marrone G, Biolato M, Cesario A, Patarnello S, Damiani A, Grieco A, Valentini V, Gasbarrini A; Gemelli against COVID Group. Intern Emerg Med. 2023 Aug;18(5):1415-1427. doi: 10.1007/s11739-023-03310-y. Epub 2023 Jul 25. PMID: 37491564. Free PMC article.
Analysis of Publication Activity and Research Trends in the Field of AI Medical Applications: Network Approach.
Karpov OE, Pitsik EN, Kurkin SA, Maksimenko VA, Gusev AV, Shusharina NN, Hramov AE. Int J Environ Res Public Health. 2023 Mar 30;20(7):5335. doi: 10.3390/ijerph20075335. PMID: 37047950. Free PMC article.
Unraveling complex relationships between COVID-19 risk factors using machine learning based models for predicting mortality of hospitalized patients and identification of high-risk group: a large retrospective study.
Banoei MM, Rafiepoor H, Zendehdel K, Seyyedsalehi MS, Nahvijou A, Allameh F, Amanpour S. Front Med (Lausanne). 2023 May 4;10:1170331. doi: 10.3389/fmed.2023.1170331. eCollection 2023. PMID: 37215714. Free PMC article.
Assessing the impact of vaccines on COVID-19 efficacy in survival rates: a survival analysis approach for clinical decision support.
González Rodríguez JL, Oprescu AM, Muñoz Lezcano S, Cordero Ramos J, Romero Cabrera JL, Armengol de la Hoz MÁ, Estella Á. Front Public Health. 2024 Nov 18;12:1437388. doi: 10.3389/fpubh.2024.1437388. eCollection 2024. PMID: 39624415. Free PMC article.
