PLoS One. 2022 Sep 22;17(9):e0274171.

doi: 10.1371/journal.pone.0274171. eCollection 2022.

Machine-learning-derived predictive score for early estimation of COVID-19 mortality risk in hospitalized patients

Affiliations

¹ Multivariate Statistical Engineering Group, Department of Applied Statistics and Operational Research and Quality, Universitat Politècnica de València, València, España.
² Pharmacy Service, Hospital Universitario Dr. Peset, València, España.
³ Univ. Lille, CNRS, LASIRE - UMR 8516 - Laboratory of Advanced Spectroscopy for Interaction, Reactivity and Environmental Studies, Lille, France.
⁴ Pharmacy Service, Hospital Universitario Jerez de la Frontera, Área de Gestión Sanitaria Jerez-Costa Noroeste y Sierra de Cádiz, Jerez de la Frontera, España.
⁵ Pharmacy Service, Hospital Universitario Ramón y Cajal, Madrid, España.

PMID: 36137106
PMCID: PMC9499271
DOI: 10.1371/journal.pone.0274171

Alba González-Cebrián et al. PLoS One. 2022.

Abstract

The clinical course of COVID-19 is highly variable, so it is essential to predict disease severity as early and accurately as possible when a patient is admitted to hospital. This requires identifying the factors that contribute to mortality and developing an easy-to-use score that enables a fast assessment of mortality risk using only information recorded at hospital admission. We analyzed a large database of adult patients with a confirmed diagnosis of COVID-19 (n = 15,628; 2,846 deceased) admitted to Spanish hospitals between December 2019 and July 2020. Using multiple machine learning algorithms, we developed models that accurately predicted mortality. We then used the classifiers' performance metrics, together with the importance and coherence of the predictors, to define a mortality score that can be easily calculated from a minimal number of predictors and that yields accurate estimates of patient severity status. The optimal predictive model included five predictors (age, oxygen saturation, platelets, lactate dehydrogenase, and creatinine) and satisfactorily classified surviving and deceased patients (area under the curve: 0.8454 on the validation set). These five predictors were also used to define a mortality score for COVID-19 patients at hospital admission. The score is easy both to calculate and to interpret: it ranges from zero to eight, with a linear increase in mortality risk from 0% to 80%. A simple risk score based on five commonly available clinical variables of adult COVID-19 patients admitted to hospital can accurately discriminate their mortality probability, and its interpretation is straightforward and useful.
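
The linear relationship described in the abstract (score from zero to eight, mortality risk from 0% to 80%) implies roughly 10 percentage points of risk per score level. The following is a minimal sketch of that arithmetic only, assuming the relationship is exactly linear between the two endpoints. The function name is hypothetical, and the scoring rules that produce the score itself (Fig 9) are not reproduced here.

```python
def approximate_mortality_risk(score: int) -> float:
    """Map a 0-8 score to an approximate mortality risk in percent.

    Assumption: strictly linear between (0, 0%) and (8, 80%), as the
    abstract describes; observed mortality per level may deviate.
    """
    if not 0 <= score <= 8:
        raise ValueError("score must be between 0 and 8")
    return score * 80.0 / 8


for s in (0, 4, 8):
    print(s, approximate_mortality_risk(s))
```

Under this assumption a mid-range score of 4 corresponds to an approximate risk of 40%.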

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Flow diagram of the data used for the mortality prediction model building and validation.**
Data were stored in the REDCap storage service. The initial database (n = 15,628) was preprocessed and split, without replacement, into calibration (n = 10,008) and validation (n = 2,501) subsets. The calibration data set was used to set the optimal hyperparameters of the classifiers. The final model was chosen by assessing performance on the validation data set. LR = Logistic Regression. PLSDA = Partial Least Squares Discriminant Analysis. kPLSDA = kernel PLSDA. RF = Random Forest.
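
The subset sizes in this caption (10,008 and 2,501, together 12,509 records) correspond to roughly an 80/20 split. A minimal sketch of a split without replacement is shown below. It assumes a simple random shuffle of record indices; the authors' actual preprocessing and splitting procedure is not described here, and the function name and seed are hypothetical.

```python
import random


def split_without_replacement(n_records, n_validation, seed=0):
    """Return disjoint (calibration, validation) lists of record indices."""
    rng = random.Random(seed)       # fixed seed only for reproducibility
    indices = list(range(n_records))
    rng.shuffle(indices)            # each record lands in exactly one subset
    validation = sorted(indices[:n_validation])
    calibration = sorted(indices[n_validation:])
    return calibration, validation


calibration_idx, validation_idx = split_without_replacement(10_008 + 2_501, 2_501)
print(len(calibration_idx), len(validation_idx))
```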

**Fig 2. Importance metrics for all predictors.**
Median values (over the 100 re-sampling folds) of the 38 predictor coefficients sorted by type of data blocks (demographic variables, clinical variables at admission, comorbidities, pharmacological treatments for chronic conditions, analytics at admission and information about the admission event).

**Fig 3. Coherence metrics for all predictors and classifiers.**
Bar charts representing the percentage of folds in which each predictor was found to show a positive (red) or a negative coefficient (blue) for the LR model (A), the PLSDA model (B), the kPLSDA model (C), and the RF model (D). Bars with high color consistency indicate highly consistent relationships between predictors and mortality.
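
The coherence measure in this caption is the share of re-sampling folds in which a predictor's coefficient is positive or negative. As a minimal sketch, assuming one coefficient per fold is available for a given predictor, the two percentages could be computed as follows. The function name and the example values are hypothetical and are not taken from the study.

```python
def sign_consistency(coefficients):
    """Return (% of folds with positive sign, % of folds with negative sign)."""
    n = len(coefficients)
    if n == 0:
        raise ValueError("at least one fold is required")
    positive = 100.0 * sum(c > 0 for c in coefficients) / n
    negative = 100.0 * sum(c < 0 for c in coefficients) / n
    return positive, negative


# Illustrative coefficients for one predictor over four folds (made up).
print(sign_consistency([0.5, 0.2, -0.1, 0.3]))
```

A predictor whose sign is the same in nearly all folds would show a bar of almost a single color.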

**Fig 4. Importance of most relevant variables.**
Ranking (in descending order) of the 18 variables selected according to their importance and to the consistency of their relationship with the mortality risk over the 100 re-sampling iterations.

**Fig 5. Assessment of the quality of the risk calibration.**
Intercept and slope of the risk calibration curve obtained for each incremental model with LR (A), PLSDA (B), kPLSDA (C) and RF (D).

**Fig 6. Optimal calibration risk prediction curves.**
Observed mortality (%) vs. predicted risk of mortality for all the classification algorithms under study at their respective optimal variable number setting. Predicted risk values were rounded to the first decimal digit, i.e., predicted value 0.1 refers to predictions between 0.05 and 0.15.
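
The binning described in this caption can be sketched as follows: each predicted risk is rounded to one decimal, and the observed mortality is the percentage of deceased patients within each rounded value. This is an illustrative sketch on made-up data, not the authors' code. The function name is hypothetical, and the handling of values that fall exactly on a bin boundary (left here to Python's built-in `round`) is an assumption.

```python
from collections import defaultdict


def observed_mortality_by_bin(predicted_risks, died):
    """Group patients by predicted risk rounded to one decimal and
    return the observed mortality (%) in each group."""
    counts = defaultdict(lambda: [0, 0])  # bin -> [deaths, patients]
    for risk, outcome in zip(predicted_risks, died):
        bin_label = round(risk, 1)
        counts[bin_label][0] += int(outcome)
        counts[bin_label][1] += 1
    return {b: 100.0 * d / n for b, (d, n) in sorted(counts.items())}


# Made-up predictions and outcomes (1 = deceased), for illustration only.
risks = [0.08, 0.12, 0.11, 0.31, 0.28, 0.33, 0.29]
outcomes = [0, 0, 1, 0, 1, 0, 0]
curve = observed_mortality_by_bin(risks, outcomes)
print(curve)
```

For a well-calibrated model, the observed mortality in each bin would be close to the bin's predicted risk.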

**Fig 7. Marginal distributions of predictors used by the RF.**
Violin plots (blue: alive patients; red: deceased patients) for age (A), oxygen saturation (B), platelets (C), LDH (D), and creatinine (E).

**Fig 8. Histograms with marginal distributions of final set of predictors.**
Distributions of age, oxygen saturation, platelets, LDH and creatinine within alive (blue) and deceased (red) patients.

**Fig 9. Final set of scoring rules.**
Formulation of the nine-level mortality score for COVID-19 patients at their hospital admission.

**Fig 10. Observed mortality vs. score curves.**
Observed mortality at each level of the score for the Calibration data set and for the Validation data set.
