PLoS One. 2022 Sep 22;17(9):e0274171.
doi: 10.1371/journal.pone.0274171. eCollection 2022.

Machine-learning-derived predictive score for early estimation of COVID-19 mortality risk in hospitalized patients

Alba González-Cebrián et al. PLoS One. 2022.

Abstract

The clinical course of COVID-19 is highly variable. It is therefore essential to predict, as early and as accurately as possible, how severe the disease will be in a patient admitted to hospital with COVID-19. This means identifying the factors that contribute to mortality and developing an easy-to-use score that enables a fast assessment of mortality risk using only information recorded at hospital admission. We analyzed a large database of adult patients with a confirmed diagnosis of COVID-19 (n = 15,628; 2,846 deceased) admitted to Spanish hospitals between December 2019 and July 2020. Using multiple machine learning algorithms, we developed models that could accurately predict mortality. We used the classifiers' performance metrics, together with the importance and coherence of the predictors, to define a mortality score that can be easily calculated from a minimal number of predictors and yields accurate estimates of patient severity status. The optimal predictive model comprised five predictors (age, oxygen saturation, platelets, lactate dehydrogenase, and creatinine) and classified surviving and deceased patients satisfactorily (area under the curve: 0.8454 on the validation set). These five predictors were then used to define a mortality score for COVID-19 patients at hospital admission. The score is easy both to calculate and to interpret: it ranges from zero to eight, and the mortality risk increases linearly with it from 0% to 80%. A simple risk score based on five commonly available clinical variables of adult COVID-19 patients admitted to hospital can accurately discriminate their mortality probability, and its interpretation is straightforward and useful.
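
To make the interpretation of the score concrete, the following minimal Python sketch maps a score level to an approximate mortality risk. It assumes that the "linear increase" stated in the abstract is exact and evenly spaced, i.e. 10 percentage points per level from 0% at score 0 to 80% at score 8. The function name `approx_mortality_risk` is hypothetical, and the rules that turn the five predictors into a score (Fig 9) are not reproduced here because the abstract does not state them.

```python
def approx_mortality_risk(score: int) -> float:
    """Approximate mortality risk (as a fraction) for a score level in 0..8.

    Assumption: risk rises linearly from 0.0 at score 0 to 0.80 at score 8,
    as the abstract describes; the published curve may deviate slightly.
    """
    if not isinstance(score, int) or not 0 <= score <= 8:
        raise ValueError("score must be an integer between 0 and 8")
    return 0.80 * score / 8


# Print the assumed risk for each of the nine score levels.
for level in range(9):
    print(level, f"{approx_mortality_risk(level):.0%}")
```

Under this assumption each additional point on the score corresponds to roughly ten more percentage points of mortality risk, which is what makes the score easy to read at the bedside.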

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Flow diagram of the data used for building and validating the mortality prediction model.
Data were stored in REDCap. The initial database (n = 15,628) was preprocessed and split, without replacement, into calibration (n = 10,008) and validation (n = 2,501) subsets. The calibration data set was used to set the optimal hyperparameters of the classifiers. The final model was chosen by assessing performance on the validation data set. LR = Logistic Regression. PLSDA = Partial Least Squares Discriminant Analysis. kPLSDA = kernel PLSDA. RF = Random Forest.
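
The split described in this caption can be sketched as follows. The two subsets sum to 12,509 patients, which is roughly an 80/20 partition. This is only an illustration of a split without replacement: the function name, the seed, and the use of a simple random shuffle are assumptions, and the caption does not say whether the authors stratified by outcome or by hospital.

```python
import random


def split_without_replacement(n_total, n_calibration, seed=0):
    """Return two disjoint lists of row indices (calibration, validation).

    Every index appears in exactly one subset, so no patient is shared.
    """
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    return indices[:n_calibration], indices[n_calibration:]


# Subset sizes taken from the caption; the seed is arbitrary.
n_cal, n_val = 10_008, 2_501
cal_idx, val_idx = split_without_replacement(n_cal + n_val, n_cal)
print(len(cal_idx), len(val_idx))
```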
Fig 2. Importance metrics for all predictors.
Median values (over the 100 re-sampling folds) of the 38 predictor coefficients sorted by type of data blocks (demographic variables, clinical variables at admission, comorbidities, pharmacological treatments for chronic conditions, analytics at admission and information about the admission event).
Fig 3. Coherence metrics for all predictors and classifiers.
Bar charts representing the percentage of folds in which each predictor was found to show a positive (red) or a negative coefficient (blue) for the LR model (A), the PLSDA model (B), the kPLSDA model (C), and the RF model (D). Bars with high color consistency indicate highly consistent relationships between predictors and mortality.
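
A minimal sketch of how such per-predictor summaries could be computed from re-sampling results: the median coefficient across folds as an importance measure, and the percentage of folds with a positive or negative coefficient as a coherence measure. The function name, the predictor names `x1` and `x2`, and the coefficient values are toy placeholders, not data from the study.

```python
from statistics import median


def sign_coherence(coefs_per_fold):
    """Percentage of folds with a positive and with a negative coefficient."""
    n = len(coefs_per_fold)
    pct_positive = 100.0 * sum(c > 0 for c in coefs_per_fold) / n
    pct_negative = 100.0 * sum(c < 0 for c in coefs_per_fold) / n
    return pct_positive, pct_negative


# Toy coefficients for two hypothetical predictors over four folds.
folds = {
    "x1": [0.9, 1.1, 0.8, 1.0],      # same sign in every fold
    "x2": [-0.3, -0.2, 0.1, -0.4],   # sign flips in one fold
}

for name, coefs in folds.items():
    pos, neg = sign_coherence(coefs)
    print(name, "median:", median(coefs), "positive %:", pos, "negative %:", neg)
```

A predictor whose bar is almost entirely one color corresponds to a `pct_positive` or `pct_negative` close to 100 in this sketch.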
Fig 4. Importance of most relevant variables.
Ranking (in descending order) of the 18 variables selected according to their importance and to the consistency of their relationship with the mortality risk over the 100 re-sampling iterations.
Fig 5. Assessment of the quality of the risk calibration.
Intercept and slope of the risk calibration curve obtained for each incremental model with LR (A), PLSDA (B), kPLSDA (C) and RF (D).
Fig 6. Optimal calibration risk prediction curves.
Observed mortality (%) vs. predicted risk of mortality for all the classification algorithms under study at their respective optimal variable number setting. Predicted risk values were rounded to the first decimal digit, i.e., predicted value 0.1 refers to predictions between 0.05 and 0.15.
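
The binning described in this caption, and one possible way of summarizing a calibration curve by an intercept and a slope, can be sketched as below. The rounding to one decimal follows the caption; the ordinary least-squares line fit is an assumption, since the captions do not say how the intercept and slope were estimated. All function names and input values are illustrative.

```python
from collections import defaultdict


def observed_mortality_by_bin(predicted_risk, died):
    """Observed mortality (%) per predicted-risk bin (risk rounded to 0.1)."""
    totals = defaultdict(lambda: [0, 0])  # bin -> [deaths, patients]
    for p, d in zip(predicted_risk, died):
        b = round(p, 1)  # e.g. 0.12 -> 0.1; exact ties depend on float representation
        totals[b][0] += int(d)
        totals[b][1] += 1
    return {b: 100.0 * deaths / n for b, (deaths, n) in sorted(totals.items())}


def fit_line(xs, ys):
    """Ordinary least-squares fit ys ~ intercept + slope * xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Toy predictions and outcomes (1 = deceased), not study data.
curve = observed_mortality_by_bin([0.12, 0.08, 0.11, 0.31], [1, 0, 0, 1])
# Express predicted risk in percent so both axes share the same unit.
intercept, slope = fit_line([100 * b for b in curve], list(curve.values()))
print(curve, intercept, slope)
```

With both axes in percent, a well-calibrated model would give an intercept near 0 and a slope near 1.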
Fig 7. Marginal distributions of predictors used by the RF.
Violin plots (blue: alive patients; red: deceased patients) for age (A), oxygen saturation (B), platelets (C), LDH (D), and creatinine (E).
Fig 8. Histograms with marginal distributions of final set of predictors.
Distributions of age, oxygen saturation, platelets, LDH and creatinine among alive (blue) and deceased (red) patients.
Fig 9. Final set of scoring rules.
Formulation of the nine-level mortality score for COVID-19 patients at their hospital admission.
Fig 10. Observed mortality vs. score curves.
Observed mortality at each level of the score for the Calibration data set and for the Validation data set.
