PLOS Digit Health. 2024 Dec 26;3(12):e0000699.
doi: 10.1371/journal.pdig.0000699. eCollection 2024 Dec.

Multicenter comparative analysis of local and aggregated data training strategies in COVID-19 outcome prediction with Machine learning

Carine Savalli et al. PLOS Digit Health.

Abstract

Machine learning (ML) is a promising tool for assisting clinical decision-making and improving diagnosis and prognosis, especially in developing regions. It is often trained on large samples that aggregate data from different regions and hospitals. However, it is unclear how this aggregation affects predictions in local centers. This study compares strategies that aggregate data from several hospitals in Brazil with a local training strategy in each hospital for predicting two COVID-19 outcomes: intensive care unit (ICU) admission and mechanical ventilation (MV) use. The study included 6,046 patients from 14 hospitals, with local sample sizes ranging from 47 to 1,500 patients. Machine learning models were trained using extreme gradient boosting, LightGBM, and CatBoost for structured data. Seven data aggregation strategies based on hospital geographic regions were compared with local training, and the best strategy for each hospital was determined by the area under the ROC curve (AUROC). SHAP (Shapley Additive exPlanations) values were used to assess the contribution of variables to predictions. Additionally, a metafeatures analysis examined how hospital characteristics influence the selection of the best strategy. Local training was the most effective strategy for 11 of the 14 hospitals (79%) for ICU admission and for 10 hospitals (71%) for MV use. Metafeatures analysis suggested that hospitals with smaller sample sizes generally performed better with an aggregated data strategy than with local training. Our study highlights an important concern about the impact of grouping data from different hospitals in predictive machine learning models. These findings contribute to the ongoing debate about the trade-off between increasing sample size and bringing together heterogeneous scenarios.
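
As an illustration of the comparison described in the abstract, the following is a minimal sketch of how a best strategy could be picked for one hospital by AUROC. It is not the authors' code. The function names (`auroc`, `best_strategy`), the strategy labels, and the toy labels and scores are all hypothetical, and the sketch assumes every strategy yields predicted probabilities for the same set of that hospital's test patients.

```python
from typing import Dict, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_strategy(labels: Sequence[int],
                  scores_by_strategy: Dict[str, Sequence[float]]) -> str:
    """Return the name of the strategy with the highest AUROC.
    If two strategies tie, the one listed first is returned."""
    aucs = {name: auroc(labels, s) for name, s in scores_by_strategy.items()}
    return max(aucs, key=aucs.get)


# Toy data for a single hypothetical hospital (invented, not from the study).
labels = [1, 0, 1, 0, 1, 0]                      # 1 = outcome occurred
scores = {
    "local":      [0.9, 0.2, 0.8, 0.4, 0.7, 0.1],
    "aggregated": [0.6, 0.5, 0.4, 0.7, 0.8, 0.3],
}

for name, s in scores.items():
    print(name, round(auroc(labels, s), 3))
print("best:", best_strategy(labels, scores))
```

Repeating this selection for each hospital and counting how often "local" wins would give a proportion of the kind reported above (for example, 11 of 14 hospitals). The pairwise AUROC computation is quadratic in the number of patients, which is acceptable for a sketch but slow for large test sets.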

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Box-plots of the absolute Shapley value obtained for the 14 hospitals for (A) ICU admission and (B) mechanical ventilation use. Each graph shows the 10 variables which, on average (for the 14 hospitals), presented high contributions to predicting the outcome.
