A comparison of modeling approaches for static and dynamic prediction of central line-associated bloodstream infections using electronic health records (part 2): random forest models

Elena Albu et al. Diagn Progn Res. 2025 Jul 21;9(1):21. doi: 10.1186/s41512-025-00194-8.

Abstract

Objective: Prognostic outcomes related to hospital admissions typically do not suffer from censoring and can be modeled either categorically or as time-to-event outcomes. Competing events are common but often ignored. We compared the performance of static and dynamic random forest (RF) models for predicting the risk of central line-associated bloodstream infections (CLABSI) using different outcome operationalizations.
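The four outcome operationalizations compared in the paper can be illustrated with a minimal sketch. The record layout and field names below are hypothetical, for illustration only, and are not the authors' actual data schema:

```python
HORIZON = 7  # days: the prediction horizon used in the study

def operationalize(event, time):
    """Return the labels one catheter episode contributes to each model type.

    event: first observed event, one of "CLABSI", "death", "discharge",
           or None if no event was observed (no censoring is assumed).
    time:  day of that event, counted from catheter onset.
    """
    # Binary: CLABSI within the horizon vs anything else
    binary = event == "CLABSI" and time <= HORIZON

    # Multinomial: the first event inside the horizon, else "no event"
    multinomial = event if (event and time <= HORIZON) else "no event"

    # Survival: time to CLABSI, censoring competing events at their time
    survival = (time, event == "CLABSI")

    # Competing risks: time to first event, keeping the event type
    competing = (time, event)

    return {"binary": binary, "multinomial": multinomial,
            "survival": survival, "competing": competing}
```

For an episode ending in death on day 3, the binary label is negative, the multinomial label is "death", and the survival operationalization records the episode as censored at day 3, which is exactly the treatment of competing events the conclusion advises against.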

Methods: We included data from 27,478 admissions to the University Hospitals Leuven, covering 30,862 catheter episodes (970 CLABSI, 1,466 deaths and 28,426 discharges), to build static and dynamic RF models for binary (CLABSI vs no CLABSI), multinomial (CLABSI, discharge, death or no event), survival (time to CLABSI) and competing risks (time to CLABSI, discharge or death) outcomes, each predicting the 7-day CLABSI risk. Static models used information available at the onset of the catheter episode, while dynamic models updated predictions daily for 30 days (landmarks 0-30). We evaluated model performance across 100 train/test splits.
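The dynamic models rest on landmarking: at each landmark day a prediction row is built from the episodes still at risk, using only covariates known by that day and the outcome in the following 7-day window. A minimal sketch, under an assumed record layout (the `covariates` callable and field names are illustrative, not the authors' pipeline):

```python
HORIZON = 7
LANDMARKS = range(0, 31)  # landmarks 0-30 as in the study

def landmark_dataset(episodes):
    """Build one stacked dataset over all landmarks.

    episodes: list of dicts with 'event' (type of first event), 'time'
    (day of that event, from catheter onset) and 'covariates'
    (a hypothetical callable mapping a day to a feature dict).
    """
    rows = []
    for s in LANDMARKS:
        for ep in episodes:
            if ep["time"] <= s:  # episode already ended: not at risk
                continue
            # 7-day CLABSI label for the window (s, s + HORIZON]
            label = (ep["event"] == "CLABSI"
                     and ep["time"] <= s + HORIZON)
            rows.append({"landmark": s,
                         "features": ep["covariates"](s),
                         "clabsi_7d": label})
    return rows
```

An episode ending on day 4 contributes rows at landmarks 0 through 3 and then drops out of the risk set, which is why the number of at-risk episodes, and hence the effective sample size, shrinks at later landmarks.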

Results: The binary, multinomial and competing risks models performed similarly: AUROC was 0.74 for predictions at catheter onset, rose to 0.77 at landmark 5, and decreased thereafter. Survival models overestimated the risk of CLABSI (E:O ratios between 1.2 and 1.6) and had AUROCs about 0.01 lower than the other models. Binary and multinomial models had the lowest computation times. Models that include multiple outcome events (multinomial and competing risks) displayed a different internal structure than binary and survival models, choosing different variables for early splits in the trees.
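The E:O (expected-to-observed) ratio used above to quantify the survival models' overestimation is simply the sum of predicted 7-day risks divided by the number of observed CLABSI events; values above 1 indicate overestimation. A minimal sketch:

```python
def eo_ratio(predicted_risks, observed_events):
    """Expected-to-observed ratio for calibration-in-the-large.

    predicted_risks: predicted 7-day CLABSI probabilities, one per episode.
    observed_events: matching booleans (CLABSI within 7 days or not).
    """
    expected = sum(predicted_risks)          # expected number of events
    observed = sum(observed_events)          # observed number of events
    return expected / observed
```

With predictions summing to 1.2 expected events against 1 observed event, the ratio is 1.2, the lower end of the range reported for the survival models.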

Discussion and conclusion: In the absence of censoring, complex modeling choices did not considerably improve predictive performance over a binary model for CLABSI prediction in our studied settings. Survival models that censor competing events at their time of occurrence should be avoided.

Keywords: CLABSI; Competing risks; Dynamic prediction; EHR; Random forests; Survival.


Conflict of interest statement

Declarations.

Ethics approval and consent to participate: The study was approved by the Ethics Committee Research UZ/KU Leuven (EC Research, https://admin.kuleuven.be/raden/en/ethics-committee-research-uz-kuleuven#) on 19 January 2022 (S60891). The Ethics Committee Research UZ/KU Leuven waived the need to obtain informed consent from participants. All patient identifiers were coded using the pseudo-identifier in the data warehouse by the Management Information Reporting Department of UZ Leuven, according to the General Data Protection Regulation (GDPR).

Consent for publication: Not applicable.

Competing interests: The authors declare that they have no competing interests.

Figures

Fig. 1. Catheter episodes.

Fig. 2. Model building pipeline.

Fig. 3. Prediction performance for static models. From left to right, separated by vertical lines: binary outcome model, multinomial outcome model, survival models, competing risks models. Models that consider all outcome classes to determine splits are displayed in blue; models that consider only CLABSI to determine splits are displayed in red.

Fig. 4. Feature minimal depth for static (baseline) and dynamic models. Includes only the subset of “important” features for which the median minimal depth is less than 2 in at least one model type; lower minimal depths indicate more important variables. The minimal depth for all features is included in Supplementary material 7. Models that consider all outcome classes to determine splits are in blue; models that consider only CLABSI to determine splits are in red.

Fig. 5. Prediction performance for dynamic models: time-dependent metrics. The median value of each metric is plotted over time (landmark), and the vertical bars indicate the IQR.

Fig. 6. Runtimes for static (baseline) and dynamic models.

