Sci Rep. 2025 May 20;15(1):17478.
doi: 10.1038/s41598-025-01821-6.

Prediction of bloodstream infection using machine learning based primarily on biochemical data

Ramtin Zargari Marandi et al. Sci Rep. 2025.

Abstract

Early diagnosis of bloodstream infection (BSI) is crucial for informed antibiotic use. This study developed a machine learning (ML) approach for early BSI detection using a comprehensive dataset from Rigshospitalet, Denmark (2010-2020). The dataset included 144,398 samples from adult patients, containing blood culture results, demographics, and up to 36 biochemical variables. Positive blood cultures were observed in 6.4% of samples, most commonly caused by Staphylococcus aureus, Escherichia coli, and Enterococcus faecium. Eighty percent of the samples (N = 43,351 patients) were used for ML model development and five-fold cross-validation, and the remaining 20% (N = 10,837 patients) for independent testing. Among seven models, LightGBM performed best, achieving an AUC of 0.69 on the test set. It was more accurate in detecting negatives, with a negative predictive value (NPV) of 0.96 and specificity of 0.74, compared to a positive predictive value (PPV) of 0.13 and sensitivity of 0.54. SHapley Additive exPlanations (SHAP) identified platelets, leukocytes, and the neutrophil-to-lymphocyte ratio as the top three predictive features. The model showed higher sensitivity (average 0.66) for common pathogens, e.g., 0.71 for E. coli. The results highlight the potential of biochemical variables as diagnostic factors for BSI and indicate that clinical use should focus on identifying patients at low risk; the approach can be further enhanced in future investigations.
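The gap between the reported NPV and PPV follows largely from the low prevalence of positive cultures. The sketch below is an illustration, not the authors' code: it applies Bayes' rule to the sensitivity and specificity quoted in the abstract. The function name `predictive_values` is hypothetical, and using the overall 6.4% positivity rate as the test-set prevalence is an assumption, since the abstract does not state the test-set rate separately.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' rule."""
    tp = sensitivity * prevalence                  # true-positive fraction
    fn = (1.0 - sensitivity) * prevalence          # false-negative fraction
    tn = specificity * (1.0 - prevalence)          # true-negative fraction
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false-positive fraction
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Sensitivity and specificity are taken from the abstract.
# ASSUMPTION: test-set prevalence equals the overall 6.4% positivity rate.
ppv, npv = predictive_values(sensitivity=0.54, specificity=0.74, prevalence=0.064)
print(f"PPV ~ {ppv:.2f}, NPV ~ {npv:.2f}")
```

Under that assumption the computed values land close to the reported PPV of 0.13 and NPV of 0.96; small differences are expected from rounding and from the exact test-set prevalence.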

Keywords: Artificial intelligence; Biomarkers; Classification models; Clinical utility; Diagnosis; Electronic medical records; Infection management; Interpretability; Real-world data.


Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests. Ethical approval and considerations: There were no ethical concerns for this study. This was a retrospective, non-intervention study. All investigations were performed retrospectively and on anonymized data in accordance with Danish guidelines and regulations. No human tissue was stored or used. All methods were carried out in accordance with relevant guidelines and regulations, and the study was approved by the relevant agencies (see below). Due to the retrospective nature of the study, the Scientific Ethics Committee for the Capital Region of Denmark waived the need to obtain informed consent. The extraction of data from registries was approved by the local data protection agency (journal no. R-21005123) and by the Danish Patient Safety Authority (journal no. 2021 -328).

Figures

Fig. 1
Overview of the LightGBM model performance on the test set. (a) Receiver operating characteristic (ROC) and precision-recall curves and confusion matrix for the LightGBM model on the test set, where “BSI” denotes a confirmed positive blood culture. The numbers in the confusion matrix indicate the number of samples from the test set. (b) Prediction performance of the model. (c) Model sensitivity in predicting the 10 most common pathogens found in the test set.
Fig. 2
SHAP summary plot for the best-performing model (LightGBM) from 28,879 predictions on the test set: the x-axis is the SHAP value, and the y-axis lists the top 20 most contributing features (variables) sorted by their mean absolute SHAP values. Feature values are color-coded. Each point indicates an instance (sample). Missing values are indicated by grey points.
Fig. 3
Overview of the analytical pipeline used in this study from data collection and aggregation to machine learning and interpretation.
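The Fig. 2 caption orders features by mean absolute SHAP value. As a minimal sketch of that ordering step only, the snippet below ranks features from per-sample SHAP values. The helper name `rank_by_mean_abs`, the feature names, and the numbers are all hypothetical toy values chosen for illustration; they are not results from the study, and the snippet does not compute SHAP values itself.

```python
def rank_by_mean_abs(shap_values, top_k=20):
    """Rank features by mean absolute SHAP value, largest first.

    shap_values: dict mapping feature name -> list of per-sample SHAP values.
    """
    means = {
        name: sum(abs(v) for v in vals) / len(vals)
        for name, vals in shap_values.items()
    }
    ranked = sorted(means.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# HYPOTHETICAL toy values (three features, three samples), not study data.
toy = {
    "feature_a": [0.10, -0.30, 0.20],
    "feature_b": [-0.50, 0.40, -0.60],
    "feature_c": [0.05, 0.00, -0.05],
}

for name, score in rank_by_mean_abs(toy):
    print(name, round(score, 3))
```

Taking the absolute value before averaging matters: a feature that pushes some predictions up and others down would otherwise average toward zero and look unimportant.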
