Front Cell Infect Microbiol. 2025 Jul 28:15:1605485.
doi: 10.3389/fcimb.2025.1605485. eCollection 2025.

Prediction of bacteremia using routine hematological and metabolic parameters based on logistic regression and random forest models

Ting-Qiang Wang et al. Front Cell Infect Microbiol.

Abstract

Background: This study aimed to evaluate the predictive utility of routine hematological, inflammatory, and metabolic markers for bacteremia and to compare the classification performance of logistic regression and random forest models.

Methods: A retrospective study was conducted on 287 inpatients who underwent blood culture testing at Fuding Hospital, Fujian University of Traditional Chinese Medicine between March and August 2024. Patients were divided into bacteremia (n = 137) and non-bacteremia (n = 150) groups based on blood culture results. Hematological indices, inflammatory markers (e.g., C-reactive protein (CRP), procalcitonin (PCT)), metabolic indices (e.g., glucose, cholesterol), and nutritional markers (e.g., albumin) were analyzed. Univariate and multivariate binary logistic regression analyses were used to identify independent risk factors. Logistic regression and random forest models were developed using 33 features with a 70:30 train-test split and evaluated using receiver operating characteristic (ROC) curves, confusion matrices, and standard classification metrics.

Results: Hemoglobin, cholesterol, and albumin levels were significantly lower in the bacteremia group, while platelet count, CRP, PCT, glucose, and triglycerides were significantly elevated (all p < 0.05). Logistic regression identified platelet count (odds ratio (OR) = 1.003, 95% confidence interval (CI): 1.001-1.006), PCT (OR = 1.032, 95% CI: 1.004-1.060), triglycerides (OR = 1.740, 95% CI: 1.052-2.879), and cholesterol (OR = 0.523, 95% CI: 0.383-0.714) as independent predictors; the OR below 1 for cholesterol means that lower cholesterol was associated with higher odds of bacteremia. The area under the ROC curve (AUC) was 0.75 for the random forest model and 0.74 for logistic regression, with recall rates of 0.69 and 0.60, respectively.
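The per-unit odds ratios above can be hard to read at face value, so the following is a minimal sketch of how they rescale. It assumes each OR refers to a one-unit increase in the marker, in whatever measurement unit the authors used (the abstract does not state units); the dictionary keys and helper names are illustrative only and do not come from the paper.

```python
import math

# Odds ratios as reported in the abstract. Assumption: each is per one-unit
# increase of the marker, in the (unstated) unit used by the authors.
odds_ratios = {
    "platelet_count": 1.003,
    "pct": 1.032,
    "triglycerides": 1.740,
    "cholesterol": 0.523,
}


def log_odds_coefficient(odds_ratio):
    """Logistic regression coefficient (log-odds scale) implied by an OR."""
    return math.log(odds_ratio)


def scaled_odds_ratio(odds_ratio, delta):
    """Odds ratio for a change of `delta` units, given the per-unit OR."""
    return odds_ratio ** delta


# A 100-unit rise in platelet count multiplies the odds by about 1.35.
platelet_per_100 = scaled_odds_ratio(odds_ratios["platelet_count"], 100)

# A one-unit drop in cholesterol multiplies the odds by about 1.91.
cholesterol_per_unit_drop = scaled_odds_ratio(odds_ratios["cholesterol"], -1)

print(f"platelet count, +100 units: OR = {platelet_per_100:.3f}")
print(f"cholesterol, -1 unit:       OR = {cholesterol_per_unit_drop:.3f}")
```

Read this way, the platelet OR of 1.003 is small only because it is expressed per single unit, and the cholesterol OR of 0.523 corresponds to roughly a 1.9-fold increase in odds for each unit decrease.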

Conclusion: Routine laboratory markers integrated into machine learning models demonstrated potential for early bacteremia prediction. Random forest showed higher sensitivity than logistic regression (recall 0.69 vs. 0.60), suggesting its potential utility as a clinical screening tool.

Keywords: bacteremia; biomarkers; blood culture; logistic regression; machine learning; random forest.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Confusion matrices of logistic regression and random forest models. This figure illustrates the confusion matrices for the logistic regression model (left) and the random forest model (right) on the test dataset. Compared with logistic regression, the random forest model achieved a slightly higher number of true positives (TP = 29 vs. 25) and fewer false negatives (FN = 13 vs. 17), indicating improved sensitivity. However, the random forest model also showed a modest increase in false positives (FP = 11 vs. 10), suggesting a slight reduction in specificity as a trade-off for higher sensitivity.
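The counts in this caption are enough to reproduce the reported recall values. The sketch below does that arithmetic and also derives precision and F1 from the same counts. The function and variable names are illustrative, and specificity is left out because the caption does not report the true negative counts.

```python
def recall(tp, fn):
    """Sensitivity: share of actual positives that the model flagged."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of flagged cases that were actual positives."""
    return tp / (tp + fp)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


# Test-set counts taken from the Figure 1 caption.
# True negatives are not given there, so specificity is not computed.
counts = {
    "logistic_regression": {"tp": 25, "fn": 17, "fp": 10},
    "random_forest": {"tp": 29, "fn": 13, "fp": 11},
}

metrics = {
    model: {
        "recall": recall(c["tp"], c["fn"]),
        "precision": precision(c["tp"], c["fp"]),
        "f1": f1_score(c["tp"], c["fp"], c["fn"]),
    }
    for model, c in counts.items()
}

for model, m in metrics.items():
    print(f"{model}: recall={m['recall']:.2f} "
          f"precision={m['precision']:.2f} f1={m['f1']:.2f}")
```

Both models are evaluated on the same 42 bacteremia cases (TP + FN), and the recall values round to the 0.60 and 0.69 stated in the abstract.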
Figure 2
Comparison of ROC curves between logistic regression and random forest models. The ROC curves of the two models exhibit similar shapes, indicating that logistic regression and random forest achieved comparable classification performance on this dataset.
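For readers unfamiliar with how a single AUC number summarizes an ROC curve, here is a minimal sketch using the rank interpretation: AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The labels and scores below are made-up toy values, not data from the study, and the function name is illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = bacteremia, 0 = no bacteremia.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

toy_auc = roc_auc(labels, scores)
print(f"toy AUC = {toy_auc:.3f}")
```

Under this reading, the reported AUCs of 0.75 and 0.74 mean that either model ranks a bacteremia case above a non-bacteremia case in about three out of four such pairs, which is consistent with the caption's description of comparable performance.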
