Comparison between traditional logistic regression and machine learning for predicting mortality in adult sepsis patients

Front Med (Lausanne). 2025 Jan 6;11:1496869. doi: 10.3389/fmed.2024.1496869. eCollection 2024.

Hongsheng Wu¹, Biling Liao¹, Tengfei Ji¹, Keqiang Ma¹, Yumei Luo¹, Shengmin Zhang¹

PMID: 39835102
PMCID: PMC11743956
DOI: 10.3389/fmed.2024.1496869

Hongsheng Wu et al. Front Med (Lausanne). 2025.

Affiliation

¹ Hepatobiliary Pancreatic Surgery Department, Huadu District People's Hospital of Guangzhou, Guangzhou, China.

Abstract

Background: Sepsis is a life-threatening condition with a high mortality rate, which underscores the need for new models to predict prognosis in this patient population. This study compared the performance of traditional logistic regression and machine learning models in predicting mortality in adult sepsis patients.

Objective: To develop an optimal model for predicting mortality in adult sepsis patients by comparing traditional logistic regression with machine learning methods.

Methods: We retrospectively analyzed 606 adult sepsis inpatients treated at our medical center between January 2020 and December 2022 and randomly divided them into training and validation sets in a 7:3 ratio. Traditional logistic regression and machine learning methods were used to predict mortality in adult sepsis. Univariate analysis followed by backward stepwise regression identified independent risk factors for the logistic regression model, while Least Absolute Shrinkage and Selection Operator (LASSO) regression performed variable shrinkage and selection for the machine learning model. Among the machine learning models considered, namely Bagged Tree, Boost Tree, Decision Tree, LightGBM, Naïve Bayes, Nearest Neighbors, Support Vector Machine (SVM), and Random Forest (RF), the one with the maximum area under the curve (AUC) was chosen for model construction. Model validation and comparison with the Sequential Organ Failure Assessment (SOFA) and the Acute Physiology and Chronic Health Evaluation (APACHE) scores were performed in the validation set using receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA) curves.
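The 7:3 split and the AUC-based model choice described in the Methods can be illustrated with a minimal sketch. The abstract does not say which software was used, so the function names (`split_70_30`, `auc`, `best_model`), the seed, and the toy scores below are assumptions for illustration only. The AUC is computed as the probability that a randomly chosen non-survivor receives a higher score than a randomly chosen survivor, which equals the area under the ROC curve.

```python
import random

def split_70_30(records, seed=0):
    """Randomly split records into training and validation sets in a 7:3 ratio.
    The seed is an illustrative assumption, not a value from the study."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Tied scores count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def best_model(labels, scores_by_model):
    """Return the name of the model whose scores give the largest AUC."""
    return max(scores_by_model, key=lambda name: auc(labels, scores_by_model[name]))

# 606 patients split 7:3 gives 424 training and 182 validation records.
train, valid = split_70_30(range(606))
print(len(train), len(valid))  # 424 182

# Toy labels (1 = death) and hypothetical model scores.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise loop is quadratic in the sample size, which is acceptable for a cohort of this size; a rank-based formula would be preferable for large datasets.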

Results: Univariate analysis identified 17 variables that differed significantly (p < 0.05) between the survival and non-survival groups: gender, history of coronary heart disease (CHD), systolic pressure, white blood cell (WBC), neutrophil count (NEUT), lymphocyte count (LYMP), lactic acid, neutrophil-to-lymphocyte ratio (NLR), red blood cell distribution width (RDW), interleukin-6 (IL-6), prothrombin time (PT), international normalized ratio (INR), fibrinogen (FBI), D-dimer, aspartate aminotransferase (AST), total bilirubin (Tbil), and lung infection. Backward stepwise regression then identified systolic pressure, lactic acid, NLR, RDW, IL-6, PT, and Tbil as independent risk factors. These factors were incorporated into a logistic regression model, chosen for its minimum Akaike Information Criterion (AIC) value (98.65). Among the machine learning models, the RF model, which had the maximum AUC (0.999), was selected. LASSO regression with the lambda.1SE criterion, based on ten-fold cross-validation, identified systolic pressure, lactic acid, NEUT, RDW, IL-6, INR, and Tbil as the variables for constructing the RF model. In validation and comparison against the traditional logistic regression model and the SOFA and APACHE scores, the RF model had the largest AUC, the calibration curve closest to the ideal reference line, and the highest net benefit in both the training and validation sets.
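The lambda.1SE rule mentioned above is commonly understood as follows: take the penalty with the lowest mean cross-validated error, then choose the largest penalty whose mean error is still within one standard error of that minimum, which yields a sparser model. The sketch below illustrates that rule only. The function name `lambda_1se` and all numbers are made up for illustration and are not values from the study.

```python
def lambda_1se(lambdas, cv_mean, cv_se):
    """Return the largest lambda whose mean CV error lies within one
    standard error of the minimum mean CV error."""
    if not (len(lambdas) == len(cv_mean) == len(cv_se)) or not lambdas:
        raise ValueError("inputs must be non-empty and of equal length")
    # Index of the lambda with the lowest mean cross-validated error.
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    # Every lambda whose mean error does not exceed the threshold is eligible.
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    return max(eligible)

# Hypothetical ten-fold CV summary for five candidate penalties.
lambdas = [0.001, 0.01, 0.05, 0.1, 0.5]
cv_mean = [0.55, 0.50, 0.51, 0.515, 0.70]
cv_se = [0.02, 0.02, 0.02, 0.02, 0.04]
print(lambda_1se(lambdas, cv_mean, cv_se))  # 0.1
```

In this toy example the minimum error occurs at 0.01, but 0.1 is returned because its error (0.515) is still below the threshold of 0.52, so the stronger penalty is preferred.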

Conclusion: The machine learning-based RF model demonstrates advantages over the traditional logistic regression model in predicting adult sepsis prognosis. The RF model holds significant potential for clinical surveillance and interventions to improve outcomes for sepsis patients.

Keywords: adult sepsis; logistic regression; machine learning; mortality; random forest.

PubMed Disclaimer

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**Figure 1**
Flowchart illustrating the research design.

**Figure 2**
Variable shrinkage and selection by LASSO regression. **(A)** Shrinkage pathway of LASSO regression. **(B)** Based on ten-fold cross-validation, seven variables (systolic pressure, lactic acid, NEUT, RDW, IL-6, INR, and Tbil) were chosen using the lambda.1SE criterion.

**Figure 3**
Error rate chart of the RF model. By the time the forest reached 141 decision trees, both the out-of-bag (OOB) and classification error rates had decreased noticeably, eventually reaching a steady state.

**Figure 4**
Comparison of discriminative ability among the RF, logistic regression, SOFA, and APACHE scoring systems. **(A)** Training set; **(B)** validation set. The blue solid ROC curves, which had the largest AUC values in both the training and validation sets, indicate that RF had the best discrimination among the four models. AUC, area under curve; SOFA, sequential organ failure assessment scoring; APACHE, acute physiology and chronic health evaluation scoring.

**Figure 5**
Comparison of calibration curves among the RF, logistic regression, SOFA, and APACHE scoring systems. **(A)** Training set; **(B)** validation set. The blue solid calibration curves, which were notably closer to the ideal reference line in both the training and validation sets, indicate that RF had the best goodness-of-fit and prediction accuracy among the four models. SOFA, sequential organ failure assessment scoring; APACHE, acute physiology and chronic health evaluation scoring. The left y-axis represents the observed probability, the right y-axis represents the sample size, and the x-axis represents the predicted probability.

**Figure 6**
Comparison of decision curve analysis among the RF, logistic regression, SOFA, and APACHE scoring systems. **(A)** Training set; **(B)** validation set. With the highest AUDC and net benefit in both the training and validation sets, RF was considered the optimal model, with the best clinical utility. SOFA, sequential organ failure assessment scoring; APACHE, acute physiology and chronic health evaluation scoring; AUDC, area under DCA curve.
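For readers unfamiliar with decision curve analysis, the net benefit plotted on a DCA curve is conventionally defined at a threshold probability pt as TP/n - (FP/n) * pt/(1 - pt), where patients with predicted risk at or above pt are treated as positive. The sketch below computes that quantity and the "treat all" reference line. The function names and toy data are assumptions for illustration, not the authors' code or results.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of a model at one threshold probability:
    TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    # Patients with predicted risk at or above the threshold count as positive.
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy that classifies every patient as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy example: 1 = death, hypothetical predicted risks.
labels = [1, 1, 0, 0, 0]
probs = [0.9, 0.6, 0.7, 0.2, 0.1]
print(net_benefit(labels, probs, 0.5))       # model strategy
print(net_benefit_treat_all(labels, 0.5))    # treat-all reference
```

Evaluating `net_benefit` over a grid of thresholds traces one DCA curve; a model is useful at a given threshold when its net benefit exceeds both the treat-all line and zero (treat none).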

Cited by

Red cell distribution width and clinical outcomes in sepsis patients infected with Escherichia coli using data from MIMIC-IV.
Qin K, Su Y, Ding N. Eur J Med Res. 2025 Jul 5;30(1):580. doi: 10.1186/s40001-025-02756-4. PMID: 40618111. Free PMC article.

The Application of Machine Learning Algorithms to Predict HIV Testing Using Evidence from the 2002-2017 South African Adult Population-Based Surveys: An HIV Testing Predictive Model.
Jaiteh M, Phalane E, Shiferaw YA, Jallow H, Phaswana-Mafuya RN. Trop Med Infect Dis. 2025 Jun 14;10(6):167. doi: 10.3390/tropicalmed10060167. PMID: 40559734. Free PMC article.

Developing a Predictive Model for Significant Prostate Cancer Detection in Prostatic Biopsies from Seven Clinical Variables: Is Machine Learning Superior to Logistic Regression?
Morote J, Miró B, Hernando P, Paesano N, Picola N, Muñoz-Rodriguez J, Ruiz-Plazas X, Muñoz-Rivero MV, Celma A, García-de Manuel G, Servian P, Abascal JM, Trilla E, Méndez O. Cancers (Basel). 2025 Mar 25;17(7):1101. doi: 10.3390/cancers17071101. PMID: 40227611. Free PMC article.
