Front Med (Lausanne). 2025 Jan 6:11:1496869.
doi: 10.3389/fmed.2024.1496869. eCollection 2024.

Comparison between traditional logistic regression and machine learning for predicting mortality in adult sepsis patients

Hongsheng Wu et al. Front Med (Lausanne).

Abstract

Background: Sepsis is a life-threatening disease with a high mortality rate, which motivates the search for better models to predict prognosis in these patients. This study compared the performance of traditional logistic regression and machine learning models in predicting mortality in adult sepsis.

Objective: To develop an optimal model for predicting mortality in adult sepsis patients by comparing traditional logistic regression with machine learning methods.

Methods: A retrospective analysis was conducted on 606 adult sepsis inpatients treated at our medical center between January 2020 and December 2022, who were randomly divided into training and validation sets in a 7:3 ratio. Traditional logistic regression and machine learning methods were used to predict mortality in adult sepsis. Univariate analysis followed by backward stepwise regression identified independent risk factors for the logistic regression model, while Least Absolute Shrinkage and Selection Operator (LASSO) regression performed variable shrinkage and selection for the machine learning model. Among the candidate machine learning models (Bagged Tree, Boost Tree, Decision Tree, LightGBM, Naïve Bayes, Nearest Neighbors, Support Vector Machine (SVM), and Random Forest (RF)), the one with the maximum area under the curve (AUC) was chosen for model construction. The models were validated and compared with the Sequential Organ Failure Assessment (SOFA) and Acute Physiology and Chronic Health Evaluation (APACHE) scores using receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA) curves in the validation set.
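The random 7:3 split and the "maximum AUC" selection rule described in the Methods can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code: the function names (`split_7_3`, `auc`, `best_by_auc`), the seed, and the toy scores are assumptions made only for illustration, and the AUC is computed with the rank-based (Mann-Whitney) definition.

```python
import random


def split_7_3(records, seed=0):
    """Randomly split records into training (70%) and validation (30%) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Rank-based AUC: chance that a random positive scores above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_by_auc(labels, scores_by_model):
    """Return the name of the model whose scores give the largest AUC."""
    return max(scores_by_model, key=lambda name: auc(labels, scores_by_model[name]))


if __name__ == "__main__":
    train, valid = split_7_3(range(606))
    print(len(train), len(valid))

    # Toy labels (1 = died) and hypothetical predicted risks for two models.
    y = [0, 0, 1, 1]
    candidates = {"rf": [0.1, 0.2, 0.8, 0.9], "logit": [0.1, 0.4, 0.35, 0.8]}
    print(best_by_auc(y, candidates))
```

With 606 records, this split yields 424 training and 182 validation records. The pairwise AUC loop is quadratic in sample size, which is acceptable for a cohort of this size but would be replaced by a sort-based version for larger data.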

Results: Univariate analysis identified 17 variables that differed significantly (p < 0.05) between the survival and non-survival groups: gender, history of coronary heart disease (CHD), systolic pressure, white blood cell (WBC), neutrophil count (NEUT), lymphocyte count (LYMP), lactic acid, neutrophil-to-lymphocyte ratio (NLR), red blood cell distribution width (RDW), interleukin-6 (IL-6), prothrombin time (PT), international normalized ratio (INR), fibrinogen (FBI), D-dimer, aspartate aminotransferase (AST), total bilirubin (Tbil), and lung infection. Backward stepwise regression then identified systolic pressure, lactic acid, NLR, RDW, IL-6, PT, and Tbil as independent risk factors. These factors were incorporated into a logistic regression model, chosen based on the minimum Akaike Information Criterion (AIC) value (98.65). Among the machine learning models, the RF model had the maximum area under the curve (AUC), 0.999, and was selected. LASSO regression with the lambda.1SE criterion identified systolic pressure, lactic acid, NEUT, RDW, IL-6, INR, and Tbil as variables for constructing the RF model, validated through ten-fold cross-validation. In validation and comparison with the traditional logistic model, SOFA, and APACHE scoring, the RF model had the largest AUC, the calibration curves closest to the ideal reference line, and the highest net benefit in both the training and validation sets.
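Two selection rules are named in the Results: the minimum AIC for the logistic model and the lambda.1SE criterion for LASSO. The sketch below shows what each rule computes, assuming the usual definitions (AIC = 2k - 2 ln L; lambda.1SE = the largest penalty whose mean cross-validated error is within one standard error of the minimum). The function names and all numbers in the example are hypothetical and are not taken from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion, 2k - 2 ln(L); lower values are preferred."""
    return 2 * n_params - 2 * log_likelihood


def lambda_1se(lambdas, cv_mean, cv_se):
    """Pick the largest lambda whose mean CV error is within 1 SE of the minimum.

    lambdas, cv_mean and cv_se are parallel sequences: for each candidate
    penalty, the mean cross-validated error and its standard error.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    return max(eligible)


if __name__ == "__main__":
    # Hypothetical backward-selection step: dropping one predictor lowers the
    # log-likelihood slightly but also removes a parameter.
    print(aic(-10.5, 4), aic(-10.0, 3))

    # Hypothetical ten-fold CV summary for five penalty values.
    lams = [0.001, 0.01, 0.05, 0.1, 0.5]
    mean_err = [0.305, 0.28, 0.29, 0.31, 0.40]
    se_err = [0.02, 0.02, 0.02, 0.02, 0.03]
    print(lambda_1se(lams, mean_err, se_err))
```

Choosing the one-standard-error penalty rather than the error-minimizing one trades a small loss in cross-validated fit for a sparser model, which is consistent with the seven-variable set reported here.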

Conclusion: Based on machine learning principles, the RF model demonstrates advantages over traditional logistic regression models in predicting adult sepsis prognosis. The RF model holds significant potential for clinical surveillance and interventions to enhance outcomes for sepsis patients.

Keywords: adult sepsis; logistic regression; machine learning; mortality; random forest.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Flowchart illustrating the research design.
Figure 2
Variable shrinkage and selection by LASSO regression. (A) Shrinkage pathway of LASSO regression. (B) Based on ten-fold cross-validation, seven variables (systolic pressure, lactic acid, NEUT, RDW, IL-6, INR, and Tbil) were chosen using the lambda.1SE criterion.
Figure 3
Error rate chart of the RF model. As the number of decision trees reached 141, both the out-of-bag (OOB) error rate and the model classification error rate had decreased noticeably, eventually reaching a steady state.
Figure 4
Comparison of discriminative ability among RF, logistic regression, SOFA, and APACHE scoring systems. (A) Training set; (B) validation set. The blue solid ROC curves had the largest AUC values in both the training and validation sets, indicating that RF had the best discrimination among the four models. AUC, area under curve; SOFA, sequential organ failure assessment scoring; APACHE, acute physiology and chronic health evaluation scoring.
Figure 5
Comparison of calibration curves among RF, logistic regression, SOFA, and APACHE scoring systems. (A) Training set; (B) validation set. The blue solid calibration curves were notably closer to the ideal reference line in both the training and validation sets, indicating that RF had the best goodness-of-fit and prediction accuracy among the four models. SOFA, sequential organ failure assessment scoring; APACHE, acute physiology and chronic health evaluation scoring. The left x-axis represents the observed probability; the right x-axis represents the sample size; the y-axis represents the predicted probability.
Figure 6
Comparison of decision curve analysis among RF, logistic regression, SOFA, and APACHE scoring systems. (A) Training set; (B) validation set. With the highest AUDC and net benefit in both the training and validation sets, RF was considered the optimum model, with the best clinical practicality. SOFA, sequential organ failure assessment scoring; APACHE, acute physiology and chronic health evaluation scoring; AUDC, area under DCA curve.
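The net benefit plotted in a decision curve can be illustrated with a short sketch. It assumes the standard definition, net benefit = TP/n - (FP/n) x pt/(1 - pt) at threshold probability pt, and includes the "treat all" reference strategy that decision curves usually show. The function names and the toy data are hypothetical; how the authors computed their curves is not stated in the abstract.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the reference strategy that intervenes on every patient."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Toy outcomes (1 = died) and hypothetical predicted risks.
    y = [1, 1, 0, 0, 0]
    risk = [0.9, 0.6, 0.7, 0.2, 0.1]
    for pt in (0.2, 0.5, 0.8):
        print(pt, net_benefit(y, risk, pt), net_benefit_treat_all(y, pt))
```

A model is clinically useful at a given threshold when its net benefit exceeds both the "treat all" value and zero, the net benefit of treating no one.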
