Front Immunol. 2022 Nov 25:13:1040286.
doi: 10.3389/fimmu.2022.1040286. eCollection 2022.

Identification of two robust subclasses of sepsis with both prognostic and therapeutic values based on machine learning analysis

Wei Zhou et al. Front Immunol. 2022.

Abstract

Background: Sepsis is a heterogeneous syndrome with high morbidity and mortality. Optimal and effective classifications are urgently needed.

Methods and results: A total of 1,936 patients (sepsis samples, n=1,692; normal samples, n=244) from 7 discovery datasets were included in a weighted gene co-expression network analysis (WGCNA) to identify candidate genes related to sepsis. Two subtypes of sepsis, Adaptive and Inflammatory, were then identified in the training sepsis set (n=1,692) by K-means clustering on 90 sepsis-related features. We validated these subtypes using 617 samples in 5 independent datasets and in the merged set of all 5. CIBERSORT analysis showed that the Adaptive subtype was associated with high infiltration of T cells and natural killer (NK) cells and with a better clinical outcome. Immune features were validated by single-cell RNA sequencing (scRNA-seq) analysis. The Inflammatory subtype was associated with high infiltration of macrophages and an unfavorable prognosis. Functional analysis showed that the Toll-like receptor signaling pathway was upregulated in the Inflammatory subtype, whereas NK cell-mediated cytotoxicity and the T cell receptor signaling pathway were upregulated in the Adaptive subtype. To quantify the cluster findings, a scoring system, termed the risk score, was established using four datasets (n=980) from the discovery cohorts, based on least absolute shrinkage and selection operator (LASSO) and logistic regression, and validated in external sets (n=760). Multivariate logistic regression analysis showed that the risk score was an independent predictor of outcomes in sepsis patients (odds ratio [OR], 2.752; 95% confidence interval [CI], 2.234-3.389; P<0.001) after adjustment for age and gender. The validation sets confirmed this performance (OR, 1.638; 95% CI, 1.309-2.048; P<0.001). Finally, nomograms demonstrated greater discriminatory potential than risk score, age, or gender alone (training set: AUC=0.682, 95% CI, 0.643-0.719; validation set: AUC=0.624, 95% CI, 0.576-0.664). Decision curve analysis (DCA) demonstrated that the nomograms were clinically useful and identified patients at high risk better than age, gender, or risk score individually.
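
As a worked check on the reported odds ratios, the sketch below assumes the intervals are Wald intervals, that is, exp(beta ± z × SE) for a logistic regression coefficient beta. The abstract does not state how the intervals were computed, so this is an assumption, and the function names are illustrative only. Under that assumption, the coefficient and its standard error can be recovered from the published training-set interval.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def back_out_beta_se(lower, upper, z=Z_95):
    """Recover (beta, se) from a reported CI, assuming it is a Wald interval."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    return (log_lo + log_hi) / 2, (log_hi - log_lo) / (2 * z)

# Training-set interval reported in the abstract: 95% CI 2.234-3.389
beta, se = back_out_beta_se(2.234, 3.389)
or_hat, lo, hi = odds_ratio_ci(beta, se)
print(f"beta={beta:.3f} se={se:.3f} OR={or_hat:.3f} CI=({lo:.3f}, {hi:.3f})")
```

The recovered point estimate lands very close to the reported OR of 2.752, which is consistent with an interval that is symmetric on the log-odds scale.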

Conclusions: In-depth analysis of a comprehensive landscape of the transcriptome characteristics of sepsis might contribute to personalized treatments and prediction of clinical outcomes.

Keywords: LASSO; clinical outcomes; clustering; logistic regression; sepsis.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Workflow of the present research.
Figure 2
Candidate gene detection. (A) Clustering dendrogram of the 1,778 retained sepsis samples in WGCNA, with clinical features. (B) Heatmap of Pearson correlations between modules and clinical traits. Rows represent modules and columns represent traits. The values in the squares are correlation coefficients and p values. Red represents positive correlation and blue represents negative correlation. (C) Boxplots of gene significance (GS) among 9 modules. The blue and black modules showed higher GS values than the other 7 modules (t-test). (D) Scatter plots of the correlation of GS with module membership (MM). Genes with |GS| > 0.2 and |MM| > 0.8 were considered significant. (E) Venn plot of the intersection between DEGs and genes filtered from WGCNA. (F) Heatmap of the candidate genes. Expression values were normalized from -2 to 2. Red represents relatively increased expression and blue represents relatively decreased expression.
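
The GS/MM filter in panel (D) can be sketched as follows. This assumes the usual WGCNA definitions: GS as the Pearson correlation between a gene's expression and a clinical trait, and MM as its correlation with the module eigengene. The caption does not spell these out, and the gene names, expression values, trait vector, and eigengene below are hypothetical toy inputs, not data from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_significant_genes(expression, trait, eigengene, gs_cut=0.2, mm_cut=0.8):
    """Keep genes with |GS| > gs_cut and |MM| > mm_cut (thresholds from the caption)."""
    selected = []
    for gene, values in expression.items():
        gs = pearson(values, trait)      # gene significance (assumed definition)
        mm = pearson(values, eigengene)  # module membership (assumed definition)
        if abs(gs) > gs_cut and abs(mm) > mm_cut:
            selected.append(gene)
    return selected

# Hypothetical toy inputs: four samples, two genes
trait = [0, 0, 1, 1]                # e.g. 0 = normal, 1 = sepsis
eigengene = [1.0, 2.0, 3.0, 4.0]    # stand-in for a module eigengene
expression = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],  # tracks both trait and eigengene
    "gene_b": [2.0, 1.0, 2.0, 1.0],  # uncorrelated with the trait
}
significant = select_significant_genes(expression, trait, eigengene)
print(significant)
```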
Figure 3
K-means clustering analysis and cluster annotation. (A) Total within-cluster sum of squares (WSS) plotted against the number of clusters. The WSS dropped rapidly from 1 to 2 clusters and slowly beyond k = 2. (B) Average silhouette width plotted against the number of clusters, indicating that 2 subclasses was the ideal choice. (C) Scatter plot of the distribution of sepsis samples in the two principal dimensions. (D) Volcano plot of DEGs of cluster B vs. cluster A. (E) Heatmap of DEGs between cluster A and cluster B. Expression values were normalized from -3 to 3. Red represents relatively increased expression and blue represents relatively decreased expression. (F) Gene Ontology (GO) analysis of DEGs overexpressed in cluster A. (G) Gene Ontology (GO) analysis of DEGs overexpressed in cluster B.
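
The choice of k described in panels (A) and (B) can be illustrated with a minimal, standard-library sketch of the elbow (WSS) and average-silhouette criteria. It runs on synthetic two-dimensional points rather than the study's 90 features. The deterministic farthest-first initialisation and every name in the block are assumptions made for illustration, and the software the authors used is not stated here.

```python
import math
import random

def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j])) for p in points]

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with farthest-first initialisation; returns (labels, WSS)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old center if a cluster happens to be empty
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centers[j])
        if updated == centers:
            break
        centers = updated
    labels = assign(points, centers)
    return labels, sum(dist2(p, centers[l]) for p, l in zip(points, labels))

def mean_silhouette(points, labels):
    """Average silhouette width; singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        total += (b - a) / max(a, b)
    return total / len(points)

# Synthetic data: two well-separated groups of 30 points each
rng = random.Random(0)
points = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
          + [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(30)])

wss, silhouette = {}, {}
for k in range(1, 5):
    labels, wss[k] = kmeans(points, k)
    if k > 1:  # silhouette is undefined for a single cluster
        silhouette[k] = mean_silhouette(points, labels)

best_k = max(silhouette, key=silhouette.get)
print(wss, silhouette, best_k)
```

On data with two clear groups, the WSS curve bends at k = 2 and the silhouette width peaks there, which is the pattern the caption reports for the sepsis samples.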
Figure 4
GSEA and GSVA. (A) GSEA of gene sets for cluster A (top) and cluster B (bottom). (B) Heatmap of GSVA on sepsis samples grouped by K-means clusters. (C) Scatterplot of the changed pathway-related signatures.
Figure 5
Immune reprogramming analysis. (A) Complex heatmap of immune cell fractions between cluster A and cluster B. (B) Scatter plot of log2FC values of immune cell markers. Red indicates genes relatively overexpressed in cluster B and blue indicates markers comparatively upregulated in cluster A. (C) Scatter plot of markers expressed in single-cell RNA sequencing samples. (D) Cell annotation analysis identified four types of cells. (E) The distribution of the four types of cells between the two clusters. **p < 0.01; ***p < 0.001; ****p < 0.0001; ns, not significant.
Figure 6
Feature selection and association of risk score with sepsis outcomes. (A) Ten-fold cross-validation results. The line on the left indicates the value of log(λ) for the error-minimizing model. 28 variables were selected at log(λ) = −4.74. (B) LASSO coefficient profiles of the 28 features. (C) Forest plot of features significant in logistic regression analysis. (D) Violin plot of the distribution of risk score between cluster A and cluster B in the training (upper) and validation (lower) sets.
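
To make the LASSO step concrete, the sketch below fits an L1-penalised logistic regression by proximal gradient descent (soft-thresholding) on synthetic data. It also converts the caption's log(λ) = −4.74 to λ, roughly 0.0087. This is not the authors' pipeline: the optimiser, learning rate, and toy data are assumptions, and treating the risk score as the fitted linear predictor is a guess, since the abstract does not give the formula.

```python
import math
import random

def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink towards zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_lasso_logistic(X, y, lam, lr=0.1, iters=300):
    """L1-penalised logistic regression via proximal gradient descent."""
    n, p = len(X), len(X[0])
    coef, intercept = [0.0] * p, 0.0
    for _ in range(iters):
        errors = [sigmoid(intercept + sum(c * v for c, v in zip(coef, row))) - t
                  for row, t in zip(X, y)]
        intercept -= lr * sum(errors) / n  # the intercept is not penalised
        for j in range(p):
            grad = sum(e * row[j] for e, row in zip(errors, X)) / n
            coef[j] = soft_threshold(coef[j] - lr * grad, lr * lam)
    return intercept, coef

def risk_score(row, intercept, coef):
    """Linear predictor; assumed form of a risk score, not taken from the paper."""
    return intercept + sum(c * v for c, v in zip(coef, row))

# Synthetic data: 5 features, only feature 0 carries signal
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [1 if row[0] + 0.5 * rng.gauss(0, 1) > 0 else 0 for row in X]

lam_small = math.exp(-4.74)  # the caption's log(lambda), converted to lambda
lam_large = 2.0              # heavy penalty, chosen only for contrast

_, coef_small = fit_lasso_logistic(X, y, lam_small)
_, coef_large = fit_lasso_logistic(X, y, lam_large)
print(lam_small, coef_small, coef_large)
```

With the heavy penalty every coefficient is shrunk to exactly zero, while the light penalty keeps the informative feature. That is the mechanism by which a cross-validated λ selects a subset of variables.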
Figure 7
Nomogram establishment and performance assessment. (A) A nomogram established by multivariate logistic regression for predicting the risk of sepsis survival outcomes. (B) ROC curves showing the ability of the nomogram, risk score, age, and gender to predict the prognosis of sepsis patients. (C) Calibration plot with a binary fringe plot of the nomogram in the training set. (D) Decision curve analysis for the sepsis nomogram and for age, gender, and risk score.
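
The two performance measures in panels (B) and (D) can be sketched in a few lines. AUC is computed here as the fraction of positive/negative pairs that the model ranks correctly. Net benefit uses the standard decision-curve formula, TP/n − FP/n × pt/(1 − pt), at threshold probability pt. The labels and predicted probabilities are hypothetical toy values, and the function names are illustrative only.

```python
def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def net_benefit(probs, labels, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    n = len(labels)
    tp = sum(1 for p, l in zip(probs, labels) if p >= threshold and l == 1)
    fp = sum(1 for p, l in zip(probs, labels) if p >= threshold and l == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy on a decision curve: treat every patient."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical toy predictions for four patients
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]

example_auc = auc(probs, labels)          # 3 of the 4 pairs are ranked correctly
nb_model = net_benefit(probs, labels, 0.3)
nb_all = net_benefit_treat_all(labels, 0.3)
print(example_auc, nb_model, nb_all)
```

A decision curve plots net benefit over a range of thresholds. A model is considered clinically useful where its curve lies above both the treat-all line and the treat-none line (net benefit 0).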
Figure 8
Association of risk score with clinical features and therapeutic response. (A, B) Violin plot of association of risk score with age and gender of sepsis patients. (C) Scatter plot of Pearson correlation analysis of risk score and sequential organ failure assessment (SOFA) score. (D) Box density plot of risk score with clinical therapeutic response. (E) ROC curve of performance of risk score in predicting early supportive therapy. *p < 0.05; ns, not significant.
