mSystems. 2024 Dec 17;9(12):e0139524.
doi: 10.1128/msystems.01395-24. Epub 2024 Nov 20.

Deep learning enabled integration of tumor microenvironment microbial profiles and host gene expressions for interpretable survival subtyping in diverse types of cancers



Haohong Zhang et al. mSystems.

Abstract

The tumor microbiome, the complex community of microbes found within tumors, has been linked to cancer development, progression, and treatment outcome. However, disentangling the relationship between the tumor microbiome and host gene expression in the tumor microenvironment, as well as their combined effects on patient survival, remains a bottleneck. In this study, we aimed to decode this complex relationship by developing ASD-cancer (autoencoder-based subtypes detector for cancer), a semi-supervised deep learning framework that extracts survival-related features from tumor microbiome and transcriptome data and identifies patients' survival subtypes. Using tissue samples from The Cancer Genome Atlas (TCGA) database, we identified two statistically distinct survival subtypes in each of the 20 types of cancer studied. Our framework provided improved risk stratification compared with principal component analysis (PCA) (e.g., for liver hepatocellular carcinoma [LIHC], log-rank test, P = 8.12E-6 with ASD-cancer versus P = 0.87 with PCA), predicted survival subtypes accurately, and identified biomarkers for the survival subtypes. Additionally, we identified potential interactions between microbes and host genes that may play roles in survival. For instance, in LIHC, Arcobacter, Methylocella, and Isoptericola may influence host survival through interactions with host genes enriched in the HIF-1 signaling pathway, suggesting these genera as potential therapeutic targets. Experiments on validation data sets further supported these patterns. Collectively, ASD-cancer enabled accurate survival subtyping and biomarker discovery, which could facilitate personalized treatment for a broad spectrum of cancer types.

IMPORTANCE

Unraveling the intricate relationship between the tumor microbiome, host gene expression, and their collective impact on cancer outcomes is paramount for advancing personalized treatment strategies. Our study introduces ASD-cancer, an autoencoder-based subtype detector. ASD-cancer decodes the complexities of the tumor microenvironment and identifies distinct survival subtypes across 20 cancer types. Its superior risk stratification, demonstrated by significant improvements over traditional methods such as principal component analysis, holds promise for refining patient prognosis. Accurate survival subtype prediction, biomarker discovery, and insights into microbe-host gene interactions make ASD-cancer a powerful tool for advancing precision medicine. These findings not only contribute to a deeper understanding of the tumor microenvironment but also open avenues for personalized interventions across diverse cancer types, underscoring the potential of ASD-cancer to shape future cancer care.
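The risk-stratification comparison above rests on two-group log-rank P values (e.g., P = 8.12E-6 versus P = 0.87 for LIHC). The following is a minimal sketch of that test, assuming right-censored data given as follow-up times plus a 0/1 death indicator; `logrank_test` is a hypothetical helper written for illustration, not the authors' code. At each distinct death time it compares the deaths observed in group A with the number expected if both groups shared one hazard, then refers the squared, variance-scaled sum to a chi-squared distribution with one degree of freedom.

```python
import numpy as np
from scipy.stats import chi2


def logrank_test(time_a, event_a, time_b, event_b):
    """Two-group log-rank test (hypothetical helper for illustration).

    time_*: follow-up times; event_*: 1 if death observed, 0 if censored.
    Returns (chi-squared statistic, P value) with 1 degree of freedom.
    """
    time = np.concatenate([time_a, time_b]).astype(float)
    event = np.concatenate([event_a, event_b]).astype(int)
    group = np.concatenate([np.zeros(len(time_a)), np.ones(len(time_b))])

    observed_minus_expected = 0.0  # running sum of (O_A - E_A)
    variance = 0.0                 # running sum of hypergeometric variances
    for t in np.unique(time[event == 1]):
        at_risk = time >= t                           # still under follow-up at t
        n = at_risk.sum()
        n_a = (at_risk & (group == 0)).sum()
        d = ((time == t) & (event == 1)).sum()        # deaths at t, both groups
        d_a = ((time == t) & (event == 1) & (group == 0)).sum()
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    stat = observed_minus_expected ** 2 / variance
    return stat, chi2.sf(stat, df=1)
```

For instance, `logrank_test([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)` gives a P value well below 0.01, because every death in group A precedes every death in group B. For real analyses, a tested implementation such as `lifelines.statistics.logrank_test` is preferable.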

Keywords: cancer prognosis; deep learning; survival subtype; tumor microbiome.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig 1
Materials and pipeline of survival subtype detection. (a) The 20 cancer data sets obtained from TCGA. Each data set contains paired RNA-seq data and tumor microbiome data. The pie chart next to each cancer type shows the distribution of tumor stages defined by the AJCC Cancer Staging System. Cancer abbreviations: ACC, adrenocortical carcinoma; BLCA, bladder urothelial carcinoma; BRCA, breast invasive carcinoma; CESC, cervical squamous cell carcinoma and endocervical adenocarcinoma; COAD, colon adenocarcinoma; LUAD, lung adenocarcinoma; HNSC, head and neck squamous cell carcinoma; KICH, kidney chromophobe; KIRC, kidney renal clear cell carcinoma; KIRP, kidney renal papillary cell carcinoma; LGG, brain lower grade glioma; LIHC, liver hepatocellular carcinoma; LUSC, lung squamous cell carcinoma; PAAD, pancreatic adenocarcinoma; READ, rectum adenocarcinoma; SARC, sarcoma; SKCM, skin cutaneous melanoma; THYM, thymoma; UCEC, uterine corpus endometrial carcinoma; UVM, uveal melanoma. The number following each abbreviation is the number of samples. (b) The pipeline of the ensemble deep learning-based survival subtype detection model.
Fig 2
Subtype detection for the 20 TCGA cancer data sets. (a) Kaplan-Meier plots for each type of cancer. The survival curve for the subtype with better survival outcomes is shown in blue, and the curve for the subtype with worse survival outcomes is shown in orange. The P value is from a log-rank test, a statistical test used to compare the survival curves of different groups. CI is the concordance index, which measures how well a model predicts the ordering of patients’ death times. (b) Heatmap showing the results of replacing the autoencoder with PCA. Values in the heatmap are −log10(P) from a log-rank test. “NA” indicates that no survival-related features were extracted from at least one of the two omics data sets (RNA-seq and tumor microbiome) for that cancer type. AE, autoencoder.
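The concordance index (CI) in panel (a) can be illustrated with Harrell's formulation. The sketch below is an assumption about the exact variant used: a pair of patients is comparable when the one with the shorter follow-up actually died, and it is concordant when that patient also has the higher predicted risk. With a two-level subtype label as the risk score (for example, 1 for the worse-survival subtype, a hypothetical encoding), many pairs are tied and each tie contributes one half, which pulls the index toward 0.5. `harrell_c_index` is a hypothetical name.

```python
import numpy as np


def harrell_c_index(time, event, risk):
    """Harrell's concordance index (hypothetical helper for illustration).

    time: follow-up times; event: 1 = death observed, 0 = censored;
    risk: predicted risk score (higher = expected to die sooner).
    Pairs with tied follow-up times are skipped for simplicity.
    """
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(time)):
            if time[i] < time[j]:  # i died before j was last observed
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count half
    return concordant / comparable
```

For example, with follow-up times `[5, 8, 12, 20]`, events `[1, 1, 0, 1]`, and subtype labels `[1, 1, 0, 0]`, five pairs are comparable; one of them has tied labels, so the index is 4.5 / 5 = 0.9.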
Fig 3
Alpha diversity of tumor microbiomes in different survival subtypes and the prediction results of subtype and stage. (a) Heatmap of the area under the receiver operating characteristic curve (AUROC) for predicting survival subtypes with a random forest under leave-one-out validation. (b) Accuracy heatmap for predicting tumor stages with a random forest under leave-one-out validation; the number after each cancer type is the number of stages. (c) Accuracy heatmap for predicting stage I versus stage IV with a random forest under leave-one-out validation; the ratio after each cancer type is the proportion of stage I samples among stage I and IV samples. In panels a–c, each row represents a different feature set: the first row, all microbiome features; the second row, the top 20 most important microbiome features; the third row, all transcriptome features; the fourth row, the top 20 most important transcriptome features; the fifth row, all features from both omics; and the sixth row, the top 20 most important features from both omics. (d) Categories of cancers based on the omics prediction results. The first category comprises cancers for which transcriptomic data gave better results, and integrating the two omics gave results consistent with transcriptomics alone. The second category comprises cancers for which microbiome data gave better results, and integrating the two omics gave results consistent with the microbiome alone. The third category comprises cancers for which integrating the two omics gave better results than either single omics.
Subtype-stage association: high means a significant difference in clinical stage distributions between the two subtypes (chi-squared test, P < 0.05); low means no significant difference in clinical stage distributions between the two subtypes (chi-squared test, P > 0.05).
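The high/low subtype-stage association label in panel (d) can be sketched as a chi-squared test of independence on a subtype-by-stage contingency table. The sketch below uses SciPy's `chi2_contingency`; the helper name `subtype_stage_association` and the example counts are hypothetical and are not data from the study.

```python
import warnings

import numpy as np
from scipy.stats import chi2_contingency


def subtype_stage_association(counts, alpha=0.05):
    """Label the subtype-stage association as "high" or "low" (hypothetical helper).

    counts: 2 x k table, rows = survival subtypes, columns = clinical stages.
    Returns (label, P value).
    """
    table = np.asarray(counts)
    chi2_stat, p_value, dof, expected = chi2_contingency(table)
    # The chi-squared approximation is unreliable when expected counts are small
    if (expected < 5).any():
        warnings.warn("some expected counts are below 5")
    label = "high" if p_value < alpha else "low"
    return label, p_value


# Hypothetical counts for stages I-IV (illustration only)
example = [[30, 25, 10, 5],   # better-survival subtype
           [10, 15, 25, 20]]  # worse-survival subtype
label, p = subtype_stage_association(example)
```

Here the worse-survival subtype is skewed toward later stages, so the test returns "high"; two subtypes with identical stage counts would return "low".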
Fig 4
GSEA results and correlation networks for four representative cancers. (a) BLCA, (b) LIHC, (c) LUAD, and (d) CESC: each panel shows the top 10 enriched pathways based on gene set enrichment analysis (GSEA) and the correlation network between microbes and host genes (for details, see Materials and Methods). Point color intensity represents the enrichment P value, and point size represents the number of genes enriched in the pathway. The x-axis shows the enrichment score: positive values indicate enrichment in the subtype with better survival, and negative values indicate enrichment in the subtype with worse survival.
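The sign convention on the x-axis follows from how a GSEA enrichment score is built. The sketch below implements the standard weighted running-sum statistic, assuming genes are ranked by a statistic where positive values mean higher expression in the better-survival subtype; it omits the permutation-based P values and the normalized enrichment score, so whether the plotted values are raw or normalized scores is not established here. `enrichment_score` is a hypothetical name.

```python
import numpy as np


def enrichment_score(ranked_stats, in_set, weight=1.0):
    """GSEA-style running-sum enrichment score (hypothetical helper).

    ranked_stats: per-gene statistics sorted in decreasing order
                  (assumed: positive = higher in the better-survival subtype).
    in_set: boolean mask, True where the gene belongs to the pathway.
    """
    stats = np.asarray(ranked_stats, dtype=float)
    hits = np.asarray(in_set, dtype=bool)
    hit_weights = np.abs(stats) ** weight * hits       # only pathway genes step up
    p_hit = np.cumsum(hit_weights) / hit_weights.sum()
    p_miss = np.cumsum(~hits) / (~hits).sum()          # non-pathway genes step down
    running = p_hit - p_miss
    # ES is the running-sum value with the largest absolute deviation from zero
    return running[np.argmax(np.abs(running))]
```

A pathway whose genes sit at the top of the ranking scores near +1 (enriched in the better-survival subtype under this assumption), and one whose genes sit at the bottom scores near −1.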
Fig 5
Validation results on two external data sets. (a) Kaplan-Meier plots illustrating the survival subtypes ASD-1 and ASD-2 in the AC-ICAM cohort. (b) AUROC plot of the random forest model for predicting survival subtypes in the AC-ICAM cohort using leave-one-out validation. (c) Kaplan-Meier plots illustrating the survival subtypes ASD-1 and ASD-2 in the Chinese cohort. (d) AUROC plot of the random forest model for predicting survival subtypes in the Chinese cohort using leave-one-out validation. The survival curve for the subtype with favorable survival outcomes is shown in blue, and the curve for the subtype with unfavorable survival outcomes is shown in orange. The P value is from a log-rank test, a statistical test used to compare survival curves across groups. CI is the concordance index, which quantifies how well the model ranks patients' times of death.
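The leave-one-out AUROC in panels (b) and (d) can be sketched with scikit-learn. In this sketch, each sample is held out once, a random forest is trained on the rest, and the held-out class-1 probabilities are pooled before computing a single AUROC, because an AUROC is undefined on one sample. The helper name `loo_auroc`, the forest size, and the 0/1 subtype encoding are assumptions, not the authors' settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut


def loo_auroc(X, y, n_estimators=200, seed=0):
    """Leave-one-out AUROC for a binary survival subtype (hypothetical helper).

    y must be 0/1 labels with at least two samples per class, so every
    training fold still contains both classes.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=int)
    if np.bincount(y).min() < 2:
        raise ValueError("each subtype needs at least two samples")
    scores = np.empty(len(y), dtype=float)
    for train_idx, test_idx in LeaveOneOut().split(X):
        model = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        # Probability of class 1 for the single held-out sample
        scores[test_idx] = model.predict_proba(X[test_idx])[:, 1]
    # One AUROC over the pooled held-out predictions
    return roc_auc_score(y, scores)
```

On well-separated synthetic data the pooled AUROC approaches 1.0; on real microbiome or transcriptome features it reflects how separable the two subtypes are.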
Fig 6
Illustration of clinical stages of tumor and survival subtyping. (a) The two groups of cancer identified in our study, with a schematic heatmap illustrating the association of ASD-1 and ASD-2 with patients' survival outcomes across the clinical stages of tumors. The red dotted diagonal line represents the ideal correlation between clinical stage and patients' survival outcomes. The x-axis denotes the clinical stage of the tumor, and the y-axis represents the survival outcome. In the first group, there is a weak association between clinical stages and survival subtypes, while in the second group, there is a strong association between clinical stages and survival subtypes. (b) Schematic representation of the interaction between host genes and tumor microbiomes in the two groups of cancer. In the first group, host genes and tumor microbiomes are weakly correlated, and ASD-1 and ASD-2 are usually indistinguishable; in the second group, host genes and tumor microbiomes are strongly correlated, and ASD-1 and ASD-2 are typically distinguishable.
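Whether host genes and tumor microbes are "weakly" or "strongly" correlated can be illustrated with a simple microbe-gene correlation network. The exact correlation method is described in the paper's Materials and Methods and is not given here; the sketch below assumes Spearman correlation across samples, and the thresholds `rho_min` and `alpha` are hypothetical. It applies no multiple-testing correction, which a real analysis over many microbe-gene pairs would need (for example, Benjamini-Hochberg).

```python
from scipy.stats import spearmanr


def microbe_gene_edges(microbes, genes, rho_min=0.3, alpha=0.05):
    """Return (microbe, gene, rho) edges for strongly correlated pairs (hypothetical helper).

    microbes: dict name -> abundance vector across samples
    genes:    dict name -> expression vector across the same samples
    """
    edges = []
    for m_name, m_values in microbes.items():
        for g_name, g_values in genes.items():
            rho, p = spearmanr(m_values, g_values)
            # Keep only pairs that pass both the effect-size and significance cutoffs
            if abs(rho) >= rho_min and p < alpha:
                edges.append((m_name, g_name, float(rho)))
    return edges


# Hypothetical toy data over 10 samples (not from the study)
microbes = {"taxon_a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
genes = {
    "gene_up": [1, 3, 2, 4, 5, 6, 7, 8, 10, 9],
    "gene_down": [10, 9, 8, 7, 6, 5, 4, 2, 3, 1],
    "gene_noise": [3, 9, 1, 7, 5, 10, 2, 8, 4, 6],
}
edges = microbe_gene_edges(microbes, genes)
```

In this toy example only `gene_up` and `gene_down` become edges; a denser network of such edges corresponds to the "strongly correlated" group in panel (b).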


