Front Microbiol. 2023 Aug 22:14:1238199.
doi: 10.3389/fmicb.2023.1238199. eCollection 2023.

MV-CVIB: a microbiome-based multi-view convolutional variational information bottleneck for predicting metastatic colorectal cancer

Zhen Cui et al. Front Microbiol. 2023.

Abstract

Introduction: Imbalances in gut microbes have been implicated in many human diseases, including colorectal cancer (CRC), inflammatory bowel disease, type 2 diabetes, obesity, autism, and Alzheimer's disease. Among these diseases, CRC is a gastrointestinal malignancy with high mortality and a high probability of metastasis. However, current studies mainly focus on predicting colorectal cancer and neglect the more serious malignancy, metastatic colorectal cancer (mCRC). In addition, gut microbial data are high-dimensional and sample sizes are small, which makes the data difficult for traditional machine learning models to handle.

Methods: To address these challenges, we collected and processed 16S rRNA data from patients with non-metastatic colorectal cancer (non-mCRC) and mCRC and calculated abundance data from them. Unlike the traditional health-disease classification strategy, we adopted a novel disease-disease classification strategy and proposed a microbiome-based multi-view convolutional variational information bottleneck (MV-CVIB).

Results: The experimental results show that MV-CVIB can effectively predict mCRC. In comparisons with other state-of-the-art models, it achieves AUC values above 0.9. MV-CVIB also achieved satisfactory predictive performance on multiple published CRC gut microbiome datasets.

Discussion: Finally, multiple gut microbiota analyses were used to elucidate communities and differences between mCRC and non-mCRC, and the metastatic properties of CRC were assessed by patient age and microbiota expression.

Keywords: information bottleneck; metastatic colorectal cancer; microbiome; multi-view; risk assessment.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
(A) The traditional classification strategy for microbiome-based disease prediction is health to disease. (B) Our strategy for microbiome-based disease prediction is disease to disease: it focuses on predicting malignancy within a disease, such as distinguishing metastatic colorectal cancer (mCRC) from non-metastatic colorectal cancer (non-mCRC).
Figure 2
Each sample is isolated differently from the gut.
Figure 3
Workflow. Pre-processing: We collected the sequence information of the samples from NCBI, obtained species abundance data through a series of quality control and filtering steps, and calculated the nearest-neighbor information of the samples. Variance Analysis: We map the abundance data into a two-dimensional space for visualization using four dimensionality reduction methods: (A) PCA, (B) PCoA, (C) t-SNE, and (D) NMDS. Community Analysis: (A) We used the species accumulation curve (SAC) to describe the real situation of the disease samples. (B) We used volcano plots to visualize upregulated and downregulated points. MV-CVIB: Flowchart of the proposed method, which includes three fully connected layers, a two-dimensional convolutional layer, a max pooling layer, a PoE module, a decoding module, and an output layer.
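The Figure 3 caption names a PoE (product-of-experts) module but does not describe how it combines the views. As a rough illustration only, the sketch below shows the fusion rule commonly used for Gaussian experts in multi-view variational models, where each view contributes a mean and log-variance and the fused distribution is a precision-weighted average. The function name `product_of_experts`, the optional standard-normal prior expert, and the Gaussian assumption itself are all hypothetical and are not taken from the article.

```python
import math


def product_of_experts(mus, logvars, include_prior=True, eps=1e-8):
    """Fuse per-view Gaussian experts N(mu_i, var_i) dimension by dimension.

    mus, logvars: lists (one entry per view) of equal-length lists of floats.
    include_prior: assumption, adds a standard normal N(0, 1) as an extra expert.
    Returns the fused mean and log-variance as two lists.
    """
    n_dims = len(mus[0])
    fused_mu, fused_logvar = [], []
    for d in range(n_dims):
        # A N(0, 1) prior expert has precision 1 and contributes 0 to the mean sum.
        precision_sum = 1.0 if include_prior else 0.0
        weighted_mean_sum = 0.0
        for mu, logvar in zip(mus, logvars):
            precision = 1.0 / (math.exp(logvar[d]) + eps)  # precision = 1 / variance
            precision_sum += precision
            weighted_mean_sum += precision * mu[d]
        fused_var = 1.0 / precision_sum
        fused_mu.append(weighted_mean_sum * fused_var)
        fused_logvar.append(math.log(fused_var))
    return fused_mu, fused_logvar


# Two hypothetical views with one latent dimension each.
mu, logvar = product_of_experts([[1.0], [3.0]], [[0.0], [0.0]])
print(mu, logvar)
```

A view with low variance (high confidence) pulls the fused mean toward its own estimate, which is the usual motivation for this rule when views differ in reliability.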
Figure 4
Comparison of the AUC values of our method with those of state-of-the-art methods on the mCRC dataset. The methods marked with asterisks are all combinations of methods from the DeepMicro model. The best-performing AUC is shown in bold black.
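For readers unfamiliar with the metric compared in these figures, the AUC can be read as the probability that a randomly chosen positive sample (here, mCRC) receives a higher score than a randomly chosen negative sample (non-mCRC), with ties counted as one half. The sketch below computes it directly from that definition. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the article.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of predicted scores, higher means more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The double loop is quadratic in the number of samples, which is acceptable for small cohorts; a rank-based implementation would be preferable for large ones.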
Figure 5
Comparison of the AUC values of our method with those of state-of-the-art methods on three CRC datasets. The methods marked with asterisks are all combinations of methods from the DeepMicro model. On the Colorectal dataset, the AUC results for AE + SVM*, VAE + SVM*, and CAE + RF* are taken from DeepMicro; the results for the other combinations are derived from this study. Since DeepMicro did not use the Colorectal-EMBL and Early-Colorectal-EMBL datasets, the results on these two datasets are also derived from this study. The best-performing AUC is shown in red. (A) The AUC value of each method on the Colorectal dataset. (B) The AUC value of each method on the Colorectal-EMBL dataset. (C) The AUC value of each method on the Early-Colorectal-EMBL dataset.
Figure 6
AUC comparison of different combinations of the proposed method on different datasets.
Figure 7
(A) Stacked bar graphs of the microbiota (phylum level) in the non-mCRC and mCRC groups. (B) Alpha diversity indices, used to describe the microbial richness and evenness of the intestinal ecosystem from different perspectives. (C) To further mine the differences between non-mCRC and mCRC samples, we used STAMP to output significantly different OTUs within the 95% confidence interval. (D) LDA effect size (LEfSe) analysis was used to discover and interpret biomarkers that differed statistically between non-mCRC and mCRC patients. (E) Histogram of the distribution of LDA values.
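Panel (B) refers to alpha diversity indices without naming them. As a hedged illustration of what such indices measure, the sketch below computes three common ones from a single sample's OTU counts: observed richness, the Shannon index, and the Gini-Simpson index, plus Pielou's evenness. Which indices the authors actually used is not stated here, and the function name `alpha_diversity` and the example counts are assumptions made for illustration.

```python
import math


def alpha_diversity(counts):
    """Compute common alpha diversity indices for one sample.

    counts: list of non-negative OTU (or taxon) counts for a single sample.
    Returns a dict with richness, Shannon index (natural log),
    Gini-Simpson index, and Pielou's evenness.
    """
    present = [c for c in counts if c > 0]
    total = sum(present)
    if total == 0:
        raise ValueError("sample has no reads")
    props = [c / total for c in present]          # relative abundances
    richness = len(present)                       # number of observed taxa
    shannon = -sum(p * math.log(p) for p in props)
    simpson = 1.0 - sum(p * p for p in props)     # Gini-Simpson form
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return {
        "richness": richness,
        "shannon": shannon,
        "simpson": simpson,
        "evenness": evenness,
    }


# Hypothetical sample: four equally abundant taxa and one absent taxon.
print(alpha_diversity([10, 10, 10, 10, 0]))
```

Richness counts how many taxa are present, while the Shannon and Simpson indices also reflect how evenly the reads are spread across them, which is why several indices are usually reported together.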
Figure 8
(A) Patients are divided into a high-risk group and a low-risk group according to the risk value. (B) Scatter plot of the relationship between patient age and risk status. (C) Heat map of the abundance expression of the bacterial groups (order level).
