Cell Rep Med. 2023 Feb 21;4(2):100920.
doi: 10.1016/j.xcrm.2023.100920. Epub 2023 Jan 26.

Enrichment of oral-derived bacteria in inflamed colorectal tumors and distinct associations of Fusobacterium in the mesenchymal subtype


Brett S Younginger et al. Cell Rep Med. 2023.

Abstract

While the association between colorectal cancer (CRC) features and Fusobacterium has been extensively studied, less is known about other intratumoral bacteria. Here, we leverage whole transcriptomes from 807 CRC samples to dually characterize tumor gene expression and 74 intratumoral bacteria. Seventeen of these species, including 4 Fusobacterium spp., are classified as orally derived and are enriched among right-sided, microsatellite instability-high (MSI-H), and BRAF-mutant tumors. Across consensus molecular subtypes (CMSs), integration of Fusobacterium animalis (Fa) presence and tumor expression reveals that Fa has the most significant associations in mesenchymal CMS4 tumors despite a lower prevalence than in immune CMS1. Within CMS4, the prevalence of Fa is uniquely associated with collagen- and immune-related pathways. Additional Fa pangenome analysis reveals that stress response genes and the adhesin FadA are commonly expressed intratumorally. Overall, this study identifies oral-derived bacteria as enriched in inflamed tumors and shows that the associations between bacteria and tumor expression are context- and species-specific.

Keywords: Fusobacterium; TCGA; cancer; colorectal cancer; gut microbiome; intratumoral bacteria; microbiome; microbiota; oral microbiome; pangenome.

Conflict of interest statement

Declaration of interests: O.M., J.R., D.R.N., Z.M., and A.L.B. are Genentech/Roche employees. B.S.Y. and M.L.A. were Genentech/Roche employees.

Figures

Figure 1
Seventy-four bacteria (54 gut and 20 oral species) were detected in the AVANT CRC samples. (A) Prevalence (%) of the 74 bacteria in AVANT. Bars are colored by phylum. (B) Dark gray bars (left) indicate prevalence across 852 stool samples from healthy and CRC donors; light gray bars indicate prevalence in 208 saliva samples. Blue dots indicate the prevalence in control samples (stool, n = 568; saliva, n = 184), while red dots indicate CRC (stool, n = 284; saliva, n = 24). (C) Each dot corresponds to one of the 43 species overlapping between Kraken and Pathseq (Table S1H). Kraken prevalences are based on a Bracken cutoff of 250 reads plus a coverage cutoff of 0.5%. Pathseq prevalences are based on a score exceeding 250 plus a coverage cutoff of 0.5%. (D) For the species in (A), the mean number of Kraken-assigned reads (excluding zeros, prior to any Bracken reassignment) is shown. Point size reflects the percentage of samples that exceeded 250 Kraken-assigned reads for a given species. (E) For the 43 species in (C), the mean Pathseq score (excluding zeros) is shown. Point size reflects the percentage of samples that exceeded a score of 250 for a given species. (C–E) Point color corresponds to the phylum and point shape to the gut or oral designation. (F) Tile color reflects the log(OR): shades of red indicate that a taxon was more prevalent in BRAF-mutant versus wild-type, MSI-high (MSI-H) versus MSS, or right-sided versus left-sided tumors, while shades of blue indicate the opposite. Fisher’s exact test significance, ∗FDR < 0.05, ∗∗FDR < 0.01, ∗∗∗FDR < 0.001. See also Figures S1–S3 and Tables S1D–S1Q.
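The statistics named in panel (F), a log odds ratio per taxon, a Fisher's exact test, and an FDR adjustment, can be sketched in plain Python as follows. This is a minimal illustration and not the authors' pipeline: the caption does not state the FDR method, the log base, or how zero cells were handled, so the Benjamini-Hochberg procedure, the natural log, and the 0.5 pseudocount are all assumptions, and the function names are hypothetical.

```python
from math import comb, log


def log_odds_ratio(a, b, c, d, pseudo=0.5):
    """Natural-log odds ratio for the 2x2 table [[a, b], [c, d]].

    a/b = taxon present/absent in group 1 (e.g. MSI-H),
    c/d = taxon present/absent in group 2 (e.g. MSS).
    The pseudocount (an assumption) avoids division by zero.
    """
    return log(((a + pseudo) * (d + pseudo)) / ((b + pseudo) * (c + pseudo)))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x "present" samples in group 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With this sign convention, a positive log(OR) means the taxon is more prevalent in group 1, which corresponds to the red tiles described in the caption.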
Figure 2
The association of Fa and tumor gene expression varies by CMS. (A) Bars indicate the percentage of AVANT samples in each CMS. Values indicate the number of samples. (B) Bars indicate the proportion of samples by location. (C) Bars indicate the proportion of samples by MSI status. (D) Bars indicate the proportion of samples that were BRAF-wild type (WT) and mutant (MUT). (E) For species differentially present between CMSs (chi-squared FDR < 0.05), the top heatmap shows the species prevalence by subtype. In the middle, tile color reflects the log(OR) of CMS1 versus the other CMSs. Fisher’s exact test significance, ∗FDR < 0.05, ∗∗FDR < 0.01, ∗∗∗FDR < 0.001. The bottom row indicates the species’ phyla. (F) Volcano plot of differentially abundant genes between samples with or without Fa based on Voom-Limma. CMS was included as a covariate for the results on the right. Blue dots indicate genes with FDR < 0.05 and log2(fold change) < 0; red dots indicate genes with FDR < 0.05 and log2(fold change) > 0. Labels indicate how many genes were statistically significantly up or down. (G) Balloon plot shows 24 genes with FDR < 0.05 and log2(fold change) > 1 in (F). Circle size corresponds to the −log10(FDR), while color is the log2(fold change). The bottom row represents the differential expression values for all samples with or without Fa. The next row includes CMS as a covariate. The CMS1 column represents the differential expression values for all CMS1 samples versus the CMS2, -3, and -4 samples. Similarly, the CMS2 column represents the CMS2 samples versus the CMS1, -3, and -4 samples, and so on. (H) Bars indicate the percentage of Fa positivity by subtype. Values indicate the number of samples. (I) Volcano plot of differentially abundant genes between samples with or without Fa stratified by CMS. Colors and labels are the same as in (F). See also Figures S4 and S5 and Tables S1M, S1N, and S1Q–S1S.
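The coloring rule for the volcano plots in (F) and (I) can be expressed as a small classifier. The sketch below assumes the differential expression results are already available as a mapping from gene name to a (log2 fold change, FDR) pair; that data layout, the function name, and the placeholder gene names in the usage example are hypothetical and are not taken from the paper.

```python
def classify_genes(results, fdr_cutoff=0.05):
    """Label each gene 'up' (red), 'down' (blue), or 'ns' (not significant).

    results: dict mapping gene name -> (log2_fold_change, fdr).
    Returns (labels, counts), where counts gives the totals that a
    volcano plot would display as its up/down labels.
    """
    labels = {}
    counts = {"up": 0, "down": 0, "ns": 0}
    for gene, (log2_fc, fdr) in results.items():
        if fdr < fdr_cutoff and log2_fc > 0:
            label = "up"
        elif fdr < fdr_cutoff and log2_fc < 0:
            label = "down"
        else:
            label = "ns"
        labels[gene] = label
        counts[label] += 1
    return labels, counts


# Usage with made-up values for three placeholder genes.
example = {
    "GENE_A": (1.4, 0.001),
    "GENE_B": (-0.8, 0.02),
    "GENE_C": (2.0, 0.30),
}
labels, counts = classify_genes(example)
```

The stricter subset shown in (G) would correspond to additionally requiring log2(fold change) > 1 among the genes labeled "up".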
Figure 3
Fa is associated with upregulation of collagen- and immune-related pathways in CMS4 tumors. (A) For each pathway, a gray square denotes a gene’s presence. Only genes differentially expressed based on Fa presence in at least one of the CMS strata are shown. (B) For genes differentially expressed in a particular CMS, the log2(fold change) is shown. Genes upregulated in the presence of Fa are shown in red shades, while downregulated genes are in blue. (C) The circle color indicates in which CMS the REACTOME pathway was statistically significantly enriched based on GSEA in a given species, while size indicates the −log10(FDR) value. (D) Volcano plots from Figure 2I with the addition of red dots highlighting genes of interest in (B). Labeled genes had log2(fold change) < −1 or > 1. (E) Relative abundance plots indicate average cell-type composition in patients with or without Fa in all samples and across the different CMSs. (F) Percentage of neutrophils across all samples and by CMS and Fa presence/absence. Wilcoxon-test FDR values are shown. See also Table S1T.
Figure 4
Fa gene expression in pangenome analysis. (A) Gene accumulation curves for the pangenome (blue) and core genome (green) as a function of the number of genome sequences (N). Both are fit by a power law regression. Points are means of n for 200 simulations. Error bars indicate the SDs for the 200 simulations. (B) Accumulation of new genes (n) discovered with the addition of new genome sequences (N) fits a power law regression. (C) Boxplots indicate the number of Fa genes identified in the Fa+ and Fa− samples. Wilcoxon p value is shown. (D) Density plot shows the distribution of prevalence of the pangenome genes. 1,546 genes were identified in 0 of the Fa+ tumors. 948 were identified in at least 10% of the Fa+ tumors. (E) Bars indicate the proportion of genes annotated as ribosomal in the whole pangenome (gray), in the part expressed in at least 10% of the Fa+ samples (light blue), in at least 20% (dark blue), and in at least 50% (green). Fisher’s exact test significance, ∗FDR < 0.05, ∗∗FDR < 0.01, ∗∗∗FDR < 0.001, between the entire pangenome (cutoff 0) and the three prevalence cutoffs (10, 20, and 50). (F) Same as (E) but for non-ribosomal KEGG pathways. (G) Same as (E) but for annotations to CARD and VFDB. (H) For the 35 non-ribosomal genes expressed in more than half of the Fa+ samples, the far left columns indicate whether the gene was annotated as part of the pangenome core or mapped to an entry in CARD or VFDB. The middle column indicates KEGG pathway annotation. On the right, gray dots indicate the prevalence in Fa− samples, while blue dots indicate the prevalence in Fa+ samples. See also Tables S1U and S1V.
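A power law regression of the kind mentioned in (A) and (B), y = k · N^exponent, can be fit by ordinary least squares on log-transformed values. The caption does not say how the authors fit their curves, so the log-log approach and the function name below are assumptions made for illustration only.

```python
from math import exp, log


def fit_power_law(xs, ys):
    """Fit y = k * x**exponent by least squares on (log x, log y).

    xs: number of genomes sampled (N); ys: gene counts (n).
    All values must be positive. Returns (k, exponent).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    log_x = [log(x) for x in xs]
    log_y = [log(y) for y in ys]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    exponent = sxy / sxx                    # slope on the log-log scale
    k = exp(mean_y - exponent * mean_x)     # intercept, back-transformed
    return k, exponent
```

Under this parameterization, a growing pangenome curve has a positive exponent, while the count of newly discovered genes per added genome in (B) would have a negative one.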
