[Preprint]. 2024 Feb 29:2024.02.27.582333.
doi: 10.1101/2024.02.27.582333.

Meta-analysis of the human gut microbiome uncovers shared and distinct microbial signatures between diseases

Dong-Min Jin et al. bioRxiv. 2024.

Abstract

Microbiome studies have revealed the gut microbiota's potential impact on complex diseases. However, many studies focus on one disease per cohort. We developed a meta-analysis workflow for gut microbiome profiles and analyzed shotgun metagenomic data covering 11 diseases. Using interpretable machine learning and differential abundance analysis, we found that binary classifiers for Crohn's disease (CD) and colorectal cancer (CRC) generalize to hold-out cohorts, and we highlight the key microbes driving these classifications. We identified high microbial similarity in disease pairs such as CD vs ulcerative colitis (UC), CD vs CRC, Parkinson's disease vs type 2 diabetes (T2D), and schizophrenia vs T2D. We also found strong inverse correlations between Alzheimer's disease and both CD and UC. These findings, detected by our pipeline, provide insights into the relationships between these diseases.

Keywords: complex human diseases; disease similarity; meta-analysis; microbiome.

Conflict of interest statement

Disclosure statement: The authors declare that there is no conflict of interest.

Figures

Figure 1.
The overall design and data analysis pipeline (flow chart of this meta-analysis). First, shotgun metagenomic datasets investigating the human gut microbiome in multiple diseases were curated and processed consistently with the Snakemake metagenomic pipeline built in this study, generating microbial abundance matrices. Second, Gradient Boosting and Random Forest classifiers were built for each dataset/disease, and only datasets with classification accuracy above the threshold of 0.6 were retained for the subsequent analysis. Disease-specific microbial signatures and species-level microbial similarity were analyzed using the differential abundance results. At the gene level, Pearson correlation coefficients of microbial genes between every disease pair were calculated and used as a proxy for disease similarity. Disease pairs that showed high or low similarity were further investigated with pathway analysis.
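The two numeric steps named in this caption, the 0.6 accuracy cutoff and the pairwise Pearson correlation of gene-level log2 fold changes, can be sketched as follows. This is a minimal illustration rather than the authors' Snakemake pipeline: the function names, the dictionary layout, and all numbers are hypothetical, and restricting each pair to the genes present in both diseases is an assumption the caption does not state.

```python
from itertools import combinations
from math import sqrt

ACCURACY_THRESHOLD = 0.6  # cutoff stated in the Figure 1 caption


def retained_datasets(accuracies, threshold=ACCURACY_THRESHOLD):
    """Keep datasets whose classification accuracy is above the threshold."""
    return [name for name, acc in accuracies.items() if acc > threshold]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def pairwise_similarity(gene_lfc):
    """Pearson R for every disease pair, computed over genes present in both
    diseases (an assumption made for this sketch)."""
    result = {}
    for d1, d2 in combinations(sorted(gene_lfc), 2):
        shared = sorted(set(gene_lfc[d1]) & set(gene_lfc[d2]))
        x = [gene_lfc[d1][g] for g in shared]
        y = [gene_lfc[d2][g] for g in shared]
        result[(d1, d2)] = pearson_r(x, y)
    return result


# Toy numbers invented for illustration only; they are not data from the study.
toy_accuracy = {"cohort_A": 0.82, "cohort_B": 0.55, "cohort_C": 0.71}
toy_gene_lfc = {
    "disease_X": {"gene1": 1.0, "gene2": 2.0, "gene3": 3.0},
    "disease_Y": {"gene1": 2.0, "gene2": 4.0, "gene3": 6.0},
    "disease_Z": {"gene1": 3.0, "gene2": 2.0, "gene3": 1.0},
}
kept = retained_datasets(toy_accuracy)
similarity = pairwise_similarity(toy_gene_lfc)
```

In the toy data, disease_X and disease_Y change in the same direction across all genes, while disease_X and disease_Z change in opposite directions, which is the kind of inverse relationship the abstract reports for Alzheimer's disease vs CD and UC.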
Figure 2.
Interpretation of binary classifiers and overlap of differentially abundant microbes in Crohn’s Disease (CD) and Colorectal Cancer (CRC). Differentially abundant microbes are identified by first computing the log2 fold change (LFC) between cases and controls within one disease, then ranking by the 5% Confidence Interval (CI) of the LFC to identify the top-100 case-associated microbes, and finally ranking by the 95% CI of the LFC to identify the top-100 control-associated microbes. Each dot represents one microbe, colored by its ranking. Dots colored blue and salmon represent the microbes differentially abundant in disease cases and controls, respectively. Gray dots are considered neutral. (a)-(b) The x axis is the Shapley value and the y axis is the LFC between case and control. Left panels show the cases, for which the sum of Shapley values is negative; right panels show the controls, for which the sum of Shapley values is positive. Dots with high absolute Shapley values and high LFC are labeled. (a) Shapley values vs LFC in CD cases and controls. (b) Shapley values vs LFC in CRC cases and controls. (c) Overlap of the differentially abundant microbes between CD and CRC. The x axis shows the microbes and the y axis shows each microbe’s ranking. A smaller ranking number for case-associated microbes indicates a greater increase of the microbe in disease cases. A smaller ranking number for control-associated microbes indicates a greater increase of the microbe in healthy controls.
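The ranking described in this caption can be sketched as below. The sketch rests on one reading of the caption that is an assumption: the "5% CI" is treated as the lower bound of the LFC interval and the "95% CI" as the upper bound, so case-associated microbes are those with the largest lower bound and control-associated microbes those with the smallest upper bound. The function name, the input layout, and the toy values are hypothetical.

```python
def rank_signatures(lfc_ci, top_n=100):
    """Rank microbes by the bounds of their LFC confidence interval.

    lfc_ci maps microbe name -> (ci_low, ci_high) for the case-vs-control
    log2 fold change. Returns two lists, best-ranked first:
    case-associated microbes (largest lower bound) and control-associated
    microbes (smallest upper bound). Ties are broken by name so the
    output is deterministic.
    """
    by_lower = sorted(lfc_ci, key=lambda m: (-lfc_ci[m][0], m))
    by_upper = sorted(lfc_ci, key=lambda m: (lfc_ci[m][1], m))
    return by_lower[:top_n], by_upper[:top_n]


# Toy intervals invented for illustration only; not data from the study.
toy_ci = {
    "microbe_1": (1.5, 2.5),
    "microbe_2": (0.2, 1.0),
    "microbe_3": (-0.4, 0.3),
    "microbe_4": (-1.2, -0.5),
    "microbe_5": (-3.0, -2.0),
}
case_ranked, control_ranked = rank_signatures(toy_ci, top_n=2)
```

Ranking on the interval bound closest to zero, rather than on the point estimate, favors microbes whose shift is both large and well supported. The sketch applies no sign filter, so with a small number of microbes and a large `top_n` the same microbe could land in both lists.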
Figure 3.
Microbial species-level similarity between diseases. (a) The annotation numbers represent the number of microbes overlapping between two diseases among the top 100 case-associated microbes or the top 100 control-associated microbes. (b)-(d) Dots colored in salmon represent case-associated microbes and their rankings. A smaller ranking number indicates a greater increase of the microbe in disease cases. Dots colored in blue represent control-associated microbes and their rankings in controls. A smaller ranking number indicates a greater increase of the microbe in healthy controls. (b) Overlap of the differentially abundant microbes between Crohn’s Disease (CD) and Ulcerative Colitis (UC). (c) Overlap of the differentially abundant microbes between Schizophrenia and Type 2 Diabetes (T2D). (d) Overlap of the differentially abundant microbes between Parkinson’s Disease (PD) and T2D. (e)-(f) Differentially abundant microbes shared by more than two diseases. (e) Case-associated microbes shared by more than two diseases. (f) Control-associated microbes shared by more than two diseases.
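The overlap counts in panel (a) and the "shared by more than two diseases" selection in panels (e)-(f) reduce to set operations on the per-disease signature lists. The following is a minimal sketch under that assumption; the function names and the toy signatures are hypothetical, and the same functions would be applied once to the case-associated sets and once to the control-associated sets.

```python
from collections import Counter
from itertools import combinations


def pairwise_overlap(signatures):
    """Number of microbes shared by each disease pair.

    signatures maps disease name -> iterable of signature microbes
    (for example, the top-100 case-associated microbes).
    """
    sets = {d: set(m) for d, m in signatures.items()}
    return {
        (d1, d2): len(sets[d1] & sets[d2])
        for d1, d2 in combinations(sorted(sets), 2)
    }


def shared_by_more_than(signatures, min_diseases=2):
    """Microbes that appear in the signatures of more than min_diseases diseases."""
    counts = Counter(m for microbes in signatures.values() for m in set(microbes))
    return {m: n for m, n in counts.items() if n > min_diseases}


# Toy signatures invented for illustration only; not data from the study.
toy_case_signatures = {
    "disease_1": ["a", "b", "c"],
    "disease_2": ["b", "c", "d"],
    "disease_3": ["c", "e"],
}
overlap = pairwise_overlap(toy_case_signatures)
widely_shared = shared_by_more_than(toy_case_signatures)
```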
Figure 4.
Microbial gene-level similarity between diseases and the pathway signatures of the microbes. (a) Pearson correlation coefficient R between the inferred microbial gene log2 fold changes for every pair of diseases. (b) Scatterplot of the Pearson R between Crohn’s Disease (CD) and Ulcerative Colitis (UC). (c) Scatterplot of the Pearson R between CD and Alzheimer’s Disease (AD). (d) Scatterplot of the Pearson R between UC and AD. (e) Amino acid metabolism, energy metabolism, and lipid metabolism pathways of the microbial signatures in AD, CD, and UC. The x axis shows the differentially abundant microbes: blue represents control-associated microbes and salmon represents case-associated microbes. The y axis shows the KEGG pathway modules. The numbers on the green bar at the right represent the number of genes.
Figure 5.
Combined similarity networks with the sum of overlapping microbe weights. Each node represents one disease type, and the weight of each edge shows how similar the two diseases are. The number on each edge is proportional to the number of overlapping differentially abundant microbes in each disease (case vs control): top-100 (case-associated) and bottom-100 (control-associated). The colors of the edges indicate the origin of the similarity: salmon edges represent similarity conferred by the overlap of case-associated microbes, and blue edges represent similarity conferred by the overlap of control-associated microbes.
