Meta-Analysis
mSystems. 2024 Aug 20;9(8):e0029524.
doi: 10.1128/msystems.00295-24. Epub 2024 Jul 30.

Meta-analysis of the human gut microbiome uncovers shared and distinct microbial signatures between diseases


Dong-Min Jin et al. mSystems.

Abstract

Microbiome studies have revealed the gut microbiota's potential impact on complex diseases. However, most studies focus on one disease per cohort. We developed a meta-analysis workflow for gut microbiome profiles and analyzed shotgun metagenomic data covering 11 diseases. Using interpretable machine learning and differential abundance analysis, we show that binary classifiers for Crohn's disease (CD) and colorectal cancer (CRC) generalize to hold-out cohorts, and we highlight the key microbes driving these classifications. We identified high microbial similarity in disease pairs such as CD vs ulcerative colitis (UC), CD vs CRC, Parkinson's disease (PD) vs type 2 diabetes (T2D), and schizophrenia vs T2D. We also found strong inverse correlations between Alzheimer's disease (AD) and both CD and UC. These similarities and inverse correlations, detected by our pipeline, provide insights into the microbial relationships among these diseases.

Importance: Assessing disease similarity is an essential initial step preceding a disease-based approach for drug repositioning. Our study provides a modest first step in underscoring the potential of integrating microbiome insights into the disease similarity assessment. Recent microbiome research has predominantly focused on analyzing individual diseases to understand their unique characteristics, which by design excludes comorbidities in individuals. We analyzed shotgun metagenomic data from existing studies and identified previously unknown similarities between diseases. Our research represents a pioneering effort that utilizes both interpretable machine learning and differential abundance analysis to assess microbial similarity between diseases.

Keywords: complex human diseases; disease similarity; meta-analysis; microbiome.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig 1
The overall design and data analysis pipeline. Flow chart of this meta-analysis. First, shotgun metagenomic data sets investigating the human gut microbiome in multiple diseases were curated and processed consistently with the Snakemake metagenomic pipeline built in this study, and microbial abundance matrices were generated. Second, gradient boosting and random forest classifiers were built for each data set/disease. Data sets with classification accuracy above the threshold of 0.6 were then retained for the subsequent analysis. Disease-specific microbial signatures and microbial similarity at the species level were analyzed using the differential abundance results. At the gene level, Pearson’s correlation coefficients of microbial genes between every disease pair were calculated and used as a proxy for disease similarity. Disease pairs that showed high or low similarity were further investigated with pathway analysis.
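The data set retention step in the caption above can be illustrated with a minimal sketch. The 0.6 threshold comes from the caption; the function name, the cohort names, and the accuracy values are hypothetical and are not taken from the study.

```python
from typing import Dict

# Threshold stated in the Fig 1 caption; everything else here is illustrative.
ACCURACY_THRESHOLD = 0.6


def retain_datasets(
    accuracy: Dict[str, float], threshold: float = ACCURACY_THRESHOLD
) -> Dict[str, float]:
    """Keep only data sets whose classifier accuracy is strictly above the threshold."""
    return {name: acc for name, acc in accuracy.items() if acc > threshold}


# Hypothetical accuracies for three made-up cohorts.
example = {"cohort_a": 0.82, "cohort_b": 0.60, "cohort_c": 0.55}
kept = retain_datasets(example)
```

The comparison is strict because the caption says "above the threshold"; whether the original pipeline treats a value of exactly 0.6 as passing is not stated.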
Fig 2
Interpretation of binary classifiers and differentially abundant microbes’ overlaps in CD and CRC. (a) Shapley values vs log2 fold change (LFC) in CD cases and controls. The x axis shows the Shapley values, and the y axis shows the log2 fold change between case and control. Left panels show the cases, for which the sum of Shapley values is negative; right panels show the controls, for which the sum of Shapley values is positive. The differentially abundant microbes are identified by first computing the LFC between case and control within one disease, then ranking by the 5% confidence interval (CI) of LFC to identify the top 100 case-associated microbes, and finally ranking by the 95% CI of LFC to identify the top 100 control-associated microbes. Each dot represents one microbe, colored according to its ranking. Dots colored blue and salmon represent the microbes differentially abundant in disease cases and controls, respectively. Dots colored gray are considered neutral. Dots with high absolute Shapley values and high LFC are labeled. (b) Shapley values vs LFC in CRC cases and controls. Same representation as in panel a, but for CRC. (c) Overlap of the differentially abundant microbes between CD and CRC. The x axis shows the microbes, and the y axis shows each microbe’s ranking. A smaller ranking number for case-associated microbes indicates a greater increase of the microbe in disease cases. A smaller ranking number for control-associated microbes indicates a greater increase of the microbe in healthy controls.
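The ranking described in panel a can be sketched as follows. This is only an illustration under stated assumptions: LFC is taken as log2(case/control), so positive values mean higher abundance in cases; case-associated microbes are ranked by the largest lower (5%) bound, and control-associated microbes by the smallest upper (95%) bound. The sort directions, the function name, and the toy values are assumptions, not details confirmed by the caption.

```python
from typing import Dict, List, Tuple


def rank_signatures(
    ci: Dict[str, Tuple[float, float]], top_n: int = 100
) -> Tuple[List[str], List[str]]:
    """Rank microbes by the bounds of their LFC interval.

    `ci` maps a microbe name to (lower 5% bound, upper 95% bound) of its
    log2 fold change, assumed here to be case relative to control.
    """
    # Case-associated: a high lower bound means the increase in cases is robust.
    case_ranked = sorted(ci, key=lambda m: ci[m][0], reverse=True)[:top_n]
    # Control-associated: a low upper bound means the decrease in cases is robust.
    control_ranked = sorted(ci, key=lambda m: ci[m][1])[:top_n]
    return case_ranked, control_ranked


# Toy intervals for four made-up microbes.
toy = {
    "microbe_A": (1.0, 2.0),
    "microbe_B": (-2.0, -1.0),
    "microbe_C": (-0.1, 0.1),
    "microbe_D": (0.5, 3.0),
}
case_top, control_top = rank_signatures(toy, top_n=2)
```

Ranking on interval bounds rather than on the point estimate favors microbes whose direction of change is consistent, which is one plausible reason for the design described in the caption.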
Fig 3
Microbial species-level similarity between diseases. (a) Overlap of case-associated and control-associated microbes. The annotation numbers represent the number of microbes overlapping between two diseases among the top 100 case-associated microbes or the top 100 control-associated microbes. (b) Overlap of the differentially abundant microbes between Crohn’s disease and ulcerative colitis. Dots colored in salmon represent case-associated microbes and their rankings. A smaller ranking number indicates a greater increase of the microbe in disease cases. Dots colored in blue represent control-associated microbes and their rankings in controls. A smaller ranking number indicates a greater increase of the microbe in healthy controls. (c) Overlap of the differentially abundant microbes between schizophrenia and T2D. (d) Overlap of the differentially abundant microbes between PD and T2D. (e) Case-associated microbes shared by more than two diseases. x axis is the microbes, and y axis is the diseases, colored by the LFC values between case and control within each disease. (f) Control-associated microbes shared by more than two diseases. Same representation as shown in panel e, but for control-associated microbes.
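The overlap counts annotated in panel a amount to set intersections between the per-disease top-ranked lists. A minimal sketch, assuming each disease's signature is available as a set of microbe names (the function name and the toy species labels are hypothetical):

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def pairwise_overlap(signatures: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Count shared microbes for every unordered pair of diseases."""
    return {
        (a, b): len(signatures[a] & signatures[b])
        for a, b in combinations(sorted(signatures), 2)
    }


# Toy case-associated signatures; the species labels are placeholders.
case_signatures = {
    "CD": {"sp1", "sp2", "sp3"},
    "UC": {"sp2", "sp3", "sp4"},
    "T2D": {"sp5"},
}
overlaps = pairwise_overlap(case_signatures)
```

Running the same function separately on the case-associated and control-associated sets would give the two kinds of counts the caption describes.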
Fig 4
Microbial gene-level similarity between diseases and the pathway signatures of the microbes. (a) Pearson’s correlation coefficient R between the inferred microbial gene log2 fold changes across every two diseases. (b) Scatterplot of the Pearson R between Crohn’s disease and ulcerative colitis. (c) Scatterplot of the Pearson R between CD and AD. (d) Scatterplot of the Pearson R between UC and AD. (e) Amino acid metabolism, energy metabolism, and lipid metabolism pathways of the microbial signatures in AD, CD, and UC. The x axis shows the differentially abundant microbes: blue represents control-associated microbes, and salmon represents case-associated microbes. The y axis shows the KEGG pathway modules. The numbers on the green bar at the right represent the number of genes.
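The gene-level similarity in panel a is a Pearson correlation between two diseases' gene log2 fold changes. The sketch below assumes each disease's values are stored as a mapping from gene identifier to LFC and that the correlation is computed over genes present in both; the function names, gene identifiers, and values are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def gene_similarity(lfc_a: Dict[str, float], lfc_b: Dict[str, float]) -> float:
    """Correlate two diseases over the genes they share (an assumption)."""
    shared = sorted(set(lfc_a) & set(lfc_b))
    return pearson_r([lfc_a[g] for g in shared], [lfc_b[g] for g in shared])


# Toy gene LFCs for two made-up diseases.
disease_x = {"gene1": 1.0, "gene2": 2.0, "gene3": 3.0}
disease_y = {"gene1": -2.0, "gene2": -4.0, "gene3": -6.0, "gene4": 0.5}
r = gene_similarity(disease_x, disease_y)
```

In this toy case the shared genes change in exactly opposite directions, so R is at its negative extreme, analogous in sign (not in magnitude) to the inverse correlations the abstract reports for AD vs CD and UC.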
Fig 5
Combined similarity networks with the sum of overlapped microbe weights. Each node represents one disease type, and the weight of edges shows how similar the two diseases are. The number in each edge is proportional to the overlapped differentially abundant microbes in each disease (case vs control): top 100 (case associated) and bottom 100 (control associated). The colors of the edges indicate the origin of the similarities: salmon color edges represent the similarity conferred by the overlap of case-associated microbes; the blue color represents the similarity conferred by the overlap of control-associated microbes.

