Nat Commun. 2024 Sep 26;15(1):8261.
doi: 10.1038/s41467-024-52598-7.

Decoding the diagnostic and therapeutic potential of microbiota using pan-body pan-disease microbiomics

Georges P Schmartz et al. Nat Commun. 2024.

Abstract

The human microbiome emerges as a promising reservoir for diagnostic markers and therapeutics. Since host-associated microbiomes at various body sites differ and diseases do not occur in isolation, a comprehensive analysis strategy highlighting the full potential of microbiomes should include diverse specimen types and various diseases. To ensure robust data quality and comparability across specimen types and diseases, we employ standardized protocols to generate sequencing data from 1931 prospectively collected specimens, including from saliva, plaque, skin, throat, eye, and stool, with an average sequencing depth of 5.3 gigabases. Collected from 515 patients, these samples yield an average of 3.7 metagenomes per patient. Our results suggest significant microbial variations across diseases and specimen types, including unexpected anatomical sites. We identify 583 unexplored species-level genome bins (SGBs) of which 189 are significantly disease-associated. Of note, the existence of microbial resistance genes in one specimen was indicative of the same resistance genes in other specimens of the same patient. Annotated and previously undescribed SGBs collectively harbor 28,315 potential biosynthetic gene clusters (BGCs), with 1050 significant correlations to diseases. Our combinatorial approach identifies distinct SGBs and BGCs, emphasizing the value of pan-body pan-disease microbiomics as a source for diagnostic and therapeutic strategies.

Conflict of interest statement

G.P.S., R.M., and A.K. are co-founders of MooH GmbH, a company developing metagenomic-based oral health tests. F.M. is supported by Deutsche Gesellschaft für Kardiologie (DGK), Deutsche Forschungsgemeinschaft (SFB TRR219, Project-ID 322900939), and Deutsche Herzstiftung. His institution (Saarland University) has received scientific support from Ablative Solutions, Medtronic, and ReCor Medical. He has received speaker honoraria/consulting fees from Ablative Solutions, Amgen, Astra-Zeneca, Bayer, Boehringer Ingelheim, Inari, Medtronic, Merck, ReCor Medical, Servier, and Terumo. The remaining authors declare no competing interests.

Figures

Fig. 1. Study set up, metagenomics data and clinical information.
a Schematic workflow describing the sample flow (upper arrow) and data flow (lower arrow) between clinicians, microbiology, and data science. The clinical data were kept separate from the microbiome measurements and were combined with them only afterwards, in the computational analysis. b Clinical sampling focused on seven biospecimens (left blue part). We included patients with a wide range of clinical diseases, which allows us to analyze the diagnostic potential of different specimen types across diseases. Created with BioRender.com released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license. c Sankey plot of the number of samples included in the study at different stages of the data generation process in relation to our quality control strategy. Specimen types are ordered vertically at each step in the pipeline by frequency of the respective specimen. d Number of reads for each sample, colored by specimen. The horizontal line represents the 5 gigabase threshold at a paired-end read length of 150 bp. e Pruned upset plot displaying the most frequent co-occurrences of diseases within the dataset. The combinations are ordered by decreasing frequency, marking the combination of hypertension and obesity as the most common comorbidity in our study. f Ontology used throughout the study, grouping diseases by biological systems and separating healthy controls from diseased patients. Areas are proportional to the number of patients falling into each category. Patients may be represented multiple times if multiple diseases are diagnosed.
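As a rough illustration of the threshold in panel d, the sketch below converts a target yield in bases into a read count for a given read length. The function name is hypothetical, and whether the panel counts individual reads or read pairs is not stated in the caption, so both figures are computed.

```python
def reads_needed(target_bases: int, read_length: int) -> int:
    """Smallest number of reads of `read_length` bp that reaches `target_bases`."""
    if target_bases < 0 or read_length <= 0:
        raise ValueError("target_bases must be >= 0 and read_length > 0")
    # Integer ceiling division avoids floating-point rounding.
    return -(-target_bases // read_length)


TARGET_BASES = 5_000_000_000   # 5 gigabases, from the caption
READ_LENGTH = 150              # bp per read, from the caption

single_reads = reads_needed(TARGET_BASES, READ_LENGTH)
# Assumption: one pair contributes two reads of READ_LENGTH bp each.
read_pairs = reads_needed(TARGET_BASES, 2 * READ_LENGTH)

print(single_reads, read_pairs)
```

Under these assumptions the threshold corresponds to roughly 33 million individual reads, or about 17 million read pairs.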
Fig. 2. Compositional analysis, and link of microbiota to diseases.
a Two-dimensional Uniform Manifold Approximation and Projection (UMAP) embedding of pairwise computed mash distances, colored by biospecimen of the sample. b Alpha-diversity of all samples, colored by specimen. As a measure of species richness, we selected the Shannon diversity. c Relative genus abundance for each cohort of the second ontology level, divided by biospecimen. Only labels for the 20 most abundant genera are displayed. d Sorted log-fold changes of differentially abundant species matching the visualized results of the next panel. Each panel is split vertically, separating positive and negative log-fold changes. e–g Number of differentially abundant species after p-value adjustment of ANCOMBC results revealed during analysis across all cohorts and specimen combinations (q-value < 0.05). Numbers in the circles represent the number of specimens included in the respective analysis. h Center-log ratio (CLR) normalized abundance counts of selected species-cohort-specimen combinations. The visualized diseased cohort is indicated by the text above each panel, whereas the selected biospecimen is indicated by the color of the text. The first row of panels displays potential pathogen candidates with the highest statistical significance and a pathogen score of one. The second row of panels displays saliva samples of commensal bacteria candidates with a commensal score larger than eighteen (min(n) = 50). Boxplots follow Tukey’s style, indicating the median as well as the second and third quantiles within boxes. Whiskers extend up to 1.5 times the interquartile range in the presence of outliers.
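Two quantities named in this caption, the Shannon diversity (panel b) and the CLR transform (panel h), can be sketched with their textbook definitions. This is a minimal illustration, not the authors' pipeline: the function names and the pseudocount used to handle zero counts are assumptions, and the logarithm base (natural log here) is not stated in the caption.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def clr(counts, pseudocount=0.5):
    """Centered log-ratio: log of each value minus the mean of the logs.

    The pseudocount (an assumption) keeps zero counts finite.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Made-up counts for four taxa in one sample.
sample = [120, 30, 0, 50]
print(shannon_diversity(sample))
print(clr(sample))
```

Subtracting the mean log is equivalent to dividing by the geometric mean, so CLR values for a sample always sum to zero.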
Fig. 3. Assembly and resistance gene analysis.
a Distribution of the number of scaffolds in each sample at various length limits, colored by specimen, as a box-whisker plot (n = 1931). The boxplot follows a similar style to Fig. 2h. b Sequence of pie charts indicating the presence of emerging antimicrobial resistance genes. Panels are subdivided by the genus assigned to the contig on which the resistance genes were detected. Pie charts scale with the number of measurements in different samples and are colored by the relative frequency of the sample’s biospecimen. c Network visualization of counts of shared antimicrobial resistance (AMR) genes among different biospecimen samples derived from the same patient. Note that any resistance gene annotated by AMRFinderPlus was used for this plot. d Dereplicated SGBs defined from our data. Visualized information includes the biospecimen of the initial sample from which the SGB was derived, selected resistance information taken from Pathofact, and the effect size of the differential coverage analysis for selected cohorts. Note that the visualized differential coverage covers only the biospecimen of the initial sample from which the SGB was defined, which is also shown in the central ring.
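The edge weights of a network like the one in panel c can be sketched as counts of AMR genes shared between pairs of specimens from the same patient, summed over patients. The data layout, the function name, and the toy gene labels below are all hypothetical; the source does not describe how the counts were computed.

```python
from collections import Counter
from itertools import combinations


def shared_amr_edge_counts(patient_genes):
    """Sum, over patients, the AMR genes shared by each specimen pair.

    patient_genes maps patient id -> {specimen name -> set of gene names}.
    Returns a Counter keyed by alphabetically ordered specimen pairs.
    """
    edges = Counter()
    for specimens in patient_genes.values():
        for first, second in combinations(sorted(specimens), 2):
            edges[(first, second)] += len(specimens[first] & specimens[second])
    return edges


# Made-up example: placeholder gene labels, not real annotations.
toy = {
    "patient_1": {
        "saliva": {"geneA", "geneB"},
        "plaque": {"geneA", "geneB"},
        "stool": {"geneA"},
    },
    "patient_2": {
        "saliva": {"geneC"},
        "stool": {"geneC"},
    },
}
edges = shared_amr_edge_counts(toy)
print(dict(edges))
```

Sorting the specimen names before pairing ensures that each pair is counted under a single key regardless of input order.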
Fig. 4. Evidence-supported genome mining and disease association.
a Schematic representation of our proposed BGC prioritization strategy representing an adapted version of the BiGMAP workflow. Metagenomic assembly is performed for each sample, followed by BGC prediction. Next, all samples are aligned against all core biosynthetic genes of predicted BGCs. Coverage information is extracted, and downstream analysis is performed. b Volcano plot of the differential BGC coverage analysis results. In this visualization, only matching biospecimen – initial BGC contig combinations are visualized, constituting only a fraction of all results. The unadjusted two-tailed unpaired Wilcoxon test p-values are shown with two horizontal lines representing the 0.05 threshold, both before and after p-value adjustment. c Predicted host species distribution of the assembled DNA fragments where significantly associated core biosynthetic genes reside. Color reflects the number of significant BGCs. d Comparison of the highest correlating effect sizes, comparing differential BGC coverage results between alternative diets and diseases. The effect size of the vegetarian-omnivore comparison is visualized on the y-axis. On the x-axis, the cohort named above the panel is compared against the healthy cohort. For the fourth panel, the minimum effect size across all cohort comparisons is taken for each BGC and compared against the diet comparison.
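Panel b shows a 0.05 threshold both before and after p-value adjustment. The caption does not name the adjustment method, so the sketch below uses the Benjamini-Hochberg procedure purely as an assumed, commonly used example; the function name and the input p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up raw p-values standing in for per-BGC test results.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```

An adjusted value is never smaller than its raw p-value, which is why the post-adjustment threshold line sits above the unadjusted one in a volcano plot of raw p-values.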
