Microbiome. 2022 Mar 25;10(1):53.
doi: 10.1186/s40168-022-01251-w.

Metagenomic strain detection with SameStr: identification of a persisting core gut microbiota transferable by fecal transplantation

Daniel Podlesny et al. Microbiome.

Abstract

Background: Understanding how microbiomes assemble, function, and evolve requires metagenomic tools that can resolve microbiota compositions at the strain level. However, identifying and tracking microbial strains in fecal metagenomes is challenging, and available tools variably classify subspecies lineages, which affects their applicability to infer microbial persistence and transfer.

Results: We introduce SameStr, a bioinformatic tool that identifies shared strains in metagenomes by determining single-nucleotide variants (SNVs) in species-specific marker genes, which are compared based on a maximum variant profile similarity. We validated SameStr on mock strain populations, available human fecal metagenomes from healthy individuals, and newly generated data from recurrent Clostridioides difficile infection (rCDI) patients treated with fecal microbiota transplantation (FMT). Compared to other tools, SameStr demonstrated enhanced sensitivity to detect shared dominant and subdominant strains in related samples (where strain persistence or transfer would be expected), while being robust against false-positive shared strain calls between unrelated samples (where neither strain persistence nor transfer would be expected). We applied SameStr to identify strains that are stably maintained in fecal microbiomes of healthy adults over time (strain persistence) and that successfully engraft in rCDI patients after FMT (strain engraftment). Taxonomy-dependent strain persistence and engraftment frequencies were positively correlated, indicating that a specific core microbiota of intestinal species is adapted to be competitive both in healthy microbiomes and during post-FMT microbiome assembly. We explored other use cases for strain-level microbiota profiling: as a metagenomics quality control measure and to identify individuals based on the persisting core gut microbiota.

Conclusion: SameStr provides robust identification of shared strains in metagenomic sequence data with sufficient specificity and sensitivity to examine strain persistence, transfer, and engraftment in human fecal microbiomes. Our findings identify a persisting healthy adult core gut microbiota, which should be further studied to shed light on microbiota contributions to chronic diseases. Video abstract.
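
The contrast between a maximum variant similarity (MVS) and a consensus variant similarity (CVS) comparison, as described in the Results, can be illustrated with a minimal Python sketch. It is not taken from the SameStr code base. It assumes that each sample is represented by per-position read counts for A, C, G, and T, that MVS counts a position as matching when the two samples share any allele, and that CVS compares only the most frequent allele. The function names, the data layout, and the mock coverage values are hypothetical; only the 99.9% threshold comes from the paper.

```python
from typing import List, Optional, Tuple

Counts = Tuple[int, int, int, int]  # hypothetical layout: read counts for A, C, G, T
Profile = List[Counts]              # one entry per marker position

THRESHOLD = 0.999  # shared-strain similarity threshold reported in the paper


def alleles(counts: Counts) -> set:
    """Indices of all alleles observed at a position."""
    return {i for i, c in enumerate(counts) if c > 0}


def consensus(counts: Counts) -> int:
    """Index of the most frequent allele (first one wins on ties)."""
    return max(range(4), key=lambda i: counts[i])


def similarity(a: Profile, b: Profile, mode: str) -> Tuple[Optional[float], int]:
    """Return (similarity, overlap) over positions covered in both samples."""
    overlap = 0
    matches = 0
    for ca, cb in zip(a, b):
        if sum(ca) == 0 or sum(cb) == 0:
            continue  # position not covered in one of the samples
        overlap += 1
        if mode == "mvs":
            # Assumed MVS rule: any allele in common counts as a match.
            matches += bool(alleles(ca) & alleles(cb))
        else:
            # Assumed CVS rule: only the majority alleles are compared.
            matches += consensus(ca) == consensus(cb)
    if overlap == 0:
        return None, 0
    return matches / overlap, overlap


# Mock data: strain X carries A everywhere; strain Y differs at 20 of 2000 sites.
n_sites = 2000
differs = [i % 100 == 0 for i in range(n_sites)]

# Sample 1 contains only strain X at 10-fold coverage.
sample_1: Profile = [(10, 0, 0, 0)] * n_sites
# Sample 2 is dominated by strain Y (8 reads) with subdominant strain X (2 reads).
sample_2: Profile = [(2, 8, 0, 0) if d else (10, 0, 0, 0) for d in differs]

mvs = similarity(sample_1, sample_2, "mvs")
cvs = similarity(sample_1, sample_2, "cvs")
print("MVS:", mvs, "shared strain:", mvs[0] >= THRESHOLD)
print("CVS:", cvs, "shared strain:", cvs[0] >= THRESHOLD)
```

In this mock case the subdominant strain X is recovered by the MVS comparison (similarity 1.0) but missed by the consensus comparison (similarity 0.99, below the threshold), which is the behavior the abstract attributes to SameStr for subdominant strains.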

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Species-specific shared strain detection in metagenomic samples with SameStr. A Schematic of the SameStr workflow. SameStr has been implemented modularly, including optional wrapper functions for quality preprocessing and alignment of whole-genome shotgun (WGS) metagenomic reads to species-specific MetaPhlAn markers (align), functions for the conversion to nucleotide variant profiles (convert), extraction of markers from genome sequences (extract), sample and reference pooling (merge), extensive global, per-sample, marker and position filtering (filter) and comparison of SNV profiles (compare) based on maximum variant similarity (MVS). SameStr outputs (summarize) tables denoting pairwise comparison results, including species alignment similarity and overlap, and co-occurrence of taxa at distinct taxonomic levels (based on MetaPhlAn) and at the strain level. B SameStr identifies shared strains in metagenomic samples by calculating a pairwise MVS, using all single-nucleotide variants detected in the read alignments of these samples to species-specific marker genes. C To assess the MetaPhlAn-based phylogenetic resolution (db_v20) and validate the 99.9% similarity threshold of shared strains, which is used by SameStr, 458 bacterial genomes from 20 of the most abundant and prevalent fecal microbiota species in our rCDI cohort (Table S4) were compared with MetaPhlAn2 [30] and based on average nucleotide identities (ANIs) as determined with FastANI [31]. MetaPhlAn2 and FastANI-based pairwise sequence similarities are strongly correlated (Spearman’s r = 0.93, p < 2.2e−16, n = 9813), demonstrating comparable phylogenetic resolution. Genome similarities exhibit a multimodal distribution (two-dimensional density kernel contours): reference genomes share peak sequence similarities at 97.5%, 99.0%, and above 99.9% identity that reflect the presence of distinct species, subspecies, and strains in the reference dataset
Fig. 2
Sensitivity and specificity comparison to other strain prediction tools. A SameStr detects dominant and subdominant strains at low sequencing depth (mean-fold target strain coverage) and relative abundance (i.e., high noise coverage) in simulated metagenomes (n = 3276) of multi-strain species populations, compared to consensus variant profile similarity (CVS)-based methods. B Using MetaPhlAn’s clade-specific marker gene database (db_v20), SameStr identifies more genera and species per metagenomic sample (n = 65) than StrainFinder, which uses mg-OTUs that are defined based on phylogenetic comparisons of universally distributed bacterial genes from the AMPHORA database. C Fewer shared strain calls demonstrate the increased specificity of SameStr compared to StrainFinder, which allows for the differentiation of related (n = 555) and unrelated (n = 1525) sample pairs. D Cumulative relative abundance and fraction of species for which strain-level resolution was achieved with SameStr in fecal metagenomes from a reference cohort of 67 longitudinally sampled healthy adults (n = 202). E SameStr’s MVS-based method detects shared strains in a larger fraction of species in related (same individual, n = 281) but not in unrelated (different individuals, n = 20,020) sample pairs of the control cohort (n = 202 individuals) compared to CVS-based methods
Fig. 3
Identification of strain persistence and donor strain engraftment in healthy individuals and rCDI patients after FMT. A Longitudinal species and strain persistence in healthy adults from the reference (Control) cohort are shown as relative abundances of shared species and species fractions in 95 sample pairs from 59 individuals and modeled using binomial smoothing. Strain proportions are based on corresponding species. Species fractions indicate insufficient resolution for strain prediction. B Taxonomic variations in the frequency of species (dark blue), and strain (light blue) persistence in healthy individuals (n = 59) and FMT recipients (n = 19), and of donor species (dark green) and strain (light green) engraftment in post-FMT patients are shown, as summarized on the genus level for the 50 most prevalent genera (see Fig. S3 for species). Newly detected species and strains are shown in dark and light yellow, respectively. C Comparison of shared strain numbers between rCDI patients and donors. Distinct rCDI patients who received stool from the same donor share more strains than other post-FMT patients. D Donor-derived strains and species (exclusively shared with the donor but with insufficient resolution for strain prediction) account for large and stable relative abundances and species fractions in FMT-treated rCDI patients. Data for triads of successfully FMT-treated rCDI patients (n = 30) in reference to their pre-FMT (n = 19) and donor (n = 14) metagenomes are modeled across cases using binomial smoothing. E The frequencies of strain persistence in healthy individuals and of donor strain engraftment in rCDI patients after FMT are positively correlated at the genus level (Spearman’s r = 0.72, p < 1e−8), including for abundant members of the healthy adult fecal microbiota (see Fig. S5 for species-level comparison)
Fig. 4
Identification of healthy individuals and FMT recipients and donors using shared strain profiles. Receiver-operating characteristic (ROC) and precision-recall (PR) curves of logistic regression classifiers demonstrate sensitive and accurate identification of (A) longitudinally collected sample pairs from the same healthy individuals (n = 112 from a total of n = 8120 sample pairs) and (B) related FMT patient and donor sample pairs (n = 580, including pre- and post-FMT patient samples, post-FMT patient and donor samples, and post-FMT patient samples that received FMT from the same donor, from a total of n = 4186 sample pairs)
Fig. 5
SameStr-based unsupervised strain sharing networks identify potentially mislabeled samples. Shared strain profiles were visualized as unsupervised networks with individual samples as nodes and shared strain numbers as edges. A These networks connect samples from Louis et al. [38] by individual, with the exception of two samples (AS64_24 and AS66_24) that appear to be mixed up. B In a case of multiple rCDI patients treated with FMT from the same donor [15], shared strains were detected between pre- (blue) and post-FMT (yellow) patient samples, as well as between post-FMT and donor (green) samples and among post-FMT samples. Pre-FMT samples did not share strains with donor samples, with the exception of FMT15, which shares more than 15 strains with all three donor samples and exhibits α/β-diversity compositions that are comparable to other post-FMT samples (data not shown). As this sample was collected on the day of the FMT procedure, FMT15 could in fact represent a post-FMT sample that was accidentally mislabeled as a pre-FMT sample (Smillie, personal communication)
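
The quality control idea in Fig. 5 can be approximated with a small Python sketch. The paper visualizes shared strain counts as an unsupervised network; the heuristic below is a swapped-in simplification, not the authors' method. It flags a sample when most of its shared strains point to a different individual than its own label. The sample names, shared strain counts, and the `min_shared` cutoff are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Tuple


def flag_mismatched(
    shared: Dict[Tuple[str, str], int],
    label_of: Dict[str, str],
    min_shared: int = 5,  # hypothetical cutoff for drawing an edge
) -> Dict[str, str]:
    """Map each suspicious sample to the individual it shares most strains with."""
    # support[sample][individual] = shared strains summed over that individual's samples
    support: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for (a, b), n in shared.items():
        if n < min_shared:
            continue  # too few shared strains to count as an edge
        support[a][label_of[b]] += n
        support[b][label_of[a]] += n

    flagged = {}
    for sample, by_label in support.items():
        best = max(by_label, key=by_label.get)
        if best != label_of[sample]:
            flagged[sample] = best
    return flagged


# Made-up labels: three time points for each of two individuals.
label_of = {
    "P1_t0": "P1", "P1_t1": "P1", "P1_t2": "P1",
    "P2_t0": "P2", "P2_t1": "P2", "P2_t2": "P2",
}

# Made-up shared strain counts in which P1_t2 and P2_t2 behave as if swapped.
shared = {
    ("P1_t0", "P1_t1"): 30,
    ("P2_t0", "P2_t1"): 28,
    ("P1_t2", "P2_t0"): 25,
    ("P1_t2", "P2_t1"): 27,
    ("P2_t2", "P1_t0"): 26,
    ("P2_t2", "P1_t1"): 29,
    ("P1_t2", "P2_t2"): 1,
}

flagged = flag_mismatched(shared, label_of)
for sample, individual in sorted(flagged.items()):
    print(f"{sample}: labeled {label_of[sample]}, shares most strains with {individual}")
```

A flag from such a heuristic is only a prompt to re-check sample metadata; as in the FMT15 example above, the label should be confirmed with the data provider before it is changed.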

References

    1. Duvallet C, Gibbons SM, Gurry T, Irizarry RA, Alm EJ. Meta-analysis of gut microbiome studies identifies disease-specific and shared responses. Nat Commun. 2017;8:1784.
    2. Sze MA, Schloss PD. Erratum for Sze and Schloss, “Looking for a signal in the noise: revisiting obesity and the microbiome”. MBio. 2017;8. doi: 10.1128/mBio.01995-17.
    3. Wirbel J, Zych K, Essex M, Karcher N, Kartal E, Salazar G, et al. Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox. Genome Biol. 2021;22:93.
    4. Lloyd-Price J, Mahurkar A, Rahnavard G, Crabtree J, Orvis J, Hall AB, et al. Strains, functions and dynamics in the expanded Human Microbiome Project. Nature. 2017;550:61–66.
    5. Schirmer M, Garner A, Vlamakis H, Xavier RJ. Microbial genes and pathways in inflammatory bowel disease. Nat Rev Microbiol. 2019;17:497–511.
