Genome Biol. 2020 Jun 8;21(1):138.
doi: 10.1186/s13059-020-02042-y.

Analysis of 1321 Eubacterium rectale genomes from metagenomes uncovers complex phylogeographic population structure and subspecies functional adaptations

Nicolai Karcher et al. Genome Biol.

Abstract

Background: Eubacterium rectale is one of the most prevalent human gut bacteria, but its diversity and population genetics are not well understood because large-scale whole-genome investigations of this microbe have not been carried out.

Results: Here, we leverage metagenomic assembly followed by a reference-based binning strategy to screen over 6500 gut metagenomes spanning geography and lifestyle and reconstruct over 1300 E. rectale high-quality genomes from metagenomes. We extend previous results of biogeographic stratification, identifying a new subspecies predominantly found in African individuals and showing that closely related non-human primates do not harbor E. rectale. Comparison of pairwise genetic and geographic distances between subspecies suggests that isolation by distance and co-dispersal with human populations might have contributed to shaping the contemporary population structure of E. rectale. We confirm that a relatively recently diverged E. rectale subspecies specific to Europe consistently lacks motility operons and that it is immotile in vitro, probably due to ancestral genetic loss. The same subspecies exhibits expansion of its carbohydrate metabolism gene repertoire including the acquisition of a genomic island strongly enriched in glycosyltransferase genes involved in exopolysaccharide synthesis.

Conclusions: Our study provides new insights into the population structure and ecology of E. rectale and shows that shotgun metagenomes can enable population genomics studies of microbiota members at a resolution and scale previously attainable only by extensive isolate sequencing.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Reconstruction of 1321 high-quality (HQ) E. rectale genomes from 6775 fecal metagenomes. a The parameters for the binning step of our reference-based workflow (average identity and fraction of contig aligned) were chosen using E. rectale-free metagenomic assemblies spiked with E. rectale sequences obtained from isolate genomes (“Materials and methods”). We report the median number of false positive (FP) bases (binned contigs not coming from spike-in) and false negative (FN) bases (contigs coming from spike-in that were not binned). FP and FN values are scaled with respect to the average E. rectale isolate genome size. The red square indicates the parameter value combination used in this study. b Estimation of completeness and contamination for all extracted genomes using CheckM [17]. c Comparison of genome characteristics for E. rectale isolate genomes, genomes from metagenomes reconstructed with a semi-supervised approach (MCR), and the large set of automatically reconstructed genomes (HQ). d, e Completeness and contamination estimates for bins extracted using the reference-based binning approach used in this study and bins produced by a reference-independent pipeline using metaBAT2 [2, 18]. Only genomes with > 90% completeness and < 5% contamination in both approaches are shown. f The sizes of the E. rectale genomes reconstructed with the reference-based pipeline are consistent with the genome sizes from cultured isolate sequencing (gray shading), while the reference-independent pipeline produces smaller genomes. g Pan-genome characteristics for seven E. rectale isolate genomes from NCBI available at the time of processing (Additional file 3: Table S2) as well as seven genomes from the reference-based binning and from Pasolli et al. [2]. For both binning methods, we considered the same seven, randomly selected European metagenomes as well as all seven cultured isolate genomes originating from studies in Europe/North America
Fig. 2
E. rectale consists of four geographically stratified subspecies. a Maximum likelihood phylogenetic tree of all E. rectale genomes, built from a concatenated core gene alignment using PhyloPhlAn2 (“Materials and methods”) and rooted based on a phylogenetic tree including E. rectale sister species. b Non-metric multidimensional scaling plot of pairwise genetic distances between all E. rectale genomes. c Distribution of intra- and inter-subspecies core gene genetic distances. p values were obtained using bidirectional Wilcoxon rank-sum tests. d Subspecies assignment using PAM clustering with k = 4 (“Materials and methods”). Black points indicate genomes obtained from cultured isolate sequencing
Fig. 3
Eubacterium rectale subspecies distribution suggests subspecies are isolated by distance. a Relative prevalence of E. rectale subspecies per country (European countries are aggregated). The size of the pie charts is proportional to the total number of genomes obtained per region/country. For a map of Europe, see Additional file 1: Fig. S13. b Pairwise approximated geographic distances between subspecies (considering representative locations) correlate with their median genetic distances (see “Materials and methods” for details). A Mantel test between pairwise genetic and geographic distances using the Pearson correlation coefficient yielded a correlation of 0.73 and a p value of 0.041
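The Mantel test mentioned in panel b can be illustrated with a minimal sketch. It assumes two small symmetric distance matrices over the same items and enumerates every relabeling exhaustively instead of sampling permutations. The function names and the toy matrices are hypothetical and are not taken from the study; the authors' actual procedure is described in their “Materials and methods”.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Upper-triangle entries after relabeling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_exact(genetic, geographic):
    """One-sided exact Mantel test (assumption: n is small enough to enumerate n!).

    Returns the observed Pearson correlation and the fraction of relabelings
    whose correlation is at least as large as the observed one.
    """
    n = len(genetic)
    identity = tuple(range(n))
    geo_vec = upper_triangle(geographic, identity)
    observed = pearson(upper_triangle(genetic, identity), geo_vec)
    perms = list(permutations(range(n)))
    hits = sum(
        1
        for p in perms
        if pearson(upper_triangle(genetic, p), geo_vec) >= observed - 1e-12
    )
    return observed, hits / len(perms)


# Hypothetical toy distances between four groups (not data from the paper).
toy_geographic = [[0, 1, 2, 4], [1, 0, 3, 5], [2, 3, 0, 6], [4, 5, 6, 0]]
toy_genetic = [[0, 2, 1, 5], [2, 0, 3, 4], [1, 3, 0, 6], [5, 4, 6, 0]]
r, p = mantel_exact(toy_genetic, toy_geographic)
print(f"r = {r:.3f}, p = {p:.3f}")
```

In this exhaustive variant, four items give only 4! = 24 relabelings, so the p value cannot fall below 1/24. Larger matrices would normally use randomly sampled permutations instead.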
Fig. 4
ErEurope is consistently immotile due to loss of motility operons. a No genes from the four motility operons of E. rectale [25] are detected in ErEurope strains, and only a very small fraction of non-ErEurope genomes are lacking some or all of these genes (Additional file 1: Fig. S18). Asterisks denote cultured isolate genomes. b Differentially abundant, non-operon potentially motility-associated KOs between ErEurope and the remaining subspecies. csrA was added despite being present in the flgM/csrA operon because it can be found elsewhere in some E. rectale genomes as well. We annotated genes using eggNOG-mapper [26] and only KOs of the E. rectale reference genome annotated by KEGG [27] are considered. Potentially motility-associated KOs were defined as being part of at least one of the following KEGG pathways: quorum sensing, bacterial chemotaxis, flagellar assembly, and two-component system. p values were calculated using a two-sided Wilcoxon test and corrected for multiple testing at 5% FDR using the Benjamini-Hochberg method. c Core gene sequence and flgB/fliA operon sequence genetic clustering for all motile strains (those belonging to either ErAfrica, ErEurasia or ErAsia). d In vitro motility characterization via phase-contrast microscopy of six E. rectale isolates (“Materials and methods”). Asterisk marks strain L2–21, which is the only immotile ErEurasia strain, presumably as a consequence of the specific lack of the flgB/fliA motility operon we found in its genomes
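The Benjamini-Hochberg correction named in panel b can be sketched as follows. This is a generic illustration of the step-up adjustment, assuming a plain list of raw p values; the function name and the example values are hypothetical and do not come from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for four tests (not data from the paper).
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q <= 0.05 for q in adjusted]  # 5% FDR threshold
```

A test is then called significant at 5% FDR when its adjusted value is at most 0.05.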
Fig. 5
The immotile subspecies ErEurope exhibits a comparatively strong shift in carbohydrate-active enzyme (CAZy) gene repertoire. a ErEurope exhibits higher CAZy family counts than the other subspecies. b Density estimates of the number of CAZy genes per 10^6 nucleotides in the genome for each subspecies. c Non-metric multidimensional scaling plot based on pairwise Manhattan distances between CAZy gene family abundances. d Left: Differentially abundant carbohydrate-active gene families between genomes of ErEurope and ErEurasia. p values were corrected at 5% family-wise error rate using the Bonferroni method. Color-scale is logarithmic. Middle: Effect size and direction of association (difference in mean copy number between ErEurope and ErEurasia). Right: Putative links between catabolic carbohydrate-active enzyme families (CBM, CE, GH) and their substrates. CBM = carbohydrate-binding module, CE = carbohydrate esterase, GH = glycoside hydrolase, GT = glycosyltransferase
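The pairwise Manhattan distances underlying panel c can be sketched as below. This assumes each genome is summarized as a vector of per-family gene counts in a fixed family order; the genome labels and counts are hypothetical placeholders, not values from the study.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length count vectors."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same gene families")
    return sum(abs(x - y) for x, y in zip(a, b))


def pairwise_manhattan(profiles):
    """Map each unordered pair of genome labels to its Manhattan distance."""
    names = list(profiles)
    return {
        (p, q): manhattan(profiles[p], profiles[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


# Hypothetical copy numbers for three gene families in three genomes.
toy_profiles = {
    "genome_a": [3, 0, 2],
    "genome_b": [1, 1, 2],
    "genome_c": [0, 4, 0],
}
distances = pairwise_manhattan(toy_profiles)
```

The resulting distances could then be passed to an ordination method such as non-metric multidimensional scaling, which is not shown here.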
Fig. 6
A newly discovered genomic island enriched for glycosyltransferase genes in ErEurope. a Genome-wide counts of the GT2, GT4, and GT32 families by subspecies. b Annotated open reading frames of the GT-enriched part of a representative example of the genomic island specific to ErEurope. c Comparative genomic analysis of the genomic island (“Materials and methods”). The top five ErEurope strains contain the genomic island, whereas the bottom five do not. Colored segments connecting pairs of genes indicate orthologous genes inferred using progressiveMauve [33]. d GC content along the four contigs from ErEurope strains containing the ErEurope genomic island (“Materials and methods”). YSZC12003_37103 is not shown here because another genomic insertion would misalign the sequences. e Pairwise genetic distances between strains using orthologous genes from the genomic island are lower than those based on core genes. All 56 ErEurope strains with fully extracted genomic island are considered here

References

    1. Pasolli E, Schiffer L, Manghi P, Renson A, Obenchain V, Truong DT, et al. Accessible, curated metagenomic data through ExperimentHub. Nat Methods. 2017;14:1023–1024.
    2. Pasolli E, Asnicar F, Manara S, Zolfo M, Karcher N, Armanini F, et al. Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell. 2019;176:1–14.
    3. Duncan SH, Flint HJ. Proposal of a neotype strain (A1-86) for Eubacterium rectale. Request for an opinion. Int J Syst Evol Microbiol. 2008;58:1735–1736.
    4. Ríos-Covián D, Ruas-Madiedo P, Margolles A, Gueimonde M, de Los Reyes-Gavilán CG, Salazar N. Intestinal short chain fatty acids and their link with diet and human health. Front Microbiol. 2016;7:185.
    5. Bruzzese E, Callegari ML, Raia V, Viscovo S, Scotto R, Ferrari S, et al. Disrupted intestinal microbiota and intestinal inflammation in children with cystic fibrosis and its restoration with Lactobacillus GG: a randomised clinical trial. PLoS One. 2014;9:e87796.
