[Preprint]. 2022 Oct 6:2022.03.30.486478.
doi: 10.1101/2022.03.30.486478.

Ultra-deep Sequencing of Hadza Hunter-Gatherers Recovers Vanishing Gut Microbes

Bryan D Merrill et al. bioRxiv.

Abstract

The gut microbiome is a key modulator of immune and metabolic health. Human microbiome data is biased towards industrialized populations, providing limited understanding of the distinct and diverse non-industrialized microbiomes. Here, we performed ultra-deep metagenomic sequencing and strain cultivation on 351 fecal samples from the Hadza, hunter-gatherers in Tanzania, and comparative populations in Nepal and California. We recover 94,971 total genomes of bacteria, archaea, bacteriophages, and eukaryotes, 43% of which are absent from existing unified datasets. Analysis of in situ growth rates, genetic pN/pS signatures, high-resolution strain tracking, and 124 gut-resident species vanishing in industrialized populations reveals differentiating dynamics of the Hadza gut microbiome. Industrialized gut microbes are enriched in genes associated with oxidative stress, possibly a result of microbiome adaptation to inflammatory processes. This unparalleled view of the Hadza gut microbiome provides a valuable resource that expands our understanding of microbes capable of colonizing the human gut and clarifies the extensive perturbation brought on by the industrialized lifestyle.

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

Fig. 1. A vast resource of Hadza gut microbiome data.
(A) Overview of sample collection for shotgun metagenomic sequencing of Hadza fecal samples. (B) Summary of the computational workflow, tools used, and primary data generated from Hadza stool samples. (C) Number of samples versus the number of bases sequenced per sample for 21 previously published human gut metagenomic data sets and the present study.
Fig. 2. The Hadza gut microbiota contains substantial multi-domain novelty.
(A) Phylogenetic tree of bacterial species-level representative genomes (SRGs) from Hadza and UHGG based on bacterial single copy gene alignment; branch colors correspond to phyla. SRGs from species-level groups consisting of only genomes assembled from the Hadza or only UHGG are colored green and orange in the outer ring, respectively. The number of SRGs found in the Hadza, UHGG, or both is shown as a horizontal line. Hadza genomes that are novel at the family or order level according to GTDB are annotated with red and blue stars, respectively. (B) The percentage of bacteriophage species clusters assembled from the Hadza that are novel at the species level according to the MGV (Nayfach et al., 2021b), categorized by phylum of the predicted host. Bacteriophages without a host prediction are labeled “Uncharacterized”. (C) A phylogenetic tree of eukaryotic genomes recovered from Hadza and Nepali gut metagenomes based on universal single copy genes. Public reference genomes are marked with blue text labels. The heatmap shows the prevalence of the individual eukaryotes in the Hadza, Nepali and Californian cohorts. (D) For each population, the percentage of predicted proteins from recovered genomes that are present in the UniRef100 and UHGP-95 (Almeida et al., 2020) protein databases. (E) The percentage of metagenomic reads mapping to various domains averaged across all metagenomic samples from each population. The phyla “Bacteroidota” and “Firmicutes_A” are shown separated from other bacteria. “Unmapped” depicts the percentage of reads that do not map to any genomes, and “Low confidence” depicts the percentage of reads that map to genomes with less than 50% genome breadth. (F) The Shannon diversity of bacteria, archaea, bacteriophage, and eukaryote genomes in metagenomes sequenced in this study. P-values from two-sided Mann-Whitney-Wilcoxon test with multiple hypothesis correction; *: p < 0.05, **: p < 0.01, ***: p < 0.001, ****: p < 0.0001, ns: p ≥ 0.05.
(G) Collectors’ curves depicting the average number of genomes detected per sample in each population sequenced in this study after rarefaction to various sequencing depths. The vertical dotted lines indicate the average per-sample sequencing depth of this study (~23 Gbp) and the average depth of samples studied previously (~4 Gbp; ref. (Almeida et al., 2020)). Shaded areas around lines indicate 95% confidence intervals. “Nepal For.” includes the Chepang foragers, while “Nepal Ag.” includes Raute, Raji, and Tharu agrarians.
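For readers unfamiliar with the Shannon diversity reported in Fig. 2F, the following is a minimal sketch of how the index can be computed from per-genome abundances in a single sample. The function name, the example abundances, and the use of the natural logarithm are illustrative assumptions; the page does not state which log base or software the authors used.

```python
import math


def shannon_diversity(abundances):
    """Shannon index H = -sum(p_i * ln(p_i)) over genomes with nonzero abundance.

    `abundances` is a hypothetical list of non-negative abundance values
    (e.g. read counts or relative abundances) for one metagenomic sample.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total  # convert to a proportion of the sample
            h -= p * math.log(p)  # natural log is an assumption here
    return h


if __name__ == "__main__":
    # Made-up abundances: an even community scores higher than a skewed one.
    print(shannon_diversity([25, 25, 25, 25]))
    print(shannon_diversity([97, 1, 1, 1]))
```

An evenly distributed community of four genomes gives the maximum value for four taxa (ln 4), while a community dominated by a single genome gives a value close to zero.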
Fig. 3. Increased sequencing depth results in the detection of novel and phylogenetically distinct taxa.
(A) The number of genomes detected in individual samples sequenced in this study when limiting sequencing depth by 5 Gbp increments. Each line represents an individual sample from which ≥ 50 Gbp of trimmed, filtered reads were generated. Lines are colored by population. Vertical dotted lines indicate the average per-sample sequencing depth of this study (23 Gbp) and the average per-sample sequencing depth of samples used in Almeida et al. (4 Gbp; Almeida et al., 2020). (B) Taxonomic distribution of organisms present at different ranges of relative abundance levels (horizontal stacked bar plots) and the percentage of species that are novel according to GTDB r95 (text percentages right of horizontal bars). Organisms detected at low relative abundance levels are more likely to be novel than those that are more abundant.
Fig. 4. VANISH and BloSSUM taxa have distinct global prevalence, function, growth rates and covariance with eukaryote detection.
(A) A heatmap depicting the presence of 524 SRGs (columns) within metagenomic samples from populations living different lifestyles (rows). Darker blue indicates SRG presence, lighter blue indicates SRG absence. SRGs with >30% prevalence among all samples in any lifestyle category were included. (B) SRGs were classified as “BloSSUM” or “VANISH” based on their prevalence across lifestyles (see methods for details). Colored bars correspond to columns in the heatmap. (C) The prevalence of VANISH (magenta), BloSSUM (blue) and non-enriched taxa (gray) in the Hadza, transitional lifestyle populations and industrial lifestyle populations. Dashed lines connect median prevalence across the taxa in each category surrounded by standard deviation (color shaded regions). Solid lines show the median prevalence for 6 representative taxa in each of these lifestyle groups. (D) The in situ growth rate of SRGs in metagenomes from Nepali individuals, stratified by status as “VANISH” (middle), “BloSSUM” (bottom), or neither (top) (* P ≤ 0.05; ** P ≤ 0.01; Wilcoxon rank-sum test). (E) The association of Pfams with VANISH or BloSSUM genomes. The x-axis displays the fraction of BloSSUM genomes a Pfam is detected in minus the fraction of VANISH genomes a Pfam is detected in (Pfam differential prevalence). The y-axis displays the p-value resulting from Fisher’s exact test with multiple hypothesis correction.
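The two axes of Fig. 4E can be illustrated with a small sketch: the Pfam differential prevalence (fraction of BloSSUM genomes carrying a Pfam minus the fraction of VANISH genomes carrying it) and a two-sided Fisher's exact test on the corresponding 2x2 table. The function names and counts below are hypothetical, the test is a plain standard-library implementation rather than the authors' code, and the multiple hypothesis correction mentioned in the caption is not included.

```python
from math import comb


def differential_prevalence(blossum_with, blossum_total, vanish_with, vanish_total):
    """Fraction of BloSSUM genomes with the Pfam minus fraction of VANISH genomes with it."""
    return blossum_with / blossum_total - vanish_with / vanish_total


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts: Pfam found in 40 of 50 BloSSUM and 10 of 50 VANISH genomes.
    print(differential_prevalence(40, 50, 10, 50))
    print(fisher_exact_two_sided(40, 10, 10, 40))
```

In this layout the table rows are the two genome categories and the columns are "Pfam detected" and "Pfam not detected", so a positive differential prevalence means the Pfam is more common among BloSSUM genomes.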
Fig. 5. Spirochaetota that are highly abundant in the Hadza are absent in industrial samples.
(A) A heatmap showing the relative abundance of the 10 most prevalent Spirochaetota species in the Hadza, Nepali, and American cohorts. All samples are sequenced to approximately the same sequencing depth. (B) A phylogenetic tree of all Spirochaetota species using genomes from NCBI, the UHGG and the species-representative genomes added in this study. Clades of commensal organisms in the genera Brachyspira, Spirochaeta, and Treponema are highlighted. (C) A phylogenetic tree of all Treponema succinifaciens MAGs in the UHGG in addition to new MAGs recovered in this study (annotated in outer ring). The inner ring is colored based on the country of origin of the individual contributing the MAG. (D) World map showing locations of populations from which T. succinifaciens MAGs were recovered as nodes (TZA = Tanzania, MDG = Madagascar, NEP = Nepal, FIJ = Fiji, PER = Peru, ELS = El Salvador). Arrows indicate the detection of transition events between populations as detected by stochastic character mapping. Thickness of the arrow indicates frequency of the transition event (thickest arrow is Tanzania to Fiji, 17.1%). The top 7 most frequent transition events are shown, accounting for 65.7% of all transitions.
Fig. 6. Microdiversity, growth rates, and patterns of strain sharing among Hadza gut bacteria.
(A) Pfams with high or low pN/pS values in Hadza fecal metagenomes. The x-axis displays the mean pN/pS value of all genes annotated with each Pfam within Hadza fecal metagenomes. The y-axis displays the probability that the number of times genes annotated as each Pfam were in the top 10% or bottom 10% of all genes on detected genomes was due to random chance (binomial test with multiple hypothesis correction). The 30 Pfams with the lowest p-values for low and high pN/pS were manually annotated with broad functional categories. (B) In situ growth rate measurements of all taxa detected in Hadza adult metagenomes across seasons. Error bars indicate 95% confidence intervals (n.s. P > 0.05; **** P ≤ 0.0001; Wilcoxon rank-sum test). (C) Rectangles along the circumference represent Hadza individuals and each link drawn between rectangles indicates a shared strain. Links between members of the same bush camp are colored based on the bush camp; links between bush camps are colored black. The mean number of strains shared between members of the same bush camp and the p-value comparing strain sharing among members of that bush camp vs members from different bush camps are shown (Wilcoxon rank-sum test). (D) The mean number of strains shared between Hadza adults broken down by various types of familial relationships. Exact p-values shown from Wilcoxon rank-sum test.

References

    1. Abdill R.J., Adamowicz E.M., and Blekhman R. (2022). Public human microbiome data are dominated by highly developed countries. PLoS Biol. 20, e3001536.
    2. Almeida A., Mitchell A.L., Boland M., Forster S.C., Gloor G.B., Tarkowska A., Lawley T.D., and Finn R.D. (2019). A new genomic blueprint of the human gut microbiota. Nature 568, 499–504.
    3. Almeida A., Nayfach S., Boland M., Strozzi F., Beracochea M., Shi Z.J., Pollard K.S., Sakharova E., Parks D.H., Hugenholtz P., et al. (2020). A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat. Biotechnol. doi: 10.1038/s41587-020-0603-3.
    4. Andrews S. (2010). FastQC: a quality control tool for high throughput sequence data.
    5. Bäckhed F., Roswall J., Peng Y., Feng Q., Jia H., Kovatcheva-Datchary P., Li Y., Xia Y., Xie H., Zhong H., et al. (2015). Dynamics and Stabilization of the Human Gut Microbiome during the First Year of Life. Cell Host Microbe 17, 690–703.
