This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2022 Oct 6:2022.03.30.486478.

doi: 10.1101/2022.03.30.486478.

Ultra-deep Sequencing of Hadza Hunter-Gatherers Recovers Vanishing Gut Microbes

Affiliations

¹ Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA, USA.
² Chan Zuckerberg Biohub, San Francisco, CA, USA.
³ Genetic Heritage Group, Program in Biology, New York University Abu Dhabi, Abu Dhabi, UAE.
⁴ Center for Human Microbiome Studies, Stanford University School of Medicine, Stanford, CA, USA.

PMID: 36238714
PMCID: PMC9558438
DOI: 10.1101/2022.03.30.486478

Bryan D Merrill et al. bioRxiv. 2022.

Update in

Ultra-deep sequencing of Hadza hunter-gatherers recovers vanishing gut microbes.
Carter MM, Olm MR, Merrill BD, Dahan D, Tripathi S, Spencer SP, Yu FB, Jain S, Neff N, Jha AR, Sonnenburg ED, Sonnenburg JL. Cell. 2023 Jul 6;186(14):3111-3124.e13. doi: 10.1016/j.cell.2023.05.046. Epub 2023 Jun 21. PMID: 37348505.

Abstract

The gut microbiome is a key modulator of immune and metabolic health. Human microbiome data is biased towards industrialized populations, providing limited understanding of the distinct and diverse non-industrialized microbiomes. Here, we performed ultra-deep metagenomic sequencing and strain cultivation on 351 fecal samples from the Hadza, hunter-gatherers in Tanzania, and comparative populations in Nepal and California. We recover 94,971 total genomes of bacteria, archaea, bacteriophages, and eukaryotes, 43% of which are absent from existing unified datasets. Analysis of in situ growth rates, genetic pN/pS signatures, high-resolution strain tracking, and 124 gut-resident species vanishing in industrialized populations reveals differentiating dynamics of the Hadza gut microbiome. Industrialized gut microbes are enriched in genes associated with oxidative stress, possibly a result of microbiome adaptation to inflammatory processes. This unparalleled view of the Hadza gut microbiome provides a valuable resource that expands our understanding of microbes capable of colonizing the human gut and clarifies the extensive perturbation brought on by the industrialized lifestyle.

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

**Fig. 1. A vast resource of Hadza gut microbiome data.**
**(A)** Overview of sample collection for shotgun metagenomic sequencing of Hadza fecal samples. **(B)** Summary of the computational workflow, tools used, and primary data generated from Hadza stool samples. **(C)** Number of samples versus the number of bases sequenced per sample for 21 previously published human gut metagenomic data sets and the present study.

**Fig. 2. The Hadza gut microbiota contains substantial multi-domain novelty.**
**(A)** Phylogenetic tree of bacterial species-level representative genomes (SRGs) from Hadza and UHGG based on bacterial single copy gene alignment; branch colors correspond to phyla. SRGs from species-level groups consisting of only genomes assembled from the Hadza or only UHGG are colored green and orange in the outer ring, respectively. The number of SRGs found in the Hadza, UHGG, or both is shown as a horizontal line. Hadza genomes that are novel at the family or order level according to GTDB are annotated with red and blue stars, respectively. **(B)** The percentage of bacteriophage species clusters assembled from the Hadza that are novel at the species level according to the MGV (Nayfach et al., 2021b), categorized by phylum of the predicted host. Bacteriophages without a host prediction are labeled “Uncharacterized”. **(C)** A phylogenetic tree of eukaryotic genomes recovered from Hadza and Nepali gut metagenomes based on universal single copy genes. Public reference genomes are marked with blue text labels. The heatmap shows the prevalence of the individual eukaryotes in the Hadza, Nepali and Californian cohorts. **(D)** For each population, the percentage of predicted proteins from recovered genomes that are present in the UniRef100 and UHGP-95 (Almeida et al., 2020) protein databases. **(E)** The percentage of metagenomic reads mapping to various domains averaged across all metagenomic samples from each population. The phyla “Bacteroidota” and “Firmicutes_A” are shown separated from other bacteria. “Unmapped” depicts the percentage of reads that do not map to any genomes, and “Low confidence” depicts the percentage of reads that map to genomes with less than 50% genome breadth. **(F)** The Shannon diversity of bacteria, archaea, bacteriophage, and eukaryote genomes in metagenomes sequenced in this study. P-values from two-sided Mann-Whitney-Wilcoxon test with multiple hypothesis correction; *: p < 0.05, **: p < 0.01, ***: p < 0.001, ****: p < 0.0001, ns: p ≥ 0.05. **(G)** Collectors’ curves depicting the average number of genomes detected per sample in each population sequenced in this study after rarefaction to various sequencing depths. The vertical dotted lines indicate the average per-sample sequencing depth of this study (~23 Gbp) and the average depth of samples studied previously (~4 Gbp; Almeida et al., 2020). Shaded areas around lines indicate 95% confidence intervals. “Nepal For.” includes the Chepang foragers, while “Nepal Ag.” includes Raute, Raji, and Tharu agrarians.
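The Shannon diversity reported in panel (F) can be illustrated with a minimal sketch. It assumes the natural logarithm and per-genome abundance values as input; the caption does not state the log base or the exact abundance measure, and the function name and example values are hypothetical rather than taken from the study.

```python
import math

def shannon_diversity(abundances):
    """Shannon index H = -sum(p_i * ln(p_i)) over genomes with nonzero abundance.

    `abundances` is a list of non-negative abundance values, one per genome
    (assumed input format; counts or relative abundances both work because
    the values are normalized to proportions first).
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for a in abundances:
        if a > 0:  # genomes with zero abundance contribute nothing
            p = a / total
            h -= p * math.log(p)
    return h

# Hypothetical communities with four genomes each.
even = shannon_diversity([25, 25, 25, 25])   # perfectly even community
skewed = shannon_diversity([97, 1, 1, 1])    # one dominant genome
```

An even community of four genomes reaches the maximum value ln(4), while a community dominated by one genome scores lower, which is why the index captures evenness as well as richness.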

**Fig. 3. Increased sequencing depth results in the detection of novel and phylogenetically distinct taxa.**
**(A)** The number of genomes detected in individual samples sequenced in this study when limiting sequencing depth by 5 Gbp increments. Each line represents an individual sample from which ≥ 50 Gbp of trimmed, filtered reads were generated. Lines are colored by population. Vertical dotted lines indicate the average per-sample sequencing depth of this study (23 Gbp) and the average per-sample sequencing depth of samples used in Almeida et al., 2020 (4 Gbp). **(B)** Taxonomic distribution of organisms present at different ranges of relative abundance levels (horizontal stacked bar plots) and the percentage of species that are novel according to GTDB r95 (text percentages right of horizontal bars). Organisms detected at low relative abundance levels are more likely to be novel than those that are more abundant.

**Fig. 4. VANISH and BloSSUM taxa have distinct global prevalence, function, growth rates and covariance with eukaryote detection.**
**(A)** A heatmap depicting the presence of 524 SRGs (columns) within metagenomic samples from populations living different lifestyles (rows). Darker blue indicates SRG presence, lighter blue indicates SRG absence. SRGs with >30% prevalence among all samples in any lifestyle category were included. **(B)** SRGs were classified as “BloSSUM” or “VANISH” based on their prevalence across lifestyles (see methods for details). Colored bars correspond to columns in the heatmap. **(C)** The prevalence of VANISH (magenta), BloSSUM (blue) and non-enriched taxa (gray) in the Hadza, transitional lifestyle populations and industrial lifestyle populations. Dashed lines connect median prevalence across the taxa in each category surrounded by standard deviation (color shaded regions). Solid lines show the median prevalence for 6 representative taxa in each of these lifestyle groups. **(D)** The *in situ* growth rate of SRGs in metagenomes from Nepali individuals, stratified by status as “VANISH” (middle), “BloSSUM” (bottom), or neither (top) (* P ≤ 0.05; ** P ≤ 0.01; Wilcoxon rank-sum test). **(E)** The association of Pfams with VANISH or BloSSUM genomes. The x-axis displays the fraction of BloSSUM genomes a Pfam is detected in minus the fraction of VANISH genomes a Pfam is detected in (Pfam differential prevalence). The y-axis displays the p-value resulting from Fisher’s exact test with multiple hypothesis correction.
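The two quantities plotted in panel (E) can be sketched as follows. This is an illustrative reimplementation using only the Python standard library, not the authors' code; the genome counts are hypothetical, and the multiple hypothesis correction mentioned in the caption is omitted because the caption does not name the method.

```python
from math import comb

def differential_prevalence(blossum_with, blossum_total, vanish_with, vanish_total):
    """Fraction of BloSSUM genomes carrying a Pfam minus the fraction of VANISH genomes."""
    return blossum_with / blossum_total - vanish_with / vanish_total

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of observing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical Pfam: present in 40 of 50 BloSSUM genomes and 10 of 60 VANISH genomes.
diff = differential_prevalence(40, 50, 10, 60)
# Table rows: BloSSUM, VANISH; columns: Pfam present, Pfam absent.
p_value = fisher_exact_two_sided(40, 10, 10, 50)
```

A positive differential prevalence places a Pfam on the BloSSUM side of the x-axis and a negative value on the VANISH side; in practice each Pfam's p-value would then be adjusted for the number of Pfams tested.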

**Fig. 5. Spirochaetota that are highly abundant in the Hadza are absent in industrial samples.**
**(A)** A heatmap showing the relative abundance of the 10 most prevalent Spirochaetota species in the Hadza, Nepali, and American cohorts. All samples are sequenced to approximately the same sequencing depth. **(B)** A phylogenetic tree of all Spirochaetota species using genomes from NCBI, the UHGG and the species-representative genomes added in this study. Clades of commensal organisms in the genera *Brachyspira*, *Spirochaeta*, and *Treponema* are highlighted. **(C)** A phylogenetic tree of all *Treponema succinifaciens* MAGs in the UHGG in addition to new MAGs recovered in this study (annotated in outer ring). The inner ring is colored based on the country of origin of the individual contributing the MAG. **(D)** World map showing locations of populations from which *T. succinifaciens* MAGs were recovered as nodes (TZA = Tanzania, MDG = Madagascar, NEP = Nepal, FIJ = Fiji, PER = Peru, ELS = El Salvador). Arrows indicate transition events between populations detected by stochastic character mapping. Thickness of the arrow indicates frequency of the transition event (thickest arrow is Tanzania to Fiji, 17.1%). The top 7 most frequent transition events are shown, accounting for 65.7% of all transitions.

**Fig. 6. Microdiversity, growth rates, and patterns of strain sharing among Hadza gut bacteria.**
**(A)** Pfams with high or low *pN/pS* values in Hadza fecal metagenomes. The x-axis displays the mean *pN/pS* value of all genes annotated with each Pfam within Hadza fecal metagenomes. The y-axis displays the probability that the number of times genes annotated as each Pfam were in the top 10% or bottom 10% of all genes on detected genomes was due to random chance (binomial test with multiple hypothesis correction). The 30 Pfams with the lowest p-values for low and high *pN/pS* were manually annotated with broad functional categories. **(B)** *In situ* growth rate measurements of all taxa detected in Hadza adult metagenomes across seasons. Error bars indicate 95% confidence intervals. (n.s. P > 0.05; **** P ≤ 0.0001; Wilcoxon rank-sum test). **(C)** Rectangles along the circumference represent Hadza individuals and each link drawn between rectangles indicates a shared strain. Links between members of the same bush camp are colored based on the bush camp; links between bush camps are colored black. The mean number of strains shared between members of the same bush camp and the p-value comparing strain sharing among members of that bush camp vs. members from different bush camps are shown (Wilcoxon rank-sum test). **(D)** The mean number of strains shared between Hadza adults broken down by various types of familial relationships. Exact p-values shown from Wilcoxon rank-sum test.

References

1. Abdill R.J., Adamowicz E.M., and Blekhman R. (2022). Public human microbiome data are dominated by highly developed countries. PLoS Biol. 20, e3001536.
2. Almeida A., Mitchell A.L., Boland M., Forster S.C., Gloor G.B., Tarkowska A., Lawley T.D., and Finn R.D. (2019). A new genomic blueprint of the human gut microbiota. Nature 568, 499–504.
3. Almeida A., Nayfach S., Boland M., Strozzi F., Beracochea M., Shi Z.J., Pollard K.S., Sakharova E., Parks D.H., Hugenholtz P., et al. (2020). A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat. Biotechnol. 10.1038/s41587-020-0603-3.
4. Andrews S. (2010). FastQC: a quality control tool for high throughput sequence data.
5. Bäckhed F., Roswall J., Peng Y., Feng Q., Jia H., Kovatcheva-Datchary P., Li Y., Xia Y., Xie H., Zhong H., et al. (2015). Dynamics and Stabilization of the Human Gut Microbiome during the First Year of Life. Cell Host Microbe 17, 690–703.
