Nat Commun. 2022 Feb 22;13(1):926.
doi: 10.1038/s41467-021-27917-x.

Short- and long-read metagenomics of urban and rural South African gut microbiomes reveal a transitional composition and undescribed taxa

Fiona B Tamburini et al. Nat Commun. 2022.

Abstract

Human gut microbiome research focuses on populations living in high-income countries and, to a lesser extent, non-urban agriculturalist and hunter-gatherer societies. The scarcity of research between these extremes limits our understanding of how the gut microbiota relates to health and disease in the majority of the world's population. Here, we evaluate gut microbiome composition in transitioning South African populations using short- and long-read sequencing. We analyze stool from adult females living in rural Bushbuckridge (n = 118) or urban Soweto (n = 51) and find that these microbiomes are taxonomically intermediate between those of individuals living in high-income countries and traditional communities. We demonstrate that reference collections are incomplete for characterizing microbiomes of individuals living outside high-income countries, yielding artificially low beta diversity measurements, and generate complete genomes of undescribed taxa, including Treponema, Lentisphaerae, and Succinatimonas. Our results suggest that the gut microbiome of South Africans does not conform to a simple "western-nonwestern" axis and contains undescribed microbial diversity.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Taxonomic composition of South African study participant microbiota.
Sequence data were taxonomically classified using Kraken 2 with a database containing all genomes in RefSeq and GenBank of “scaffold” quality or better as of January 2020. a Top 20 genera by mean relative abundance for samples from participants in Bushbuckridge and Soweto, sorted by decreasing Prevotella abundance. Prevotella, Bacteroides, and Faecalibacterium are the most prevalent genera across both study sites. b Relative abundance of VANISH genera by study site, grouped by family (n = 118 Bushbuckridge, n = 51 Soweto). A pseudocount of 1 read was added to each sample prior to relative abundance normalization in order to plot on a log scale, as the abundance of some genera in some samples is zero. Relative abundance values of most VANISH genera are higher on average in participants from Bushbuckridge than Soweto (two-sided Wilcoxon rank-sum test, significance values denoted as follows: *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001, (ns) not significant). Exact p values from left to right: 3.91e−2, 3.28e−1, 1.60e−2, 4.55e−3, 6.64e−3, 1.93e−5, 9.20e−3, 7.29e−3, 6.93e−2, 6.87e−4, 1.64e−11, 7.66e−6, 1.02e−7. Box plot lower and upper hinges correspond to the first and third quartiles, upper and lower whiskers represent the highest and lowest values within 1.5 times the interquartile range, and the horizontal line represents the median.
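The pseudocount normalization and the box plot conventions described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the pseudocount is added to every genus count within a sample, and it uses Python's inclusive quartile method, which may differ slightly from the plotting software used for the figure. The function names and the toy counts are hypothetical.

```python
import math
import statistics

def relative_abundance_with_pseudocount(counts, pseudocount=1):
    """Add a pseudocount to every genus count, then normalize to proportions.

    Assumption: the pseudocount is applied per genus within a sample so that
    zero counts can be shown on a log scale.
    """
    adjusted = {genus: n + pseudocount for genus, n in counts.items()}
    total = sum(adjusted.values())
    return {genus: n / total for genus, n in adjusted.items()}

def box_plot_stats(values):
    """Hinges at the first and third quartiles; whiskers at the most extreme
    observed values within 1.5 times the interquartile range of the hinges."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower = min(v for v in data if v >= q1 - 1.5 * iqr)
    upper = max(v for v in data if v <= q3 + 1.5 * iqr)
    return {"q1": q1, "median": median, "q3": q3,
            "lower_whisker": lower, "upper_whisker": upper}

# Toy read counts for one sample (hypothetical values).
sample = {"Prevotella": 600, "Bacteroides": 397, "Treponema": 0}
rel = relative_abundance_with_pseudocount(sample)
log_rel = {genus: math.log10(v) for genus, v in rel.items()}

# Toy values with one point beyond the upper whisker.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
```

With the pseudocount in place, a genus with zero reads still has a finite log-scale value, which is the stated reason for adding it.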
Fig. 2. Comparison of Bushbuckridge and Soweto microbiomes.
a Multidimensional scaling (MDS) of pairwise Bray–Curtis distance between samples (rarefied to 1.44 M counts per sample to control for read depth and cumulative sum scaling normalized). Soweto samples have greater dispersion than Bushbuckridge (PERMDISP2 p < 0.001). b Shannon diversity calculated on rarefied species-level taxonomic classifications for each sample (participant n = 118 Bushbuckridge, n = 51 Soweto). Samples from Bushbuckridge are higher in alpha diversity than samples from Soweto (two-sided Wilcoxon rank-sum test, p = 0.042). For box plots, lower and upper hinges correspond to the first and third quartiles, upper and lower box plot whiskers represent the highest and lowest values within 1.5 times the interquartile range, and the horizontal line represents the median. c DESeq2 identifies microbial genera that are differentially abundant in rural Bushbuckridge compared to the urban Soweto cohort. Features with log2 fold change greater than one are plotted (full results in Supplementary Data 5).
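For readers unfamiliar with the two diversity measures named in this caption, the sketch below computes Shannon diversity for one sample and Bray–Curtis distance between two samples. It is illustrative only: it assumes natural logarithms for Shannon diversity and operates on raw toy counts, whereas the caption states that the study data were rarefied and cumulative sum scaling normalized first. The function names are hypothetical.

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts.

    Assumption: natural log; the caption does not state the log base.
    """
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(a, b):
    """Bray-Curtis distance between two count vectors over the same taxa:
    sum of absolute differences divided by the sum of all counts."""
    if len(a) != len(b):
        raise ValueError("count vectors must cover the same taxa")
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator

# Toy species counts (hypothetical values).
even_sample = [25, 25, 25, 25]
sample_a = [6, 4]
sample_b = [4, 6]

h_even = shannon_diversity(even_sample)    # equals ln(4) for four equal taxa
d_ab = bray_curtis(sample_a, sample_b)     # 4 / 20
```

A distance of 0 means identical composition and 1 means no shared taxa, which is why the spread of points in the MDS plot reflects between-sample dissimilarity.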
Fig. 3. Community-level comparison of global microbiomes.
Comparisons of South African microbiome data to microbiome sequence data from four publicly available cohorts representing western (United States, Sweden) and nonwestern (Tanzania, Madagascar, Burkina Faso) populations. a Number of participants per cohort. b Multidimensional scaling of pairwise Bray–Curtis distance between samples from six datasets of healthy adult shotgun microbiome sequencing data. Western populations (Sweden, United States) cluster away from African populations practicing a traditional lifestyle (Madagascar, Tanzania, Burkina Faso) while transitional South African microbiomes overlap with both western and nonwestern populations. Shown below are scatterplots of relative abundance of the top four taxa most correlated with MDS 1 (Spearman’s rho, Spirochaetaceae −0.824, Succinivibrionaceae −0.804, Bacteroidaceae 0.769, and Prevotellaceae −0.752) against multidimensional scaling axis 1 (MDS 1) on the x-axis. c Box plots of the first axis of MDS (MDS 1) which correlates with geography and lifestyle, and the second axis of MDS (MDS 2), which shows a distinct separation of South Africans from the other cohorts. d Shannon diversity across cohorts. Shannon diversity was calculated from data rarefied to the number of counts of the lowest sample. For box plots in c and d, lower and upper hinges correspond to the first and third quartiles, upper and lower box plot whiskers represent the highest and lowest values within 1.5 times the interquartile range, and the horizontal line represents the median. Participant sample size in a–d is as follows, with one sample per participant: n = 22 Tanzania, n = 112 Madagascar, n = 90 Burkina Faso, n = 118 Bushbuckridge, n = 51 Soweto, n = 100 Sweden, n = 134 United States.
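Rarefying to the depth of the lowest sample, as described for panel d, can be sketched as below. This is a minimal illustration under stated assumptions: reads are subsampled without replacement, a fixed random seed is used for reproducibility, and the sample names and counts are made up. It expands counts into a list of reads, so it is only practical for small toy inputs.

```python
import random
from collections import Counter

def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a taxon -> count map."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    return Counter(rng.sample(pool, depth))

def rarefy_to_minimum(samples, seed=0):
    """Rarefy every sample to the read count of the shallowest sample."""
    depth = min(sum(counts.values()) for counts in samples.values())
    return {name: rarefy(counts, depth, seed) for name, counts in samples.items()}

# Toy cohort (hypothetical sample names and counts).
samples = {
    "sample_1": {"Prevotella": 50, "Bacteroides": 30, "Treponema": 20},
    "sample_2": {"Prevotella": 10, "Bacteroides": 25, "Treponema": 5},
}
rarefied = rarefy_to_minimum(samples)
```

After rarefaction every sample has the same total count, so differences in Shannon diversity are not driven by sequencing depth alone.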
Fig. 4. Comparison of beta diversity between communities calculated by taxonomy vs. nucleotide k-mer composition.
a Percentage of reads classifiable at any taxonomic rank, by cohort, based on a reference database of all genomes “scaffold” quality or higher in RefSeq and GenBank as of January 2020. Read classification is higher in western vs. nonwestern microbiomes (one-sided Wilcoxon rank-sum test between Soweto and Sweden, p = 2.56e−8), and higher in Soweto relative to Bushbuckridge (one-sided Wilcoxon rank-sum test, p = 2.43e−4). b Comparison of microbiome sequence data using k-mer sketches, a reference-free approach that allows comparison of nucleotide sequence composition. Briefly, a hash function generates signatures at varying sequence lengths (k) and k-mer sketches can be compared between samples. Plot shows non-metric multidimensional scaling (NMDS) of angular distance values between each pair of samples at k = 31 (approx. species-level). c–e Comparison of pairwise beta diversity within communities using Bray–Curtis distance for species and angular distance for nucleotide k-mer sketches. c Species beta diversity is higher in Soweto vs. all populations (one-sided Wilcoxon rank-sum test, FDR-adjusted q < 2.7e−16 for all tests) except for the United States, where beta diversity in Soweto is lower (one-sided Wilcoxon rank-sum test, q = 4.05e−6). Nucleotide k-mer diversity is higher in Soweto vs. all populations (one-sided Wilcoxon rank-sum test, FDR-adjusted q < 2.2e−16 for all tests). d Species beta diversity is higher in Sweden compared to Bushbuckridge, but nucleotide k-mer distance is higher in Bushbuckridge (p < 2.22e−16 for both tests). e Species beta diversity is higher in the United States cohort compared to the Malagasy, but nucleotide k-mer distance is higher in the Malagasy (p < 2.22e−16 species, p = 0.034 k-mer).
For all box plots in a, c–e, lower and upper hinges correspond to the first and third quartiles, upper and lower box plot whiskers represent the highest and lowest values within 1.5 times the interquartile range, and the horizontal line represents the median. Significance values for two-sided Wilcoxon rank-sum tests denoted as follows: *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001. One sample per participant, sample size in a–e is: n = 22 Tanzania, n = 112 Madagascar, n = 90 Burkina Faso, n = 118 Bushbuckridge, n = 51 Soweto, n = 100 Sweden, n = 134 United States.
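The reference-free comparison in panel b can be illustrated with a toy k-mer sketch and an angular distance between sketches. This is a simplified sketch under several assumptions, not the tool or parameters used in the study: k-mers are hashed with BLAKE2b, a hypothetical `scaled` parameter keeps only hashes in the lowest fraction of hash space, reverse complements are not merged, and the angle between abundance vectors is rescaled to the range 0 to 1. The toy sequences use a small k rather than k = 31.

```python
import hashlib
import math
from collections import Counter

def kmer_sketch(seq, k, scaled=1):
    """Count hashed k-mers, keeping hashes in the lowest 1/scaled of hash space.

    Assumption: forward-strand k-mers only; real tools typically use
    canonical k-mers that merge reverse complements.
    """
    max_hash = 2**64 // scaled
    sketch = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        if h < max_hash:
            sketch[h] += 1
    return sketch

def angular_distance(a, b):
    """Angle between two k-mer abundance vectors, rescaled to [0, 1]."""
    dot = sum(a[h] * b[h] for h in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty sketch shares nothing with any other sketch
    cosine = min(1.0, max(-1.0, dot / (norm_a * norm_b)))
    return 2 * math.acos(cosine) / math.pi

# Toy sequences (hypothetical); k is small so the example stays readable.
s1 = kmer_sketch("ACGTACGTACGTTTGACA", k=4)
s2 = kmer_sketch("ACGTACGTACGTTTGACC", k=4)
s3 = kmer_sketch("AAAAAAAA", k=4)
s4 = kmer_sketch("CCCCCCCC", k=4)

d_similar = angular_distance(s1, s2)
d_disjoint = angular_distance(s3, s4)
```

Because the sketches are built directly from sequence, the distance does not depend on whether an organism is present in a reference database, which is the property the caption relies on when contrasting k-mer and species-level beta diversity.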
Fig. 5. Complete and contiguous genomes of South African microbiota.
a Phylogenetic tree of de-replicated short-read MAGs and medium- and high-quality nanopore MAGs (green circles). Innermost ring indicates GTDB phylum, middle ring indicates study site associated with each MAG, and outer ring indicates the highest average nucleotide identity between each MAG and genomes from the UHGG. b A selection of MAGs assembled from long-read sequencing (green) of three South African samples compared to contigs assembled from corresponding short-read data (gray). Third track (pink) indicates sliding genomic GC content, and fourth track (yellow) indicates sliding genomic GC skew. Breaks in circles represent different contigs. Genomic information within plots refers to assembly statistics of nanopore MAGs. c Number of additional genomic elements present in medium- and high-quality nanopore MAGs (n = 22) that are absent in corresponding short-read MAGs for the same organism, as diagrammed in the left-hand panel. Box plot lower and upper hinges correspond to the first and third quartiles, upper and lower box plot whiskers represent the highest and lowest values within 1.5 times the interquartile range, and the horizontal line represents the median. ANI average nucleotide identity, Mb megabase, Abx antibiotics, MAG metagenome-assembled genome.
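The sliding GC content and GC skew tracks in panel b can be computed along the lines of the sketch below. It uses the standard definitions, GC content = (G + C) / window length and GC skew = (G − C) / (G + C). The window and step sizes, the function name, and the toy sequence are assumptions for illustration; the caption does not state the values used for the figure.

```python
def sliding_gc(seq, window, step):
    """Return (start, gc_content, gc_skew) for each window along a sequence.

    GC content = (G + C) / window length.
    GC skew    = (G - C) / (G + C), reported as 0.0 when a window has no G or C.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_content = (g + c) / window
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_content, gc_skew))
    return results

# Toy contig (hypothetical); real genomes would use windows of kilobases.
tracks = sliding_gc("GGGGCCCCAATT", window=4, step=4)
```

In bacterial genomes a sign change in GC skew often marks the replication origin or terminus, so a track with clean sign changes is commonly read as a sanity check on a circular assembly.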

