Mol Biol Evol. 2020 Oct 1;37(10):2944-2954.
doi: 10.1093/molbev/msaa140.

Khoe-San Genomes Reveal Unique Variation and Confirm the Deepest Population Divergence in Homo sapiens

Carina M Schlebusch et al. Mol Biol Evol. 2020.

Abstract

The southern African indigenous Khoe-San populations harbor the most divergent lineages of all living peoples. Exploring their genomes is key to understanding deep human history. We sequenced 25 full genomes from five Khoe-San populations, revealing many novel variants and showing that 25% of variants are unique to the Khoe-San and that the Khoe-San group harbors the greatest level of diversity across the globe. In line with previous studies, we found several gene regions with extreme values in genome-wide scans for selection, potentially caused by natural selection in the lineage leading to Homo sapiens and more recently. These gene regions included immunity-, sperm-, brain-, diet-, and muscle-related genes. When accounting for recent admixture, all Khoe-San groups display genetic diversity approaching the levels in other African groups and a reduction in effective population size starting around 100,000 years ago. Hence, all human groups show a reduction in effective population size commencing around the time of the Out-of-Africa migrations, which coincides with changes in the paleoclimate records, changes that potentially impacted all humans at the time.

Keywords: Khoe-San; population structure; southern Africa.

Figures

Fig. 1.
Sample locations and genetic diversity in the Khoe-San. (A) Sample locations across the world. Colors depict the various data sets included in the study and sample sizes are indicated after the population code. CG, Complete Genomics diversity set (Drmanac et al. 2010); HGDP, HGDP data (Meyer et al. 2012); KGP, 1000 Genomes typed on Complete Genomics platform (1000 Genomes Project Consortium 2015); KSP, this study; LC, Lachance et al. (2012); SGDP, Simons Genome Diversity Project (Mallick et al. 2016); BBA, Ballito Bay A (Schlebusch et al. 2017). The locations chosen for the CEU, GIH, and MXL reflect the ancestry of the population (not the sampling location). (B) Sample locations across Africa. Populations in boldface display newly sequenced individuals. (C) Genetic (autosomal) variation for three population groups: Khoe-San, other sub-Saharan Africans, and non-Africans. The summary statistics were calculated on the joint KSP and HGDP group called data set to avoid biases. The KSP and HGDP data sets were both sequenced on Illumina platforms. Note that the HGDP San individual was not included in the metrics shown here. Heterozygosity was computed from the number of variable positions divided by number of sequenced positions, and averaged across individuals. Mean total runs of homozygosity (ROH) displays the sum over the lengths 0.2–1 Mb. Average (across the genome) number of distinct alleles (allelic richness) and average number of alleles unique to a single population (private allelic richness) in a sample of eight haploid genomes per variable site. Standard errors were calculated. For heterozygosity, it is the standard error of the mean per individual, averaged across individuals. For ROH, it is the standard error of the mean of individuals. Standard errors for heterozygosity and for allelic richness were very small (<0.08%, see supplementary sections 5.2, 5.3, and 5.5, Supplementary Material online, for details).
(D) Private allelic richness (per variable site) of alleles shared by pairwise combinations of the five Khoe-San populations. We distinguish three groups: northern San (Ju|’hoansi and !Xun), central San (|Gui and ‖Gana), and southern San (Nama and Karretjie). (E) Venn diagram summarizing private and shared variants in the Khoe-San versus other Africans versus non-Africans.
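The heterozygosity definition given in panel (C) of the Fig. 1 caption (variable positions divided by sequenced positions, averaged across individuals) can be illustrated with a minimal sketch. The function names and the example counts below are hypothetical and are not taken from the study; the authors' actual pipeline is described in their supplementary material and is not reproduced here.

```python
from statistics import mean


def heterozygosity(n_variable_positions, n_sequenced_positions):
    """Per-individual heterozygosity: variable positions / sequenced positions."""
    if n_sequenced_positions <= 0:
        raise ValueError("n_sequenced_positions must be positive")
    return n_variable_positions / n_sequenced_positions


def mean_heterozygosity(individuals):
    """Average the per-individual heterozygosity across individuals.

    `individuals` is an iterable of (n_variable, n_sequenced) pairs.
    """
    return mean(heterozygosity(v, s) for v, s in individuals)


# Hypothetical counts for three individuals (illustration only, not study data).
example = [(1_000, 1_000_000), (1_200, 1_000_000), (800, 1_000_000)]
group_heterozygosity = mean_heterozygosity(example)
```

Averaging the per-individual ratios, rather than pooling counts across individuals, follows the wording of the caption; with unequal numbers of sequenced positions the two approaches would give slightly different values.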
Fig. 2.
Grouped bar-plots summarizing private allele sharing as a fraction of the total number of variant sites in the data set: (A) Privately shared alleles of various Khoe-San groups with comparative groups. (B) Privately shared alleles of comparative groups.
Fig. 3.
Population divergence estimates. (A) Schematic overview of the estimated population divergences. The colored nodes correspond to the population divergences that were estimated with the TT method and GPhoCS, and the estimates are presented in (B). (B) Distribution of divergence time estimates based on GPhoCS (unscaled estimates, means, and medians available in supplementary table S7.1, Supplementary Material online) and mean ± standard error of the divergence time estimated with the TT method (supplementary table S7.2, Supplementary Material online).
Fig. 4.
Estimates of effective population size across time. (A) Effective population sizes estimated for autosomal data from single individuals (i.e., two chromosomes) for the Khoe-San (average over the five individuals in each population), the HGDP individuals, and the Stone Age southern African Ballito Bay A boy (BBA; Schlebusch et al. 2017). (B) African temperature variation estimated from the reconstruction of sea surface temperature in the southwestern Indian Ocean (Caley et al. 2018). (C) Khoe-San effective population sizes estimated from single individuals (“two chromosomes,” solid gray), pairs of individuals (“four chromosomes,” solid colored lines), and five individuals (“ten chromosomes,” colored dotted lines). The curves are averaged over all MSMC runs for all different combinations of individuals (respectively, five, ten, and one).
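The run counts quoted at the end of the Fig. 4 caption (five, ten, and one) match the number of ways to choose one, two, or all five individuals from a population of five. The sketch below only checks that arithmetic; the individual labels are hypothetical placeholders, and it does not run MSMC or reflect how the authors organized their runs.

```python
from itertools import combinations

# Hypothetical labels for the five sequenced individuals in one population.
individuals = ["ind1", "ind2", "ind3", "ind4", "ind5"]


def run_sets(individuals, n_individuals):
    """All distinct sets of `n_individuals` that could go into a single run."""
    return list(combinations(individuals, n_individuals))


# Each diploid individual contributes two chromosomes, so 1, 2, and 5
# individuals correspond to 2, 4, and 10 chromosomes respectively.
counts = {2 * k: len(run_sets(individuals, k)) for k in (1, 2, 5)}
# counts == {2: 5, 4: 10, 10: 1}
```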
Fig. 5.
Signatures of adaptation in the genomes. (A) Schematic overview of the three different population branch statistic (PBS) based analyses. The different PBS-based statistics are designed to capture adaptation signals in different parts of the phylogeny. (B) Manhattan plot of the archaicPBS statistic across the genome (supplementary fig. S12.3, Supplementary Material online, displays the aPBS and the emhPBS results). The eight dashed red lines show all the top-five peaks among the three PBS statistics (they are highly correlated). The most likely candidate genes are written below the peaks with genes involved with brain functions, immune system, and other functions indicated in blue, green, and black, respectively. The dashed horizontal line shows the 99.9th percentile of the archaicPBS statistic for these data. (C) A close-up of the strongest peak for archaicPBS, which is located upstream of the gene LPHN3. (D) An example of a local selection signal in southern Khoe-San. |iHS| for southern Khoe-San is shown in green, |iHS| for northern Khoe-San in red, and XP-EHH in purple. The strong negative XP-EHH values suggest adaptation in southern Khoe-San.

References

    1. 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. 2015. A global reference for human genetic variation. Nature 526(7571):68–74.
    2. Beuning KRM, Zimmerman KA, Ivory SJ, Cohen AS. 2011. Vegetation response to glacial-interglacial climate variability near Lake Malawi in the southern African tropics. Palaeogeogr Palaeoclimatol Palaeoecol. 303(1–4):81–92.
    3. Breton G, Schlebusch CM, Lombard M, Sjodin P, Soodyall H, Jakobsson M. 2014. Lactase persistence alleles reveal partial East African ancestry of southern African Khoe pastoralists. Curr Biol. 24(8):852–858.
    4. Caley T, Extier T, Collins JA, Schefuß E, Dupont L, Malaizé B, Rossignol L, Souron A, McClymont EL, Jimenez-Espejo FJ, et al. 2018. A two-million-year-long hydroclimatic context for hominin evolution in southeastern Africa. Nature 560(7716):76–79.
    5. Cann RL, Stoneking M, Wilson AC. 1987. Mitochondrial DNA and human evolution. Nature 325(6099):31–36.