eLife. 2023 Jan 10;12:e79111.
doi: 10.7554/eLife.79111.

Balancing selection on genomic deletion polymorphisms in humans

Alber Aqil et al. eLife.

Abstract

A key question in biology is why genomic variation persists in a population for extended periods. Recent studies have identified examples of genomic deletions that have remained polymorphic in the human lineage for hundreds of millennia, ostensibly owing to balancing selection. Nevertheless, genome-wide investigation of ancient and possibly adaptive deletions remains an imperative exercise. Here, we demonstrate an excess of polymorphisms in present-day humans that predate the modern human-Neanderthal split (ancient polymorphisms), which cannot be explained solely by selectively neutral scenarios. We analyze the adaptive mechanisms that underlie this excess in deletion polymorphisms. Using a previously published measure of balancing selection, we show that this excess of ancient deletions is largely owing to balancing selection. Based on the absence of signatures of overdominance, we conclude that it is a rare mode of balancing selection among ancient deletions. Instead, more complex scenarios involving spatially and temporally variable selective pressures are likely more common mechanisms. Our results suggest that balancing selection resulted in ancient deletions harboring disproportionately more exonic variants with GWAS (genome-wide association studies) associations. We further found that ancient deletions are significantly enriched for traits related to metabolism and immunity. As a by-product of our analysis, we show that deletions are, on average, more deleterious than single nucleotide variants. We can now argue that not only is a vast majority of common variants shared among human populations, but a considerable portion of biologically relevant variants has been segregating among our ancestors for hundreds of thousands, if not millions, of years.

Keywords: Denisovans; Neanderthals; copy number variation; evolution; evolutionary biology; genetics; genomics; human; structural variation.

Plain language summary

The persistence of versions of genes that cause severe disease in human populations has long perplexed scientists. It is common for many versions of a gene to exist. But scientists expect that over time natural selection will eliminate versions of genes harmful to human health. Sometimes, there are good reasons that a disease-causing gene may persist. For example, having two copies of a particular gene variant causes a condition called sickle cell disease. But having one sickle cell-causing copy of the gene and one non-disease-causing copy protects against malaria. As a result, the version of the gene that causes sickle cell is more common in people from areas where malaria is prevalent despite the risks to people who end up with two copies. Scientists call this phenomenon balancing selection because trade-offs in the gene’s benefits and risks cause it to persist in the population. Aqil et al. show that balancing selection has likely caused many ancient gene variants to persist in human populations. In the experiments, Aqil et al. scoured the genomes of hundreds of modern humans from around the world and four groups of ancient human ancestors, including Neanderthals and Denisovans. The experiments looked for structural changes in genes, like deletions, that date back to more than 700,000 years ago – before modern humans split from their ancestors. They found large numbers of such ancient genes in modern humans. Using computer modeling, Aqil et al. showed that these ancient genes likely persist because of balancing selection. Many of these ancient genes regulate the immune response and metabolism. These genes may protect against infectious disease outbreaks and starvation, which have occurred periodically throughout human history. But these same genes may cause immune or metabolic diseases in modern humans not currently facing these threats.
The experiments show how such biological trade-offs have shaped human evolution and reveal that modern human populations, regardless of race or region of origin, share the same genetic variation that our ancestors already carried.

Conflict of interest statement

AA, LS, PP, OG: No competing interests declared.

Figures

Figure 1. Excess of ancient polymorphisms segregating in anatomically modern humans (AMHs).
(A) A schematic representation of derived ‘ancient’ variants (purple) that emerged before the AMH-archaic hominin divergence (and after hominin-chimp divergence), and have remained polymorphic in the AMH lineage. The ancestral variants are shown in orange, and the derived chimpanzee-specific variants are shown in light blue. (B) The Speidel et al. and Gravel et al. simulation parameters. Speidel et al. provide parameters that involve varying population sizes for the YRI population. (C) Expected distribution of the proportion of ancient polymorphisms in YRI under different models. Each distribution is labeled with three parameters in the form (AMH-Ne, Archaic-Ne, time since archaic-AMH divergence). The simulations that use the variable effective population size published by Speidel et al. are shown in blue and labeled ‘Var’. The simulations where AMH-Ne is constant are shown in orange and are labeled with the population size used. The vertical line represents the empirical proportion of ancient polymorphisms in YRI.
Figure 1—figure supplement 1. Proportion of ancient polymorphisms in observed data (YRI), relative to neutral expectation (‘base’ model parameters) in various derived allele frequency bins.
The vertical blue line indicates the observed sharing, while the distributions are simulated expectations. The excess of ancient polymorphisms in observed data becomes more pronounced at higher derived allele frequencies.
Figure 1—figure supplement 2. Simulation results.
(A) Results from simulations invoking structure in the population that was ancestral to both anatomically modern humans (AMHs) and archaic hominins. In this model, we have three latent subgroups in the ancestral populations. The x-axis refers to the fraction of each subgroup that is formed by the migrants of each of the other subgroups in each generation. (B) Proportion of ancient polymorphisms in YRI. The purple line is the observed proportion of ancient polymorphisms in Yoruba (YRI). The green and orange density plots indicate the distribution of the proportion of ancient polymorphisms in neutral simulations with and without ancestral structure, respectively. We used Gravel et al. parameters for these simulations. (C) Comparison of the allele frequency spectra of simulated single nucleotide variants (SNVs) with observed SNVs. The purple, orange, and green lines represent allele frequency spectra in the YRI population using actual SNVs, neutral simulations without ancestral structure, and neutral simulations invoking ancestral structure, respectively.
Figure 2. Deletions in anatomically modern humans (AMHs) that are shared with archaic hominins.
The top panel shows the categorization of deletion polymorphisms as AMH-specific, recurrent (green), introgressed (orange), or ancient (purple). The evolutionary histories of shared deletions are summarized schematically in the bottom panel.
Figure 2—figure supplement 1. Read depth-based pipeline to identify deletions in archaic hominin genomes: Distribution of the modified Z-score of the read depth across the 32,154 biallelic anatomically modern human (AMH) deletions in the archaic genomes.
(A) Altai Neanderthal. (B) Vindija Neanderthal. (C) Chagyrskaya Neanderthal. (D) Denisovan.
Figure 3. Age estimates of the haplotypes harboring polymorphic deletions.
The x-axis shows the age estimates, obtained using Relate, for the deletions. For orienting the reader regarding the age of these variants, we provide below a schematic phylogeny representing recent human evolution.
Figure 3—figure supplement 1. GEVA ages of deletions across categories.
Absent denotes polymorphic deletions in anatomically modern humans (AMHs) that are not present in any of the four high-coverage archaic genomes. Introgressed refers to the shared deletions that were introgressed from archaic hominins into AMHs. Recurrent refers to the shared deletions that emerged independently in the AMH and archaic hominin lineages. Ancient refers to the AMH deletions that are shared with archaic hominins by common descent. (A) GEVA PRIME-ages. (B) GEVA MAX-ages. With both GEVA PRIME and GEVA MAX measures, we observe that ancient deletions are significantly older than absent, recurrent, and introgressed deletions. This implies that our pipeline to identify ancient deletions is sound.
Figure 4. An empirical assessment of putative balancing selection among ancient deletions.
(A) The conceptual framework in which the stdβ2 statistic works. The last step demonstrates ‘Goldilocks’ drift (the process that results in allelic class build-up). (B) A box plot of stdβ2 for anatomically modern human (AMH)-specific versus ancient deletions (frequency >5% in respective populations). Higher stdβ2 values for older deletions, represented in purple, empirically show that older deletions are significantly enriched for targets of balancing selection. All comparisons are significant, p<10⁻⁷ (Wilcoxon).
Figure 5. Functional enrichment among ancient deletions.
(A) Functional categorization of common deletions. Within each category, the proportions of deletions falling under different evolutionary categories are shown in pie charts. (B) Permutation-based analysis of enrichment of functionality among ancient deletions, relative to non-ancient deletions. The black horizontal line indicates the expected ratio of 1.0. For each definition of functionality, the number of functional ancient deletions, and the p-value associated with the enrichment are provided. (C) Permutation-based enrichment analysis for different phenotypic categories (based on genome-wide association studies [GWAS]) among ancient deletions, relative to non-ancient deletions. The black horizontal line indicates the expected ratio of 1.0. Dark orange indicates a statistically significant deviation from the expected ratio of 1.0. Light orange means no significant deviation from the expected ratio of 1.0.
Figure 6. Phenotypic effects associated with deletions.
(A) The significance levels (-log(p-value)) of phenotypic associations of deletions with genome-wide association studies (GWAS) traits as a function of their emergence time. Gray points indicate non-ancient deletions. Purple and orange points indicate non-exonic ancient deletions with GWAS hits and exonic ancient deletions with GWAS hits, respectively. The genes whose exons are covered by ancient deletions, and the traits associated with ancient deletions are mentioned in the plot. (B) The significance levels (-log(p-value)) and sizes of expression level changes of nearby HLA genes associated with the presence of the deletion esv3608584. Each color refers to a different HLA gene. Each point in a given color represents a different tissue. Only those tissues whose expression level changes are statistically significant are shown here.
Figure 7. Ancient versus non-ancient deletions.
(A) The ratios of sizes of ancient deletions to those of non-ancient deletions at different size percentiles. The black horizontal line refers to the expected ratio of 1.0. Dark orange bars refer to a statistically significant (permutation test) deviation from the expected ratio. Light orange bars mean that the deviation from the expected ratio of 1.0 is not statistically significant. (B) The estimated measure of allele frequency change (χ²) between 50,000 and 5,000 years before present in common ancient versus common non-ancient deletions. Ancient deletions have significantly (p=2 × 10⁻⁷, Wilcoxon) higher frequency variability over the last 50,000 years.
Figure 7—figure supplement 1. Effects of negative selection and overdominance.
(A) The probability of a polymorphism persisting in the population for 1,000,000 years under different negative selection pressures. (B) Density plots of the first principal component of multiple summary statistics based on variants simulated under neutral versus overdominance (s=0.05) scenarios. This is shown for two categories of variants: (1) those that emerged 290 kya and (2) those that emerged 1160 kya. There is no discernible difference between overdominance and neutrality within the time frame of these simulations. (C) The allele frequency trajectories of variants over 1,000,000 years under neutrality (top) versus overdominance (bottom). The x-axis represents the time since the emergence of a variant in years, assuming a 29-year generation time. The right panel is a zoomed-in version of the same allele frequency trajectories in the last ~50,000 years.
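
The permutation-based enrichment analyses described in the captions for Figures 5 and 7 can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the one-sided test, and the toy data are assumptions made for illustration only. The sketch computes the ratio of the functional fraction among ancient deletions to that among non-ancient deletions (expected ratio 1.0 under no enrichment), then shuffles the ancient/non-ancient labels to obtain a null distribution.

```python
import random


def enrichment_ratio(is_ancient, is_functional):
    """Functional fraction among ancient deletions divided by the
    functional fraction among non-ancient deletions (hypothetical helper).
    Assumes both groups are non-empty."""
    ancient = [f for a, f in zip(is_ancient, is_functional) if a]
    other = [f for a, f in zip(is_ancient, is_functional) if not a]
    frac_ancient = sum(ancient) / len(ancient)
    frac_other = sum(other) / len(other)
    if frac_other == 0:
        # No functional deletions in the comparison group.
        return float("inf")
    return frac_ancient / frac_other


def permutation_test(is_ancient, is_functional, n_perm=2000, seed=0):
    """One-sided permutation p-value for enrichment among ancient deletions.
    Labels are shuffled while the functional annotations stay fixed."""
    rng = random.Random(seed)
    observed = enrichment_ratio(is_ancient, is_functional)
    labels = list(is_ancient)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if enrichment_ratio(labels, is_functional) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


# Toy data (invented): 20 ancient deletions, 10 of them functional;
# 80 non-ancient deletions, 10 of them functional.
toy_ancient = [True] * 20 + [False] * 80
toy_functional = [True] * 10 + [False] * 10 + [True] * 10 + [False] * 70

ratio, p = permutation_test(toy_ancient, toy_functional)
print(f"enrichment ratio = {ratio:.2f}, permutation p = {p:.4f}")
```

Shuffling the labels instead of resampling the annotations keeps the number of ancient deletions and the number of functional deletions fixed in every permutation, so only their overlap varies under the null.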

