Genome Biol Evol. 2018 Mar 1;10(3):939-955.
doi: 10.1093/gbe/evy054.

Signatures of Long-Term Balancing Selection in Human Genomes

Bárbara D Bitarello et al. Genome Biol Evol. 2018.

Abstract

Balancing selection maintains advantageous diversity in populations through various mechanisms. Although extensively explored from a theoretical perspective, an empirical understanding of its prevalence and targets lags behind our knowledge of positive selection. Here, we describe the Non-central Deviation (NCD), a simple yet powerful statistic to detect long-term balancing selection (LTBS) that quantifies how close allele frequencies are to those expected under LTBS, and provides the basis for a neutrality test. NCD can be applied to a single locus or to genomic data, and can be implemented considering only polymorphisms (NCD1) or also considering fixed differences with respect to an outgroup species (NCD2). Incorporating fixed differences improves power, and NCD2 has higher power to detect LTBS in humans under different frequencies of the balanced allele(s) than other available methods. Applied to genome-wide data from African and European human populations, in both cases using chimpanzee as an outgroup, NCD2 shows that, albeit not prevalent, LTBS affects a sizable portion of the genome: ∼0.6% of analyzed genomic windows and 0.8% of analyzed positions. Significant windows (P < 0.0001) contain 1.6% of SNPs in the genome, which disproportionately fall within exons and change protein sequence, but are not enriched in putatively regulatory sites. These windows overlap ∼8% of the protein-coding genes, and these genes have a larger number of transcripts than expected by chance, even after controlling for gene length. Our catalog includes known targets of LTBS, but most of its entries (90%) are novel. As expected, immune-related genes are among those with the strongest signatures, although most candidates are involved in other biological functions, suggesting that LTBS potentially influences diverse human phenotypes.
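The abstract does not state the NCD formula, so the following is only a minimal sketch. It assumes NCD is the root-mean-square deviation of minor allele frequencies (MAF) from a target frequency tf, and that the NCD2 variant adds each fixed difference as an informative site with MAF 0. The function name `ncd` and its parameters are hypothetical, chosen for illustration.

```python
import math


def ncd(mafs, tf=0.5, n_fixed_differences=0):
    """Root-mean-square deviation of minor allele frequencies from tf.

    ASSUMPTION: this formula is not given in the abstract; it is one way to
    quantify "how close frequencies are" to a target frequency.
    n_fixed_differences=0 mimics NCD1 (polymorphisms only); a positive value
    mimics NCD2 by counting each fixed difference as a site with MAF 0.
    """
    freqs = list(mafs) + [0.0] * n_fixed_differences
    if not freqs:
        raise ValueError("window has no informative sites")
    return math.sqrt(sum((p - tf) ** 2 for p in freqs) / len(freqs))


# Toy windows (made-up frequencies, not data from the paper).
balanced_like = [0.48, 0.45, 0.50, 0.42]   # MAFs clustered near tf = 0.5
neutral_like = [0.02, 0.05, 0.01, 0.30]    # mostly rare variants

ncd1_balanced = ncd(balanced_like)
ncd1_neutral = ncd(neutral_like)
ncd2_balanced = ncd(balanced_like, n_fixed_differences=2)
```

Under this definition, lower values mean frequencies sit closer to tf, so the window with intermediate-frequency alleles scores lower than the window dominated by rare variants, and adding fixed differences raises the value.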

Figures

Fig. 1. A schematic representation of site frequency spectra (SFS) under neutrality and selection, which motivates the NCD statistic. (A) Unfolded SFS (ranging from 0 to 1) of derived allele frequencies (DAF) for loci under neutrality (gray) or containing one site under balancing selection with frequency equilibrium (feq) of 0.5 (blue), 0.4 (orange), and 0.3 (pink). (B) Folded SFS (ranging from 0 to 0.5) for minor allele frequencies (MAF). Colors as in A. (C) Distribution of NCD expected under neutrality (gray) and under selection assuming tf = feq. Colors as in A. x axis shows minimum and maximum values that NCD can have for a given tf.
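The relationship between the unfolded and folded spectra in panels A and B can be sketched as follows. This is an illustrative example only: the helper names `fold` and `folded_sfs`, the bin count, and the input frequencies are all assumptions, not part of the paper's pipeline.

```python
def fold(daf):
    """Convert a derived allele frequency (0 to 1) to a minor allele frequency (0 to 0.5)."""
    return min(daf, 1.0 - daf)


def folded_sfs(dafs, n_bins=5):
    """Count sites per equal-width MAF bin over [0, 0.5].

    Hypothetical helper: values that fall exactly on a bin edge may land in
    either neighbouring bin because of floating-point rounding.
    """
    counts = [0] * n_bins
    width = 0.5 / n_bins
    for daf in dafs:
        maf = fold(daf)
        # Clamp so that MAF = 0.5 goes into the last bin.
        idx = min(int(maf / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Made-up derived allele frequencies for one locus.
example_dafs = [0.05, 0.95, 0.45, 0.55, 0.25]
example_sfs = folded_sfs(example_dafs)
```

Folding maps a derived allele at frequency 0.95 and one at 0.05 to the same bin, which is why the folded spectrum spans only 0 to 0.5.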
Fig. 2. Power to detect balancing selection for NCD2(0.5) and other tests. The ROC curves summarize the true positive rate (TPR) as a function of the false positive rate (FPR) to detect LTBS for simulations where the balanced polymorphism was modeled to achieve feq of (A) 0.3, (B) 0.4, and (C) 0.5. Plotted values are for the African demography, Tbs = 5 Ma. L = 3 kb, except for T1 and T2 where L = 100 ISs (see Methods). BETA refers to the ß statistic (Siewert and Voight 2017). For NCD2 calculations, tf = feq. European demography yields similar results (supplementary fig. S10, Supplementary Material online). Power for NCD1, NCD1 + HKA, and T1 is provided in supplementary table S1, Supplementary Material online.
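As a rough illustration of how a ROC curve like these can be built from simulated statistics, the sketch below sweeps a threshold over scores from neutral and selected simulations. It assumes that lower scores are stronger evidence of selection (as would hold for a deviation-from-target statistic). The function names and the toy scores are hypothetical.

```python
def roc_points(neutral_scores, selected_scores):
    """Return (FPR, TPR) pairs, one per threshold, starting at (0, 0).

    ASSUMPTION: a simulation is called "under selection" when its score is
    less than or equal to the threshold (low score = signal).
    """
    thresholds = sorted(set(neutral_scores) | set(selected_scores))
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpr = sum(s <= t for s in neutral_scores) / len(neutral_scores)
        tpr = sum(s <= t for s in selected_scores) / len(selected_scores)
        points.append((fpr, tpr))
    return points


def power_at_fpr(neutral_scores, selected_scores, max_fpr=0.05):
    """Highest TPR achievable without exceeding the given FPR."""
    return max(
        tpr
        for fpr, tpr in roc_points(neutral_scores, selected_scores)
        if fpr <= max_fpr
    )


# Made-up scores standing in for statistics computed on simulations.
neutral = [0.40, 0.42, 0.38, 0.45]
selected = [0.30, 0.35, 0.39, 0.20]
curve = roc_points(neutral, selected)
power = power_at_fpr(neutral, selected, max_fpr=0.0)
```

With these toy values, three of the four selected simulations score below every neutral one, so the power at FPR = 0 is 0.75.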
Fig. 3. Polymorphism-to-divergence and SFS. (A and B) SNPs/(FDs + 1) for LWK (A) and GBR (B) populations. SNPs/(FDs + 1) measures the proportion of polymorphic-to-divergent sites for the union of significant windows for all tf (purple, green) compared with all scanned windows (gray). (C and D) SFS in LWK (C) and GBR (D) of all scanned windows in chr1 (gray), significant windows for NCD2(0.5) (blue), NCD2(0.4) (orange), NCD2(0.3) (pink). DAF, derived allele frequency.
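The SNPs/(FDs + 1) measure in panels A and B is simple enough to write out. The sketch below is illustrative only: the function name and the counts are hypothetical. The likely role of the +1 in the denominator is to keep the ratio defined for windows with no fixed differences.

```python
def polymorphism_to_divergence(n_snps, n_fds):
    """SNPs / (FDs + 1) for one window.

    The +1 keeps the ratio finite when a window has zero fixed differences.
    """
    if n_snps < 0 or n_fds < 0:
        raise ValueError("counts must be non-negative")
    return n_snps / (n_fds + 1)


# Made-up counts: a window rich in polymorphism versus a more typical one.
candidate_ratio = polymorphism_to_divergence(n_snps=12, n_fds=0)
typical_ratio = polymorphism_to_divergence(n_snps=12, n_fds=3)
```

A window with many SNPs and few fixed differences yields a high ratio, which is the pattern the figure contrasts between significant and all scanned windows.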
Fig. 4. Manhattan plot and population sharing. (A) Manhattan plot of all scanned windows, for one analysis (NCD2(0.5) for LWK). y-axis, P value (log-scale) based on Ztf-IS. x-axis, ordered location of analyzed windows on the genome. Each point is a scanned (gray and black), significant (blue), or outlier (pink) window. Names of outlier protein-coding genes are provided, sorted by name. Significant windows were defined based on simulations, not on Ztf-IS (Ztf-IS is used to rank even those with P < 0.0001). (B) Venn diagram showing the overlap in signatures of the 167 outlier genes annotated in (A) with other populations.
Fig. 5. Enrichment of classes of sites among candidate windows. Dashed lines mark the P = 0.975 (bottom) and P = 0.025 (top) thresholds for the one-tailed P values (hypothesis: enrichment). NSyn, nonsynonymous; all, Genic + Intergenic + Regulatory. The annotation is based on Ensembl variant predictor (supplementary information S4, Supplementary Material online). P < 0.001 was treated as 0.001 to avoid infinite values.
