PLoS Biol. 2006 Mar;4(3):e72.
doi: 10.1371/journal.pbio.0040072. Epub 2006 Mar 7.

A map of recent positive selection in the human genome

Benjamin F Voight et al. PLoS Biol. 2006 Mar.

Erratum in

  • PLoS Biol. 2006 Apr;4(4):e154
  • PLoS Biol. 2007 Jun;5(6):e147

Abstract

The identification of signals of very recent positive selection provides information about the adaptation of modern humans to local conditions. We report here on a genome-wide scan for signals of very recent positive selection in favor of variants that have not yet reached fixation. We describe a new analytical method for scanning single nucleotide polymorphism (SNP) data for signals of recent selection, and apply this to data from the International HapMap Project. In all three continental groups we find widespread signals of recent positive selection. Most signals are region-specific, though a significant excess are shared across groups. Contrary to some earlier low-resolution studies that suggested a paucity of recent selection in sub-Saharan Africans, we find that by some measures our strongest signals of selection are from the Yoruba population. Finally, since these signals indicate the existence of genetic variants that have substantially different fitnesses, they must indicate loci that are the source of significant phenotypic variation. Though the relevant phenotypes are generally not known, such loci should be of particular interest in mapping studies of complex traits. For this purpose we have developed a set of SNPs that can be used to tag the approximately 250 strongest signals of recent selection in each population.

Figures

Figure 1. Decay of EHH in Simulated Data for an Allele at Frequency 0.5
(A) Decay of haplotypes in a single region in which a new selected allele (red, center column) is sweeping to fixation, replacing the ancestral allele (blue). Horizontal lines are haplotypes; SNP positions are marked below the haplotype plot using blue for SNPs with intermediate allele frequencies (minor allele frequency >0.2), and red otherwise. For a given SNP, adjacent haplotypes with the same color carry identical genotypes everywhere between that SNP and the central (selected) site. The left- and right-hand sides are sorted separately. Haplotypes are no longer plotted beyond the points at which they become unique. (B) Decay of haplotype homozygosity for ten replicate simulations. When the core SNP is neutral (s = 0; left side), the haplotype homozygosity decays at similar rates for both ancestral and derived alleles. When the derived alleles are favored (s = 2Ns = 250; right side), the haplotype homozygosity decays much more slowly for the derived alleles than for the ancestral alleles. The discrepancy between the areas under these two curves forms the basis of our test for selection (iHS).
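The area comparison in panel (B) can be illustrated with a small sketch. It assumes that extended haplotype homozygosity (EHH) is the probability that two haplotypes carrying the same core allele are identical over every SNP between the core and a given SNP, that each area (iHH) is obtained by trapezoid integration outward from the core until EHH drops below 0.05, and that the unstandardized score is log(iHH_ancestral / iHH_derived), so that a derived allele on unusually long haplotypes gives a negative value. The 0.05 cutoff, the log-ratio form, the use of positions as map coordinates, and all function names are assumptions for illustration, not the authors' code; each allele class is assumed to contain at least two haplotypes.

```python
from collections import Counter
from math import comb, log


def ehh(haplotypes, core_index, stop_index):
    """Extended haplotype homozygosity between a core SNP and another SNP.

    `haplotypes` are equal-length allele strings that all carry the same allele
    at `core_index`. Returns the probability that two haplotypes drawn at random
    are identical over every SNP from the core to `stop_index` (inclusive).
    """
    n = len(haplotypes)
    if n < 2:
        return 0.0
    lo, hi = sorted((core_index, stop_index))
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


def ihh(haplotypes, positions, core_index, cutoff=0.05):
    """Area under the EHH curve on both sides of the core (trapezoid rule).

    On each side, integration stops at the first SNP where EHH falls below
    `cutoff` (an assumed value) or at the edge of the region.
    """
    total = 0.0
    for step in (-1, 1):  # walk left, then right, away from the core
        i, prev = core_index, 1.0  # EHH equals 1 at the core itself
        while 0 <= i + step < len(positions):
            nxt = i + step
            cur = ehh(haplotypes, core_index, nxt)
            total += abs(positions[nxt] - positions[i]) * (prev + cur) / 2
            if cur < cutoff:
                break
            i, prev = nxt, cur
    return total


def unstandardized_ihs(haplotypes, positions, core_index, ancestral="0"):
    """log(iHH_ancestral / iHH_derived); negative when derived haplotypes are longer."""
    anc = [h for h in haplotypes if h[core_index] == ancestral]
    der = [h for h in haplotypes if h[core_index] != ancestral]
    return log(ihh(anc, positions, core_index) / ihh(der, positions, core_index))


if __name__ == "__main__":
    # Toy data: four diverse ancestral haplotypes, four identical derived ones.
    haps = ["00000", "01010", "10001", "11011"] + ["00100"] * 4
    print(unstandardized_ihs(haps, [0, 1, 2, 3, 4], core_index=2))  # negative
```

In the toy example the derived haplotypes stay identical across the whole region, so their EHH curve never decays and the score comes out negative, mirroring the right-hand side of panel (B).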
Figure 2. Power to Detect Sweeps-in-Progress at a p-Value of 0.01, Using Various Statistics
Simulation parameters are matched to the Yoruba data, with s = 150. Tests are based on 51-SNP windows centered on a selected site. The upper curves (iHS) are based on counting the number of SNPs in the window for which |iHS| >2. The green line indicates power when the actual SNP under selection is excluded from the analysis. The lower lines plot power using Fay and Wu's H and Tajima's D, both calculated using the ascertained genotype data. The line marked t = +200 indicates the power 200 generations after fixation (Ne = 10⁴). Critical values for each statistic at p = 0.01 were obtained using identical simulations with s = 0.
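The power calculation described above can be sketched as follows, assuming each simulated replicate has already been reduced to the 51 standardized iHS scores of the window centered on the focal site. The critical value is taken as the empirical 99th percentile (nearest-rank) of the window statistic across neutral (s = 0) replicates, and power is the fraction of selected replicates that exceed it. The function names and the nearest-rank convention are illustrative assumptions; generating the simulations themselves is outside the scope of this sketch.

```python
import math


def count_extreme(window, threshold=2.0):
    """Window statistic: number of SNPs in the window with |iHS| > threshold."""
    return sum(1 for score in window if abs(score) > threshold)


def critical_value(null_stats, p=0.01):
    """Empirical (1 - p) quantile of the null statistics (nearest-rank method)."""
    ranked = sorted(null_stats)
    # round() guards against floating-point fuzz in products such as 0.99 * n.
    rank = math.ceil(round((1 - p) * len(ranked), 9))
    return ranked[max(rank, 1) - 1]


def power(null_windows, selected_windows, p=0.01, threshold=2.0):
    """Fraction of selected replicates whose statistic exceeds the neutral critical value."""
    crit = critical_value([count_extreme(w, threshold) for w in null_windows], p)
    hits = sum(1 for w in selected_windows if count_extreme(w, threshold) > crit)
    return hits / len(selected_windows)


if __name__ == "__main__":
    neutral = [[0.5] * 51 for _ in range(100)]
    selected = [[3.0] * 10 + [0.0] * 41] * 4 + [[0.0] * 51] * 6
    print(power(neutral, selected))
```

Requiring the count to be strictly greater than the critical value keeps the test conservative, because a count statistic is discrete and many neutral replicates can tie at the cutoff.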
Figure 3. Plots of Chromosome 2 SNPs with Extreme iHS Values Indicate Discrete Clusters of Signals
SNPs with |iHS| >2.5 (top 1%) are plotted. The bottom plot combines signals for all three populations, plotting only SNPs with derived frequency >0.5 and iHS <−2.5. Such SNPs correspond to high-frequency-derived SNPs in the range for which our test is most powerful. The short vertical bars below each plot indicate 100-kb windows whose signals are in the top 1% of windows genome-wide.
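Flagging the 100-kb windows marked below each plot can be sketched by binning SNPs into non-overlapping windows and ranking the windows genome-wide. The caption does not state the window statistic; this sketch assumes it is the fraction of SNPs in the window with |iHS| above a threshold (2 by default, also an assumption). It ignores differences in SNP density between windows, which a real analysis would need to handle, and breaks ties arbitrarily. The function names are hypothetical.

```python
from collections import defaultdict


def window_scores(positions_bp, ihs_scores, window_bp=100_000, threshold=2.0):
    """Assumed window statistic: fraction of SNPs per window with |iHS| > threshold."""
    flags = defaultdict(list)
    for pos, score in zip(positions_bp, ihs_scores):
        flags[pos // window_bp].append(abs(score) > threshold)
    return {w: sum(f) / len(f) for w, f in flags.items()}


def top_windows(scores, top_fraction=0.01):
    """Windows whose score ranks in the top `top_fraction` genome-wide."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:n_top])


if __name__ == "__main__":
    s = window_scores([10, 20, 150_000, 160_000, 250_000], [3.0, 0.1, 2.5, -2.6, 0.0])
    print(s, top_windows(s))
```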
Figure 4. Central 99% Range of Unstandardized iHS for SNPs in the Yoruba Data and for SNPs in Matched Neutral Simulations
The upper and lower lines mark the boundaries of the central 99% of the distribution of the unstandardized iHS ratio, as a function of derived allele frequency. The gray lines plot results for a range of plausible demographic models. The fatter tails in the real data are consistent with the action of selection.
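Because the range of the unstandardized score depends on derived allele frequency, a natural way to compare SNPs is to work within frequency bins. The sketch below, a minimal illustration under assumed choices, groups SNPs into 20 equal-width derived-frequency bins, converts each unstandardized score to a z-score using its bin's mean and standard deviation, and reports the central 99% range (0.5th to 99.5th percentile) of a set of values, as plotted for each frequency in this figure. The bin count, the z-score form, and all function names are assumptions, not details given in the caption.

```python
import statistics
from collections import defaultdict


def frequency_bin(freq, n_bins=20):
    """Index of the equal-width derived-allele-frequency bin on [0, 1]."""
    return min(int(freq * n_bins), n_bins - 1)


def standardize(raw_scores, derived_freqs, n_bins=20):
    """Convert unstandardized iHS to z-scores within derived-frequency bins."""
    by_bin = defaultdict(list)
    for score, freq in zip(raw_scores, derived_freqs):
        by_bin[frequency_bin(freq, n_bins)].append(score)
    moments = {b: (statistics.mean(v), statistics.pstdev(v)) for b, v in by_bin.items()}
    z = []
    for score, freq in zip(raw_scores, derived_freqs):
        mean, sd = moments[frequency_bin(freq, n_bins)]
        z.append((score - mean) / sd if sd > 0 else 0.0)  # degenerate bin -> 0
    return z


def central_range(values, coverage=0.99):
    """First and last cut points bounding the central `coverage` share of values.

    Assumes 2 / (1 - coverage) is (close to) an integer, e.g. 200 for 99%.
    """
    n = round(2 / (1 - coverage))
    cuts = statistics.quantiles(values, n=n, method="inclusive")
    return cuts[0], cuts[-1]


if __name__ == "__main__":
    print(standardize([1.0, 3.0, 10.0, 20.0], [0.1, 0.1, 0.9, 0.9]))
    print(central_range(list(range(201))))
```

Applying `central_range` separately to the scores in each frequency bin, for real and for neutrally simulated SNPs, gives the kind of envelope compared in this figure.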
Figure 5. Strong Correlation between iHS and H_asc for the Yoruba Data
The left-hand plot shows the probability that a 51-SNP window centered on a given SNP is in the lowest 1% of the empirical distribution for Fay and Wu's H in neutral simulated data, and in the Yoruba data overall. Notice that in neutral simulations there is essentially no correlation between iHS and H. Right-hand plot: In contrast, in simulations with selection (cyan line, s = 100) there is a large increase in the rate of significant H values for high-frequency selected alleles with strongly negative iHS (<−2.5). The same pattern is seen for the real data (red line). In the real data, sites with strongly positive iHS (>2.5) show an increase in the rate of positive H scores at low derived allele frequencies (magenta line). The latter probably reflects instances of an ancestral allele hitchhiking to high frequency with a selective sweep.
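For reference, Fay and Wu's H contrasts two estimators of diversity: theta_pi, the average number of pairwise differences, and theta_H, which weights each site by the square of its derived-allele count. An excess of high-frequency derived alleles, as expected near a sweep, makes H negative. The sketch below computes the unnormalized H from derived-allele counts at the segregating sites of a window and checks whether a value falls in the lowest 1% of a set of neutral values. It applies no correction for SNP ascertainment (the captions note H was calculated from ascertained genotype data), and the function names are hypothetical.

```python
import math


def fay_wu_h(derived_counts, n):
    """Fay and Wu's (unnormalized) H = theta_pi - theta_H.

    derived_counts: for each segregating site, how many of the n sampled
    chromosomes carry the derived allele (a value in 1 .. n-1).
    """
    denom = n * (n - 1)
    theta_pi = sum(2 * i * (n - i) for i in derived_counts) / denom
    theta_h = sum(2 * i * i for i in derived_counts) / denom
    return theta_pi - theta_h


def in_lowest_tail(value, null_values, p=0.01):
    """True if value is at or below the empirical p-quantile (nearest rank) of null_values."""
    ranked = sorted(null_values)
    rank = math.ceil(round(p * len(ranked), 9))
    return value <= ranked[max(rank, 1) - 1]


if __name__ == "__main__":
    # Two sites where the derived allele is on 3 of 4 chromosomes: H is negative.
    print(fay_wu_h([3, 3], n=4))
```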
Figure 6. Signals of Selection for Three Candidate Selection Regions Discussed in the Text
The columns show (left) scatter plots of negative iHS scores, (center) haplotype plots, and (right) decay of haplotype homozygosity. In each case the core SNP for the center and right-hand plots was chosen as a SNP with a high negative iHS score (starred in the scatter plots); the allele marked in red is derived. For each signal, values are listed for the derived allele frequency (pd) and the local deCODE recombination rate estimate.
Figure 7. Sharing of iHS Signals between Populations
The numbers listed inside circles represent the numbers of 100-kb windows that are in the top 1% of the empirical distributions in at least one population. The numbers in the intersection regions are in the top 1% for one population, and the top 5% for one or both of the other populations. The counts that would be expected if signals were independent across populations are shown in parentheses. The number of windows not in any circle is reported in the upper-left corner.
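The counts expected under independence can be computed exactly by enumerating per-population states. This sketch reads the caption as follows, which is an interpretation rather than a stated procedure: a window enters the diagram if it is in the top 1% of at least one population, and it then belongs to the circle of every population in whose top 5% it falls. Treating each population's window rank as independent and uniform gives the expected share of windows in each region; multiplying by the number of windows gives counts. The population labels "A", "B", "C" and the function names are placeholders.

```python
from itertools import product

# Assumed per-population states for one window:
# "top1" = top 1%, "top5" = top 5% but not top 1%, "rest" = all other windows.
STATE_PROBS = {"top1": 0.01, "top5": 0.04, "rest": 0.95}


def expected_region_fractions(populations=("A", "B", "C")):
    """Expected share of windows in each Venn region if populations are independent."""
    fractions = {}
    for states in product(STATE_PROBS, repeat=len(populations)):
        if "top1" not in states:
            continue  # not in the top 1% anywhere: outside every circle
        prob = 1.0
        for state in states:
            prob *= STATE_PROBS[state]
        members = frozenset(p for p, s in zip(populations, states) if s != "rest")
        fractions[members] = fractions.get(members, 0.0) + prob
    return fractions


def expected_counts(n_windows, populations=("A", "B", "C")):
    """Scale the region shares by the number of 100-kb windows analysed."""
    return {region: n_windows * share
            for region, share in expected_region_fractions(populations).items()}


if __name__ == "__main__":
    for region, count in sorted(expected_counts(10_000).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(region), round(count, 2))
```

The windows falling outside every circle correspond to the remaining share, 1 minus the sum of the region shares. Physically adjacent windows are not independent, so this only models independence between populations.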
