Review
Philos Trans R Soc Lond B Biol Sci. 2010 Jan 12;365(1537):185-205.
doi: 10.1098/rstb.2009.0219.

Genome-wide scans for footprints of natural selection

Taras K Oleksyk et al. Philos Trans R Soc Lond B Biol Sci.

Abstract

Detecting recently selected 'genomic footprints' applies directly to the discovery of disease genes and to the imputation of the formative events that molded modern population genetic structure. The imprints of historic selection/adaptation episodes left in human and animal genomes allow one to interpret modern and ancestral gene origins and modifications. Current approaches to reveal selected regions applied in genome-wide selection scans (GWSSs) fall into eight principal categories: (I) phylogenetic footprinting, (II) detecting increased rates of functional mutations, (III) evaluating divergence versus polymorphism, (IV) detecting extended segments of linkage disequilibrium, (V) evaluating local reduction in genetic variation, (VI) detecting changes in the shape of the frequency distribution (spectrum) of genetic variation, (VII) assessing differentiation between populations (FST), and (VIII) detecting excess or decrease in admixture contribution from one population. Here, we review and compare these approaches using available human genome-wide datasets to provide independent verification (or not) of regions found by different methods and using different populations. The lessons learned from GWSSs will be applied to identify genome signatures of historic selective pressures on genes and gene regions in other species with emerging genome sequences. This would offer considerable potential for genome annotation in functional, developmental and evolutionary contexts.

Figures

Figure 1.
Strategies for detection of the genome-wide selection signatures in table 1. Consider a small gene region that displays SNP variation at 17 adjacent sites (vertical columns in all panels). (a) Eight individuals in species 1 (human) carry alternative white and green alleles (synonymous variants) and also a codon-altering non-synonymous allele (red and white). A related species (chimpanzee), examined at the same SNP sites, displays a divergence pattern from the index (human) species; positive selection of one SNP allele alters the random distribution pattern when examining non-synonymous alleles only (red and white). Graphs on the right plot the departure from the genome-wide average for each parameter (measured by the seven selection tests described in table 1). (a) Comparing sequence divergence between species (table 1, I–III). Gene regions with past actions of selection show an altered sequence organization that can be revealed by comparing changes between homologous sequences by three different approaches. (I) Phylogenetic shadowing: comparing divergence of orthologous sequences across the genome. The genome segments with low divergence between species compared with the genome-wide averages can indicate purifying selection or positive selection. (II) Increased function-altering mutation rates: comparing the ratio of non-synonymous (dN: left panel; changes indicated in red) to synonymous changes (dS: right panel; changes in green). This comparison could be accomplished by (i) comparing the dN/dS ratio between the candidate gene of interest and the genome-wide average for other genes and (ii) comparing diversity with divergence ratio for dN versus dS for homologous sequences. (III) Interspecies divergence versus intraspecies polymorphism: comparing interspecific divergence (e.g. between chimpanzee and human) with intraspecific polymorphism (within the human species).
Selection decreases variation within an affected species (dark orange), and the scope of this decrease can be assessed by contrasting with divergence between species sequences (light orange) unaffected by the species-specific adaptation. (b) Comparing sequence variation patterns within a species (table 1, IV–VIII). Positive selection results in an elevated frequency of haplotypes carrying the advantageous allele at the expense of the others in the process called ‘selective sweep’ (Maynard Smith & Haigh 1974), followed by the gradual incorporation of derived variation seen as a skewed ‘frequency spectrum’. These signatures can all be revealed by comparing sequences within or between populations of the same species. Five tests (described in table 1) include: (IV) Local reduction in genetic variation: comparison of levels of polymorphism in and around the selected locus to the estimated neutral expectation or to the genome-wide averages (left panel; ancestral alleles are in blue or light blue). (V) Changes in the shape of the frequency distribution: identifying an excess of derived alleles, low-frequency polymorphic sites or singletons. Generations after the selective sweep, new (derived) mutations (yellow) are slowly introduced back into the recently selected region, and most appear at low frequencies expected under mutation/drift equilibrium, resulting in a skewed frequency distribution (spectrum) of polymorphisms (left panel). (VI) Differentiating between populations: identifying regions of unusually high population divergence. Local reduction of genomic variation in a selected population (left panel, middle) results in a local increase in genomic differentiation between sequences (unaffected population is not shown in the figure but can be approximated by the population before selection: left panel, top). 
Comparisons can be made for levels of differentiation calculated as FST around the selected loci to the neutral expectations, to a set of neutral loci or to the genome-wide averages. (VII) Extended LD segments: comparing the relative length and frequency of selected haplotypes. Positive selection results in an elevated frequency of haplotypes carrying the advantageous allele at the expense of the others. Owing to the generations of recombination, long haplotypes are also rare. However, a selective sweep creates haplotypes that are both long and frequent in a population (red and light red: right panel, middle and bottom). These methods are used to identify relatively recent and incomplete sweeps. (VIII) Elevated admixture contribution from one population: identifying sections of the genome with unusually high or low ancestry in a mixed population using MALD. Similar to VII, when two populations meet, one may carry a beneficial allele that can be later detected as a regional increase in ancestry, using a genome-wide map of highly differentiating population markers, and evaluated against the genome-wide expectation. I–VIII: blue line, genome-wide average.
Figure 2.
Increased number of function-altering mutations indicates a positively selected domain in TRIM5α protein that mediates retroviral restriction (signature II). The tight clustering of humans versus rhesus non-synonymous changes in TRIM5α gene indicates a SPRY domain subjected to positive selection with an average dN/dS ratio of greater than four (Sawyer et al. 2005).
Figure 3.
Reduced diversity to divergence ratio around the selected 5′ NTR variant of Tb1 gene found in maize that causes the plant to carry ears instead of tassels (signature III). In the process of domestication, the 5′ NTR lost its variation, compared with the wild teosinte and the domesticated maize (Wang et al. 1999). Consistent with the selection hypothesis, the sliding window shows low polymorphism, but a high divergence in the region, evaluated as a signature of positive selection by the HKA test (Hudson et al. 1987). Yellow lines, maize; green lines, teosinte.
Figure 4.
Reduced polymorphism around the SLC24A5 gene involved in skin pigmentation indicates an episode of selection in the European population (signature IV). A region of decreased heterozygosity in Europeans (CEU) compared with Nigerian Yoruba (YRI), Chinese (CHB) and Japanese (JPT) people on chromosome 15 near the SLC24A5 gene is significant when (a) compared across the genome in CEU samples and (b) plotted as averages in 10 kb intervals in the 300 kb vicinity of the gene, with heterozygosity for four HapMap populations (Lamason et al. 2005). Black lines, YRI; green lines, CHB; blue lines, JPT; orange lines, CEU.
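The windowed comparison described in this caption can be sketched in a few lines. This is a minimal illustration, not the procedure of Lamason et al. (2005): it assumes biallelic SNPs, takes per-SNP expected heterozygosity as 2p(1-p) from a population allele frequency p, and averages it in non-overlapping 10 kb windows. The function names and the input layout are hypothetical.

```python
from collections import defaultdict


def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def windowed_heterozygosity(positions, freqs, window=10_000):
    """Mean per-SNP expected heterozygosity in non-overlapping windows.

    positions: SNP coordinates in base pairs (assumed input layout).
    freqs:     allele frequency of one allele at each SNP in one population.
    Returns a dict {window_start: mean_heterozygosity}, ordered by window start.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, p in zip(positions, freqs):
        start = (pos // window) * window  # left edge of the window holding this SNP
        sums[start] += expected_heterozygosity(p)
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}
```

Running the function once per population over the same coordinates gives curves that can be compared window by window; a run of windows that is low in one population only is the kind of pattern the caption describes.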
Figure 5.
Example of a skewed frequency spectrum in the human CLSPN gene region indicating a positive selection signature in Europeans but not in Africans (signature V). A shift in frequency spectrum in the recently selected region is caused by the emergence of new low-frequency mutations. (a) Tajima's D values plotted across the CLSPN CRTR from the UCSC genome browser show a region of negative values consistent with the sweep seen in (b), the visual genotype in the ED population adapted from Carlson et al. (2005). Each row corresponds to an individual, and each column corresponds to a polymorphic site in a visual genotype for 1.5 Mbp spanning the CLSPN CRTR in the Perlegen data. Common allele homozygotes are shown in blue, heterozygotes are shown in red, rare allele homozygotes are shown in yellow and missing data are shown in grey. The top 24 samples are African (AD); the bottom 23 samples are of European descent (ED). ED samples show much less variation, most of which comes as singleton mutations.
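For readers who want to see how a negative Tajima's D arises from an excess of singletons, the following is a minimal sketch of the standard Tajima (1989) statistic. It assumes phased haplotypes encoded as equal-length strings of '0' and '1' with no missing data; this encoding and the function name are illustrative assumptions, not the pipeline behind the UCSC track mentioned in the caption.

```python
from math import comb, sqrt


def tajimas_d(haplotypes):
    """Tajima's D for a list of equal-length '0'/'1' haplotype strings.

    D compares the mean pairwise difference (pi) with the estimate of
    theta from the number of segregating sites (S / a1). An excess of rare
    variants, as expected after a sweep, makes D negative.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 haplotypes (variance term is zero below that)")
    pairs = comb(n, 2)
    seg_sites = 0
    pi = 0.0
    for j in range(len(haplotypes[0])):
        k = sum(1 for h in haplotypes if h[j] == "1")  # count of the '1' allele
        if 0 < k < n:
            seg_sites += 1
            pi += k * (n - k) / pairs  # average pairwise difference at this site
    if seg_sites == 0:
        raise ValueError("no segregating sites")

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)
```

A sample in which every variant is a singleton, for example `["1000", "0100", "0010", "0001"]`, gives a negative value, while a sample split evenly between two haplotypes, such as `["11", "11", "00", "00"]`, gives a positive one.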
Figure 6.
High population differentiation in IL4, a cytokine involved in immunity, may be attributed to positive selection (signature VI): a non-neutral pattern of differentiation at the IL4 gene is demonstrated by evaluating the FST value at the IL4 −524 locus against the same measure in a set of neutral loci elsewhere in the genome: (a) FST at −524 is higher than at 17 out of 18 neutral markers in a global distribution. (b) Pairwise FST at −524 between the China and India populations is dramatically elevated (adapted from Rockman et al. 2003).
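The comparison of a candidate locus against neutral markers can be illustrated with a short sketch. It assumes the simplest heterozygosity-based definition, FST = (HT - HS) / HT, for one biallelic locus in two equally weighted populations; Rockman et al. (2003) may have used a different estimator, and all function names here are hypothetical.

```python
def fst_two_populations(p1, p2):
    """Wright's FST at a biallelic locus from allele frequencies p1 and p2.

    HS is the mean within-population expected heterozygosity and HT is the
    expected heterozygosity of the pooled population (equal weights assumed).
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # locus is fixed for the same allele in both populations
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def count_exceeded(candidate_fst, neutral_fsts):
    """Number of neutral loci whose FST is lower than the candidate's."""
    return sum(1 for f in neutral_fsts if f < candidate_fst)
```

With `count_exceeded`, a statement such as "higher than at 17 out of 18 neutral markers" corresponds to a return value of 17 for a list of 18 neutral FST values.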
Figure 7.
Unusual pattern of LD surrounding alleles indicates recent independent adaptations for post-adolescence lactase persistence: (a) LCT-C –14010 in Africans (red) and (b) LCT-T –13910 (green) in Eurasians (signature VII). Haplotypes, shown for each individual as parallel lines, are extended around the recently selected alleles, while the alternative alleles are enclosed by relatively short LD segments. In this example, haplotypes that surround lactase persistence (red and green) in Eurasians are much longer than the haplotypes that contain the alternative alleles (blue and orange). While the lactase-persistence alleles are different in the two populations, both are found in high frequencies and located on unusually long haplotypes (Tishkoff et al. 2007).
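One common way to quantify "long and frequent" haplotypes is extended haplotype homozygosity (EHH): the probability that two randomly chosen chromosomes carrying the same core allele are identical over the whole interval from the core to a given marker. The sketch below assumes phased haplotypes given as equal-length strings and is offered only as an illustration of the idea; it is not claimed to reproduce the analysis of Tishkoff et al. (2007), and the function name and arguments are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, upto_index):
    """EHH for carriers of core_allele, from the core site out to upto_index.

    haplotypes: list of equal-length strings, one character per marker.
    Returns the fraction of carrier pairs that are identical at every marker
    between core_index and upto_index (inclusive).
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two haplotypes carrying the core allele")
    lo, hi = sorted((core_index, upto_index))
    # Group carriers by their sequence over the interval; pairs within a
    # group are identical across the whole interval.
    classes = Counter(h[lo:hi + 1] for h in carriers)
    return sum(comb(c, 2) for c in classes.values()) / comb(n, 2)
```

Evaluating `ehh` at increasing distance from the core for each allele gives two decay curves; a recently selected allele is expected to keep EHH close to 1 over a longer distance than the alternative allele.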
Figure 8.
An excess of African and deficiency of European ancestry, as identified by admixture mapping (MALD) in Puerto Ricans, is evident in the region encompassing the HLA superlocus that contains diverse antigens essential in human immune function (signature VIII). Deviation in admixture proportion from three founder populations (African, European and Amerindian are represented by red, green and blue curves, respectively) is plotted along the physical location on chromosome 6 of Puerto Ricans. The y-axis indicates the excess/deficiency in ancestry at the corresponding SNP, averaged for 192 individuals (Tang et al. 2007). Orange lines, African; green lines, Native American; blue lines, European.
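The idea of evaluating local ancestry against the genome-wide expectation can be sketched as follows. This is a simplified illustration that assumes a list of per-SNP ancestry proportions for one founder population, already averaged across individuals, and flags loci by a plain z-score; the threshold and function names are assumptions, and Tang et al. (2007) used their own test statistic.

```python
from statistics import mean, pstdev


def ancestry_deviation(local_ancestry):
    """Excess (+) or deficiency (-) of ancestry at each SNP relative to the
    genome-wide mean of the supplied values."""
    genome_mean = mean(local_ancestry)
    return [a - genome_mean for a in local_ancestry]


def outlier_loci(local_ancestry, threshold=3.0):
    """Indices of SNPs whose ancestry deviates from the genome-wide mean by
    more than `threshold` standard deviations (illustrative cutoff)."""
    genome_mean = mean(local_ancestry)
    sd = pstdev(local_ancestry)
    if sd == 0:
        return []  # no variation across the genome, so nothing stands out
    return [
        i for i, a in enumerate(local_ancestry)
        if abs(a - genome_mean) / sd > threshold
    ]
```

Plotting the output of `ancestry_deviation` against physical position gives a curve of the kind described in the caption, one per founder population.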
