Nat Methods. 2018 Apr;15(4):279-282.
doi: 10.1038/nmeth.4606. Epub 2018 Feb 19.

Identifying the favored mutation in a positive selective sweep

Ali Akbari et al. Nat Methods. 2018 Apr.

Abstract

Most approaches that capture signatures of selective sweeps in population genomics data do not identify the specific mutation favored by selection. We present iSAFE (for "integrated selection of allele favored by evolution"), a method that enables researchers to accurately pinpoint the favored mutation in a large region (∼5 Mbp) by using a statistic derived solely from population genetics signals. iSAFE does not require knowledge of demography, the phenotype under selection, or functional annotations of mutations.

Figures

Figure 1
Illustration and performance of the SAFE method. (a) The HAF score for haplotype h is the sum of the derived allele counts of the mutations on h. Carriers of the favored mutation have a higher fraction of the total HAF score of the sample (high ϕ) and a lower number of distinct haplotypes (low κ) than non-carriers. (b) Schematic of a genealogy under a selective sweep, drawn without recombination for exposition purposes. Mutations can be categorized as ‘non-carrier’ (gray); ‘ancestral to favored’ (turquoise), arising prior to the favored mutation; ‘descendant to favored’ (blue), arising after the favored mutation on haplotypes that carry it; and the favored mutation itself (red). The right panels show simulated ϕ versus κ values for each variant under neutral evolution and under a selective sweep, for 1000 simulations with favored allele frequency ν0 = 0.5 and default values for other simulation parameters (see Online Methods). In a selective sweep, the joint distribution of ϕ and κ changes in a dramatic but predictable manner that separates non-carrier (gray), descendant (blue), and ancestral (turquoise) mutations from the favored (red) mutation. The SAFE score computes a normalized difference of the two statistics, ϕ and κ. (c) Performance (favored mutation rank) of SAFE compared to iHS and SCCT on 50 kbp windows with 1000 simulations per frequency bin. The simulations were performed with default parameter values (see Online Methods) for a fixed population size with ongoing selective sweeps. The left panel combines all allele frequencies, while the right panel shows median and mean ranks for replicates divided into four bins.
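The quantities named in panel (a) can be made concrete with a small sketch. It assumes haplotypes are given as 0/1 lists (1 = derived allele). The caption only says that SAFE is a "normalized difference" of ϕ and κ, so two details here are assumptions for illustration: κ is taken as the fraction of distinct haplotypes in the sample that carry the mutation, and the normalization is taken as √(f(1 − f)), with f the derived allele frequency. The function names are hypothetical and this is not the authors' implementation.

```python
from math import sqrt


def haf_scores(haplotypes):
    """HAF score per haplotype: sum of derived allele counts of the mutations it carries."""
    n_sites = len(haplotypes[0])
    # Derived allele count of each site across the whole sample.
    derived_counts = [sum(h[j] for h in haplotypes) for j in range(n_sites)]
    return [sum(derived_counts[j] for j in range(n_sites) if h[j]) for h in haplotypes]


def safe_score(haplotypes, site):
    """Sketch of a SAFE-like score for one site (normalization is an assumption)."""
    haf = haf_scores(haplotypes)
    carriers = [i for i, h in enumerate(haplotypes) if h[site]]
    f = len(carriers) / len(haplotypes)
    if f == 0.0 or f == 1.0:
        return 0.0  # assumption: score fixed or absent sites as 0
    # phi: fraction of the sample's total HAF score held by carriers.
    phi = sum(haf[i] for i in carriers) / sum(haf)
    # kappa (assumed definition): fraction of distinct haplotypes found among carriers.
    distinct_all = {tuple(h) for h in haplotypes}
    distinct_carriers = {tuple(haplotypes[i]) for i in carriers}
    kappa = len(distinct_carriers) / len(distinct_all)
    return (phi - kappa) / sqrt(f * (1 - f))


# Toy sample: three haplotypes carry the mutation at site 0, one does not.
sample = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
print(haf_scores(sample))
print([round(safe_score(sample, j), 3) for j in range(3)])
```

In this toy sample the site-0 carriers hold most of the total HAF score but contribute only two of the three distinct haplotypes, so ϕ exceeds κ and the score is positive, matching the "high ϕ, low κ" pattern the caption describes.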
Figure 2
Illustration of the iSAFE method. (a) The red star, turquoise triangle, and blue square denote the favored, ancestral, and descendant mutations, respectively. As different windows have different genealogies due to recombination, the SAFE-score of a non-favored mutation e is relatively low when inserted in other windows. In contrast, the SAFE-score of the favored mutation is likely to dominate other mutations (Supplementary Note 1). (b) The Ψe,w matrix for a 5 Mbp region around the LCT gene in the FIN population shows that the ‘shoulder’ of selection can extend for a few Mbp. The blue circle shows the location of the putative favored mutation rs4988235. (c) SAFE and iSAFE performance (rank distribution of favored mutation) as a function of window size with 1000 simulations per bin. The dashed (dotted) lines represent the median (quartiles). SAFE performance decays for large windows, while iSAFE is robust to increases in window size.
Figure 3
iSAFE performance. (a) The top left (right) panel is the cumulative distribution function (CDF) of favored mutation rank (peak distance) for iSAFE and CMS scores. The lower panel shows the iSAFE performance (rank and peak distance distributions of favored mutation) as a function of favored allele frequency (ν) in the target population (EUR). The dashed (dotted) lines represent the median (quartiles). All data are based on 1000 simulations of 5 Mbp genomic regions simulated using a model of the human genome based on human demography (Supplementary Fig. 14). The time of onset of selection was chosen at random (using the distribution in Supplementary Fig. 14) after the out-of-Africa event, in the lineage of the EUR population (the target population). When the onset of selection is before the split of EUR and EAS (>23 kya), both populations (EUR and EAS) are under selection. (b) iSAFE and CMS scores (top and bottom panels, respectively) on 4 well-characterized selective sweeps (Supplementary Fig. 8; Supplementary Table 1). The rank of the putative favored mutation (red star) in the 5 Mbp region is shown in the top left corner. (c) iSAFE-scores on regions under selection. Top-ranked iSAFE candidates are marked by blue squares when they match putative favored mutations, while turquoise circles represent new favored mutations suggested by iSAFE. All data sets were chosen by taking a 5 Mbp window around the putative selected region, unless one side reached the telomere or centromere. (d) The GRM5-TYR region. The mutation rs672144 is ranked first by iSAFE and is very well separated from the rest of the mutations in the 5 Mbp around it, in all non-African populations with high confidence (iSAFE > 0.5, P ≪ 1.3e-8; Supplementary Fig. 10). The upper panel is a haplotype plot with core mutation rs672144 on all 5008 haplotypes (2504 samples) of 1000GP. This plot shows that carrier haplotypes of mutation rs672144 are conserved over a longer span than non-carrier haplotypes, which is a signal of selection. The lower panel shows global frequencies of carrier haplotypes of mutation rs672144 (red, blue) and non-carrier haplotypes (gray). The evidence is consistent with an out-of-Africa selection on standing variation (soft sweep) with mutation rs672144 as the favored variant.
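The two performance measures used in panels (a) and (b), favored mutation rank and peak distance, can be sketched as follows. The captions do not define them, so the definitions here are assumptions: rank is the 1-based position of the favored mutation when all mutations in the region are sorted by score (highest first), and peak distance is the genomic distance between the favored mutation and the highest-scoring mutation. Function names and the toy numbers are hypothetical.

```python
def favored_rank(scores, favored_index):
    """1-based rank of the favored mutation; a mutation is outranked only by strictly higher scores."""
    favored = scores[favored_index]
    return 1 + sum(1 for s in scores if s > favored)


def peak_distance(positions, scores, favored_index):
    """Distance (in bp) between the favored mutation and the top-scoring mutation."""
    peak_index = max(range(len(scores)), key=lambda i: scores[i])
    return abs(positions[peak_index] - positions[favored_index])


# Toy region with four mutations; the favored mutation is the third one.
positions = [100, 250, 400, 900]
scores = [0.1, 0.7, 0.3, 0.05]
print(favored_rank(scores, 2))              # rank of the favored mutation
print(peak_distance(positions, scores, 2))  # distance to the score peak
```

A rank of 1 and a peak distance of 0 would mean the method placed the favored mutation exactly at the top of the region.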

