Genetics. 2010 Jul;185(3):907-22.
doi: 10.1534/genetics.110.116459. Epub 2010 Apr 20.

Searching for footprints of positive selection in whole-genome SNP data from nonequilibrium populations

Pavlos Pavlidis et al. Genetics. 2010 Jul.

Abstract

A major goal of population genomics is to reconstruct the history of natural populations and to infer the neutral and selective scenarios that can explain the present-day polymorphism patterns. However, the separation between neutral and selective hypotheses has proven hard, mainly because both may predict similar patterns in the genome. This study focuses on the development of methods that can be used to distinguish neutral from selective hypotheses in equilibrium and nonequilibrium populations. These methods utilize a combination of statistics on the basis of the site frequency spectrum (SFS) and linkage disequilibrium (LD). We investigate the patterns of genetic variation along recombining chromosomes using a multitude of comparisons between neutral and selective hypotheses, such as selection or neutrality in equilibrium and nonequilibrium populations and recurrent selection models. We perform hypothesis testing using the classical P-value approach, but we also introduce methods from the machine-learning field. We demonstrate that the combination of SFS- and LD-based statistics increases the power to detect recent positive selection in populations that have experienced past demographic changes.

Figures

Figure 1.
Histogram of the ratio [formula image] for the following demographic scenarios. (A) A single realization of the bottleneck scenario inferred by Li and Stephan (2006). Long coalescent trees that escape the bottleneck tend to produce small ratios (<4). On the other hand, genealogies that coalesce within the bottleneck period produce star-like trees because of the recent, rapid, and severe contraction of the population. (B) A realization of the bottleneck scenario inferred by Thornton and Andolfatto (2006). In contrast to Li and Stephan (2006), coalescent events occur continuously. (C) The standard neutral model. For each of the Li and Stephan (2006), Thornton and Andolfatto (2006), and standard neutral scenarios, 12 chromosomes of 50 kb were simulated. The recombination rate is ρ = 0.05/bp and the mutation rate is θ = 0.004/bp. The parameter values for the Li and Stephan (2006) and Thornton and Andolfatto (2006) scenarios are described in the main text.
Figure 2.
The relation between (A) Λ_MAX and (B) the percentage of star-like genealogies and the number of segregating sites in the Li and Stephan (2006) demographic scenario. We performed neutral simulations for 12 recombining chromosomes, assuming a length of 50 kb. The recombination rate is ρ = 0.05/bp and the mutation rate is θ = 0.005/bp. The parameter values for the demographic model inferred by Li and Stephan (2006) are described in the main text. The number of short genealogies in the Li and Stephan (2006) scenario determines both the number of segregating sites and the sweep resemblance (measured by the SweepFinder statistic). When a genomic region is dominated by short star-like genealogies, only a few segregating sites are present. Even if this constitutes a polymorphism valley, the pattern does not look like a single sweep because high-frequency derived variants are lacking (Kim and Stephan 2002). Similarly, when the star-like trees are absent, Λ_MAX is small. On the other hand, the simultaneous presence of star-like and long genealogies creates sweep-like patterns. This is because star-like trees tend to cluster together along the recombining chromosome, creating valleys within polymorphism islands.
Figure 3.
A selective sweep causes a spatial modification of the SFS. The mean and the variance of the frequency are modified when a selective sweep has occurred in the middle of a 50-kb genomic fragment. The 50-kb region is split into nonoverlapping 2-kb windows, and in each window the mean, mean(f_i) (A and C), and the variance, var(f_i) (B and D), of the frequency f_i of polymorphism class i are calculated. In A the plots refer to a selective event in equilibrium populations (α = 2500) that has been completed recently, whereas in C the plots refer to the nonequilibrium model of Thornton and Andolfatto (2006) (α = 2500). The solid lines refer to singletons, the dashed lines to class 11, and the gray lines to classes 2–10. The dramatic change of the high-frequency derived alleles in A contributes to the precise localization of the selective event. In contrast, in C the high-frequency derived SNPs are absent even in the proximity of the selective sweep. This is because the branches of the coalescent tree that may generate high-frequency derived variants are very short due to the simultaneous action of the sweep and the bottleneck. Therefore, the observed polymorphisms (mostly singletons) are younger than the selective event and spread over the whole genomic region, obscuring the location of the selective sweep.
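The windowed summary described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code. The function names `windowed_sfs` and `summarize_replicates`, the input layout (SNP positions in bp plus one derived-allele count per SNP), the reading of f_i as the proportion of a window's SNPs that fall in class i, and the treatment of SNP-free windows as NaN are all assumptions made for the example.

```python
import warnings
import numpy as np

def windowed_sfs(positions, derived_counts, n, region_len=50_000, window=2_000):
    """Per-window class proportions f_i for one simulated sample.

    positions      : SNP positions in bp, in [0, region_len) (assumed layout)
    derived_counts : derived-allele count of each SNP, between 1 and n - 1
    n              : number of sampled chromosomes (12 in the caption)
    Returns an array of shape (n_windows, n - 1); row w, column i - 1 holds
    the proportion of SNPs in window w that belong to class i.
    """
    n_windows = region_len // window
    sfs = np.zeros((n_windows, n - 1))
    # Assign each SNP to its nonoverlapping window.
    idx = np.minimum(np.asarray(positions) // window, n_windows - 1).astype(int)
    for w, c in zip(idx, derived_counts):
        if 1 <= c <= n - 1:
            sfs[w, c - 1] += 1
    totals = sfs.sum(axis=1, keepdims=True)
    # Windows without SNPs have no defined proportions; mark them as NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(totals > 0, sfs / totals, np.nan)

def summarize_replicates(replicates, n, region_len=50_000, window=2_000):
    """Mean and variance of f_i across replicates, per window and class.

    replicates : iterable of (positions, derived_counts) pairs
    """
    stack = np.stack(
        [windowed_sfs(p, c, n, region_len, window) for p, c in replicates]
    )
    with warnings.catch_warnings():
        # Windows that are empty in every replicate stay NaN without warnings.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(stack, axis=0), np.nanvar(stack, axis=0)
```

With n = 12, column 0 of the result corresponds to singletons and column 10 to class 11, the two classes singled out in the caption. Replicates would come from a coalescent simulator, which is outside the scope of this sketch.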
Figure 4.
The joint distributions of Λ_MAX and ω_MAX in scenarios with and without selection. (A) We compare the joint distribution of Λ_MAX and ω_MAX between a model with selection (α = 500) in a constant population and a standard neutral model. The overlap between the distributions is limited, and the scenarios can be discriminated by SweepFinder (y-axis) and, to a lesser extent, by the ω-statistic (x-axis). (B) We compare a model with selection (α = 500) with a neutral model that has experienced a bottleneck as inferred by Li and Stephan (2006). Neither statistic can accurately discriminate between the two scenarios (see also Table 2). Note that the scales of the statistics are different in A and B.
Figure 5.
The distributions of Λ_MAX for various levels of the decrease of heterozygosity and s = 10^-2. Each distribution is discrete and the size of each bin has been set to 6. (A) For [formula image], 0.5, and 0.95 the cutoff values (95th percentile) are 5.7, 9.7, and 11.9, respectively, and the sensitivities of the test (percentage of true positives) given the cutoff values are 0.74, 0.48, and 0.07. The power of SweepFinder is greater for the Li and Stephan (2006) and Jensen et al. (2008) estimations than for those of Macpherson et al. (2007) and Andolfatto (2007) because selection is strong (s = 10^-2). (B) When s = 10^-4 the amount of diversity is similar for [formula image], 0.5, and 0.95. Therefore, the performance of SweepFinder is relatively independent of the [formula image].
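The cutoff and sensitivity figures quoted in this caption follow a standard simulation-based recipe, sketched below. It is an illustration under stated assumptions rather than the authors' procedure: the function name `cutoff_and_sensitivity` is hypothetical, the cutoff is assumed to be the 95th percentile of the statistic under the null-model simulations, and sensitivity is assumed to be the fraction of selection simulations whose statistic exceeds that cutoff.

```python
import numpy as np

def cutoff_and_sensitivity(null_stats, selection_stats, quantile=0.95):
    """Return (cutoff, sensitivity) for a one-sided simulation-based test.

    null_stats      : statistic values from simulations under the null model
    selection_stats : statistic values from simulations with selection
    quantile        : percentile of the null distribution used as the cutoff
    """
    # Cutoff: the chosen percentile of the null distribution.
    cutoff = float(np.quantile(null_stats, quantile))
    # Sensitivity: share of selection replicates declared significant.
    sensitivity = float(np.mean(np.asarray(selection_stats) > cutoff))
    return cutoff, sensitivity
```

A higher cutoff under the null model leaves fewer selection replicates above it, which is the pattern in the caption: as the cutoff rises from 5.7 to 11.9, the reported sensitivity falls from 0.74 to 0.07.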

References

    Akey, J. M., 2009. Constructing genomic maps of positive selection in humans: Where do we go from here? Genome Res. 19: 711–722.
    Akey, J. M., M. A. Eberle, M. J. Rieder, C. S. Carlson, M. D. Shriver et al., 2004. Population history and natural selection shape patterns of genetic variation in 132 genes. PLoS Biol. 2: e286.
    Andolfatto, P., 2007. Hitchhiking effects of recurrent beneficial amino acid substitutions in the Drosophila melanogaster genome. Genome Res. 17: 1755–1762.
    Barton, N., 1998. The effect of hitch-hiking on neutral genealogies. Genet. Res. 72: 123–133.
    Beisswanger, S., and W. Stephan, 2008. Evidence that strong positive selection drives neofunctionalization in the tandemly duplicated polyhomeotic genes in Drosophila. Proc. Natl. Acad. Sci. USA 105: 5447–5452.