Genetics. 2014 Feb;196(2):509-22.
doi: 10.1534/genetics.113.158220. Epub 2013 Dec 6.

Identifying signatures of selection in genetic time series

Alison F Feder et al. Genetics. 2014 Feb.

Erratum in

Abstract

Both genetic drift and natural selection cause the frequencies of alleles in a population to vary over time. Discriminating between these two evolutionary forces, based on a time series of samples from a population, remains an outstanding problem with increasing relevance to modern data sets. Even in the idealized situation when the sampled locus is independent of all other loci, this problem is difficult to solve, especially when the size of the population from which the samples are drawn is unknown. A standard χ²-based likelihood-ratio test was previously proposed to address this problem. Here we show that the χ²-test of selection substantially underestimates the probability of type I error, leading to more false positives than indicated by its P-value, especially at stringent P-values. We introduce two methods to correct this bias. The empirical likelihood-ratio test (ELRT) rejects neutrality when the likelihood-ratio statistic falls in the tail of the empirical distribution obtained under the most likely neutral population size. The frequency increment test (FIT) rejects neutrality if the distribution of normalized allele-frequency increments exhibits a mean that deviates significantly from zero. We characterize the statistical power of these two tests for selection, and we apply them to three experimental data sets. We demonstrate that both ELRT and FIT have power to detect selection in practical parameter regimes, such as those encountered in microbial evolution experiments. Our analysis applies to a single diallelic locus, assumed independent of all other loci, which is most relevant to full-genome selection scans in sexual organisms, and also to evolution experiments in asexual organisms as long as clonal interference is weak. Different techniques will be required to detect selection in time series of cosegregating linked loci.
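The idea behind the FIT can be illustrated with a minimal Python sketch. The function name `frequency_increment_statistic` is hypothetical, and the normalization of each increment by sqrt(2ν(1 − ν)Δt) is an assumption based on the phrase "normalized allele-frequency increments"; any constant factor in that normalization cancels in the resulting t-type statistic. This is an illustration under those assumptions, not the authors' implementation.

```python
import math
import statistics


def frequency_increment_statistic(freqs, times):
    """Return (t_statistic, degrees_of_freedom) for a sampled allele-frequency series.

    Each increment is rescaled so that, under neutral drift, its variance is
    roughly constant (assumed normalization; see the lead-in above).
    """
    if len(freqs) != len(times) or len(freqs) < 3:
        raise ValueError("need at least three matched (frequency, time) samples")
    increments = []
    for i in range(1, len(freqs)):
        v_prev, v_next = freqs[i - 1], freqs[i]
        dt = times[i] - times[i - 1]
        if not 0.0 < v_prev < 1.0 or dt <= 0:
            raise ValueError("frequencies must be polymorphic and times increasing")
        scale = math.sqrt(2.0 * v_prev * (1.0 - v_prev) * dt)
        increments.append((v_next - v_prev) / scale)
    n = len(increments)
    mean = statistics.mean(increments)
    sd = statistics.stdev(increments)  # sample standard deviation (n - 1 denominator)
    if sd == 0.0:
        raise ValueError("increments have zero variance; statistic is undefined")
    # One-sample t statistic for the hypothesis that the mean increment is zero.
    return mean / (sd / math.sqrt(n)), n - 1


t_stat, dof = frequency_increment_statistic(
    [0.10, 0.20, 0.35, 0.50, 0.70, 0.85], [0, 20, 40, 60, 80, 100]
)
print(t_stat, dof)
```

A large absolute value of the statistic, judged against Student's t-distribution with the returned degrees of freedom, would count as evidence against neutrality. The P-value computation is left out here to keep the sketch dependency-free.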

Keywords: Wright–Fisher model; population genetics; tests for selection.

Figures

Figure 1
Distributions of test statistics under the neutral null hypothesis. Histograms show the probabilities that the value of a test statistic generated under the neutral null hypothesis falls within each vigintile (quantiles of size 0.05) of another, approximate, distribution. If the approximate distribution is close to the true distribution, the probability for each bin will approximately equal 0.05 (dashed line). The left three panels show the probability distributions for the likelihood-ratio statistic (LRS) to fall into the vigintiles of the χ²-distribution with 1 d.f., the LRS distribution under the true N, and the empirical LRS distribution under N̂, respectively. The LRS falls in the top vigintiles of the χ²-distribution more often than expected, indicating that the P-value given by the χ²-distribution underestimates the probability of a type I error. The distribution of LRS under the true N is shown as a control case. The distribution of LRS under N̂ closely approximates the true LRS distribution. The rightmost panel shows the probabilities for the frequency increment statistic (FIS) to fall into each vigintile of Student’s t-distribution with L − 1 d.f. Student’s t is a good approximation for the true distribution of the FIS. Parameter values were N = 10³, T = 100, Δ = 20, L = 5, and ν₀ = 0.5; the number of Wright–Fisher simulations was 3.5 × 10⁵.
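The vigintile check described in this caption can be sketched as follows. The function name `vigintile_probabilities` and its arguments are hypothetical: `null_statistics` stands for statistic values simulated under the null hypothesis, and `approx_cdf` for the cumulative distribution function of the approximate distribution being assessed. The sketch assumes `approx_cdf` returns values in [0, 1].

```python
def vigintile_probabilities(null_statistics, approx_cdf):
    """Fraction of null statistics falling in each vigintile of an approximate distribution.

    If the approximate distribution matches the true null distribution,
    every entry of the returned list should be close to 0.05.
    """
    if not null_statistics:
        raise ValueError("need at least one simulated statistic")
    counts = [0] * 20
    for value in null_statistics:
        u = approx_cdf(value)              # position of the value within the approximate distribution
        bin_index = min(int(u * 20), 19)   # u == 1.0 is assigned to the top vigintile
        counts[bin_index] += 1
    total = len(null_statistics)
    return [count / total for count in counts]


# Toy usage: evenly spread values scored against a uniform CDF give a flat histogram.
flat = vigintile_probabilities([(i + 0.5) / 100 for i in range(100)], lambda u: u)
print(flat)
```

An excess of mass in the last few bins is the pattern the caption reports for the LRS scored against the χ²-distribution: large values occur more often than the approximation predicts.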
Figure 2
Power of the ELRT and the FIT to detect selection of different strengths. Power is reported as the fraction of trial data sets generated by the Wright–Fisher model with selection for which the ELRT (left column) or the FIT (right column) rejects the neutral null hypothesis at significance level α = 0.05 in “short” (T = 0.01N, top row) and “long” (T = 0.1N, bottom row) time series. Both tests gain power with increasing selection pressure, but in long time series they start to lose power when selection becomes very strong (see text for details). Power of both tests grows weakly with the number of sampled time points, L. We ran 10³ trials with N = 10⁴ and initial allele frequency ν₀ = 0.5. Trials that produced absorption events within the sampling period were discarded.
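A trial data set of the kind described in this caption could be generated along the following lines. This is a sketch under stated assumptions: a haploid Wright–Fisher model in which the selected allele has relative fitness 1 + s, with binomial resampling each generation. The function name `simulate_wright_fisher` and its parameters are hypothetical, and the paper's exact simulation scheme may differ. Trials in which the allele is lost or fixed within the sampling period return `None`, mirroring the discarding rule in the caption.

```python
import random


def simulate_wright_fisher(pop_size, s, v0, generations, sample_every, rng):
    """Return allele frequencies sampled every `sample_every` generations, or None on absorption."""
    count = round(v0 * pop_size)
    samples = [count / pop_size]
    for generation in range(1, generations + 1):
        p = count / pop_size
        # Selection step: expected frequency after selection (assumed haploid model).
        p_selected = p * (1.0 + s) / (1.0 + s * p)
        # Drift step: binomial resampling via explicit Bernoulli draws.
        # This keeps the sketch dependency-free but is slow for large populations.
        count = sum(rng.random() < p_selected for _ in range(pop_size))
        if count == 0 or count == pop_size:
            return None  # absorption inside the sampling period: discard this trial
        if generation % sample_every == 0:
            samples.append(count / pop_size)
    return samples


rng = random.Random(1)
print(simulate_wright_fisher(1000, 0.0, 0.5, 10, 5, rng))
```

Power would then be estimated as the fraction of retained trials for which a chosen test rejects neutrality at the stated significance level.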
Figure 3
Schematic diagram describing the power of any test for selection in allele-frequency time-series data. Thick solid lines show the expected frequency dynamics (Equation 3) of alleles with selection coefficients s = 0.001, 0.005, 0.01, initiated at frequency x₀ = 0.05. Shaded areas denote ±σ²(t,x₀)/N, where σ² is given by Equation 6 and N = 10⁴, which illustrate the size of stochastic fluctuations around the expected frequency. Vertical dashed lines show hypothetical sampling time points. When the selection coefficient is low (Ns = 10), stochastic fluctuations dominate, and tests of selection have low power. When the selection coefficient is high (Ns = 100), fixation events occur within the sampling interval and some sampling points (at 800 and 1000 generations) become uninformative, which also leads to loss of power. For a given sampling interval T, power is maximized for intermediate selection coefficients (Ns = 50).
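The arithmetic behind this schematic can be checked with a short sketch. Equation 3 is not reproduced on this page, so the code below assumes the standard deterministic (logistic) trajectory of a selected allele, x(t) = x₀e^(st) / (1 − x₀ + x₀e^(st)). The function name `expected_frequency` is hypothetical.

```python
import math


def expected_frequency(x0, s, t):
    """Deterministic frequency after t generations (assumed logistic form)."""
    growth = math.exp(s * t)
    return x0 * growth / (1.0 - x0 + x0 * growth)


pop_size = 10**4
for s in (0.001, 0.005, 0.01):
    trajectory = [round(expected_frequency(0.05, s, t), 3) for t in (200, 400, 600, 800, 1000)]
    print(f"Ns = {round(pop_size * s)}: {trajectory}")
```

Under this assumed form, the allele with s = 0.01 is expected to exceed a frequency of 0.99 by generation 800, which agrees with the caption's remark that the last sampling points carry little information when selection is strong.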
Figure 4
Distributions of test statistics under the neutral null hypothesis, when allele frequencies are sampled with noise. Histograms show the probabilities that the value of a test statistic generated under the neutral null hypothesis falls within each of the vigintiles (quantiles of size 0.05) of another, approximate, distribution. Notations are as in Figure 1. Parameter values were N = 10³, T = 100, Δ = 20, L = 5, ν₀ = 0.5, and n = 500; the number of Wright–Fisher simulations was 2 × 10⁵.
Figure 5
Power of the ELRT and the FIT to detect selection of different strengths, under various sampling regimes. Parameter values were N = 10⁴, T = 1000, Δ = 100, L = 10, and ν₀ = 0.5; the number of Wright–Fisher simulations was 10³.
Figure 6
Application of the ELRT and the FIT to allele-frequency time series from Lang et al. (2011, 2013). Each panel shows the estimated frequency of a mutant allele in the long-term evolution lines described in Lang et al. (2011, 2013): the left panel shows the frequency of mutation D579Y in gene STE11 in population RMB2-F01; the center panel shows the frequency of mutation Y822* in gene IRA1 in population RMS1-D12; and the right panel shows the frequency of mutation A2698T in gene IRA2 in population BYS2-D06. Shading highlights the data points for which the FIT and the ELRT identify selection.
