Identifying signatures of selection in genetic time series

Alison F Feder¹, Sergey Kryazhimskiy, Joshua B Plotkin

PMID: 24318534
PMCID: PMC3914623
DOI: 10.1534/genetics.113.158220

Genetics. 2014 Feb;196(2):509-22. doi: 10.1534/genetics.113.158220. Epub 2013 Dec 6.

Affiliation

¹ Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104.

Erratum in

Identifying Signatures of Selection in Genetic Time Series.
[No authors listed] Genetics. 2018 Dec;210(4):1559. doi: 10.1534/genetics.118.301645. PMID: 30523169. Free PMC article. No abstract available.

Abstract

Both genetic drift and natural selection cause the frequencies of alleles in a population to vary over time. Discriminating between these two evolutionary forces, based on a time series of samples from a population, remains an outstanding problem with increasing relevance to modern data sets. Even in the idealized situation when the sampled locus is independent of all other loci, this problem is difficult to solve, especially when the size of the population from which the samples are drawn is unknown. A standard χ²-based likelihood-ratio test was previously proposed to address this problem. Here we show that the χ²-test of selection substantially underestimates the probability of type I error, leading to more false positives than indicated by its P-value, especially at stringent P-values. We introduce two methods to correct this bias. The empirical likelihood-ratio test (ELRT) rejects neutrality when the likelihood-ratio statistic falls in the tail of the empirical distribution obtained under the most likely neutral population size. The frequency increment test (FIT) rejects neutrality if the distribution of normalized allele-frequency increments exhibits a mean that deviates significantly from zero. We characterize the statistical power of these two tests for selection, and we apply them to three experimental data sets. We demonstrate that both ELRT and FIT have power to detect selection in practical parameter regimes, such as those encountered in microbial evolution experiments. Our analysis applies to a single diallelic locus, assumed independent of all other loci, which is most relevant to full-genome selection scans in sexual organisms, and also to evolution experiments in asexual organisms as long as clonal interference is weak. Different techniques will be required to detect selection in time series of cosegregating linked loci.
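As a rough illustration of the FIT idea described in the abstract, the sketch below rescales successive allele-frequency increments and computes a one-sample t statistic for their mean. The scaling by sqrt(2ν(1 − ν)Δt) is an assumption, since the abstract does not state the normalization; any constant factor in it cancels in the t statistic. The function name `fit_statistic` is hypothetical, and the degrees of freedom follow the "L − 1" convention of the Figure 1 caption, with L taken as the number of increments.

```python
import math
from statistics import mean, stdev


def fit_statistic(freqs, times):
    """Return (t_statistic, degrees_of_freedom) for a FIT-style test.

    freqs: sampled allele frequencies, each strictly between 0 and 1
           (except possibly the last one).
    times: sampling times in generations, strictly increasing.
    """
    if len(freqs) != len(times) or len(freqs) < 3:
        raise ValueError("need at least three matched (time, frequency) samples")

    increments = []
    for i in range(1, len(freqs)):
        prev = freqs[i - 1]
        dt = times[i] - times[i - 1]
        if not 0.0 < prev < 1.0 or dt <= 0:
            raise ValueError("frequencies must be interior and times increasing")
        # Assumed normalization: divide by the drift-scale standard deviation.
        scale = math.sqrt(2.0 * prev * (1.0 - prev) * dt)
        increments.append((freqs[i] - prev) / scale)

    n = len(increments)
    spread = stdev(increments)
    if spread == 0.0:
        raise ValueError("increments have zero variance; t statistic undefined")
    # One-sample t statistic for the hypothesis that the mean increment is zero.
    t_stat = mean(increments) / (spread / math.sqrt(n))
    return t_stat, n - 1
```

A two-sided P-value could then be read from Student's t-distribution with the returned degrees of freedom, for example with `2 * scipy.stats.t.sf(abs(t_stat), df)` if SciPy is available.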

Keywords: Wright–Fisher model; population genetics; tests for selection.

Figures

**Figure 1**
Distributions of test statistics under the neutral null hypothesis. Histograms show the probabilities that the value of a test statistic generated under the neutral null hypothesis falls within each vigintile (quantiles of size 0.05) of another, approximate, distribution. If the approximate distribution is close to the true distribution, the probability for each bin will approximately equal 0.05 (dashed line). The left three panels show the probability distributions for the likelihood-ratio statistic (LRS) to fall into the vigintiles of the χ²-distribution with 1 d.f., the LRS distribution under the true N, and the empirical LRS distribution under Ň, respectively. The LRS falls in the top vigintiles of the χ²-distribution more often than expected, indicating that the P-value given by the χ²-distribution underestimates the probability of a type I error. The distribution of LRS under the true N is shown as a control case. The distribution of LRS under Ň closely approximates the true LRS distribution. The rightmost panel shows the probabilities for the frequency increment statistic (FIS) to fall into each vigintile of Student’s t-distribution with L − 1 d.f. Student’s t is a good approximation for the true distribution of the FIS. Parameter values were N = 10³, T = 100, Δ = 20, L = 5, and ν₀ = 0.5; the number of Wright–Fisher simulations was 3.5 × 10⁵.
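The vigintile comparison described in this caption can be sketched as follows, assuming the approximate distribution is represented by a sample of reference values (as in the empirical case) rather than by a closed-form distribution. The function name `vigintile_occupancy` and its arguments are hypothetical. If the two distributions agree, every returned fraction should be close to 0.05.

```python
from bisect import bisect_right


def vigintile_occupancy(test_values, reference_values, bins=20):
    """Fraction of test values falling in each quantile bin of a reference sample."""
    reference = sorted(reference_values)
    counts = [0] * bins
    for value in test_values:
        # Empirical quantile of this value within the reference sample, in [0, 1].
        quantile = bisect_right(reference, value) / len(reference)
        # Map the quantile to a bin; a quantile of exactly 1 goes to the last bin.
        index = min(int(quantile * bins), bins - 1)
        counts[index] += 1
    return [count / len(test_values) for count in counts]
```

An excess in the last few bins would correspond to the pattern the caption reports for the χ² approximation: the statistic lands in the upper tail more often than the nominal 5% per bin.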

**Figure 2**
Power of the ELRT and the FIT to detect selection of different strengths. Power is reported as the fraction of trial data sets generated by the Wright–Fisher model with selection for which the ELRT (left column) or the FIT (right column) rejects the neutral null hypothesis at significance level α = 0.05 in “short” (T = 0.01N, top row) and “long” (T = 0.1N, bottom row) time series. Both tests gain power with increasing selection pressure, but in long time series they start to lose power when selection becomes very strong (see text for details). Power of both tests grows weakly with the number of sampled time points, L. We ran 10³ trials with N = 10⁴ and initial allele frequency ν₀ = 0.5. Trials that produced absorption events within the sampling period were discarded.
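A minimal sketch of how one such trial data set might be generated, assuming a haploid Wright–Fisher model in which selection shifts the expected frequency from x to x(1 + s)/(1 + sx) before binomial resampling. That update rule, the function name `simulate_trajectory`, and its parameters are assumptions for illustration and are not taken from this page. Trials that hit frequency 0 or 1 within the sampling period return `None`, mirroring the discard rule in the caption.

```python
import random


def simulate_trajectory(pop_size, generations, sample_interval, init_freq,
                        sel_coeff=0.0, seed=None):
    """Simulate one allele-frequency time series; return None on absorption."""
    rng = random.Random(seed)
    freq = init_freq
    samples = [(0, freq)]
    for generation in range(1, generations + 1):
        # Deterministic shift due to selection (assumed haploid model).
        expected = freq * (1.0 + sel_coeff) / (1.0 + sel_coeff * freq)
        # Binomial resampling of pop_size individuals models genetic drift.
        count = sum(1 for _ in range(pop_size) if rng.random() < expected)
        freq = count / pop_size
        if freq == 0.0 or freq == 1.0:
            return None  # absorption within the sampling period: discard trial
        if generation % sample_interval == 0:
            samples.append((generation, freq))
    return samples
```

Power at a given selection coefficient could then be estimated as the fraction of retained trials for which a chosen test rejects neutrality.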

**Figure 3**
Schematic diagram describing the power of any test for selection in allele-frequency time-series data. Thick solid lines show the expected frequency dynamics (Equation 3) of alleles with selection coefficients s = 0.001, 0.005, 0.01, initiated at frequency x₀ = 0.05. Shaded areas denote $\pm\sqrt{\sigma^{2}(t, x_{0})/N}$, where σ² is given by Equation 6 and N = 10⁴, which illustrate the size of stochastic fluctuations around the expected frequency. Vertical dashed shaded lines show hypothetical sampling time points. When the selection coefficient is low (Ns = 10), stochastic fluctuations dominate, and tests of selection have low power. When the selection coefficient is high (Ns = 100), fixation events occur within the sampling interval and some sampling points (at 800 and 1000 generations) become uninformative, which also leads to loss of power. For a given sampling interval T power is maximized for intermediate selection coefficients (Ns = 50).
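Equation 3 is not reproduced on this page. As a hedged illustration of the three regimes in the caption, the sketch below assumes the standard deterministic logistic trajectory x(t) = x₀e^{st} / (1 − x₀ + x₀e^{st}), which may differ in detail from the paper's expression. The function name `expected_frequency` is hypothetical.

```python
import math


def expected_frequency(x0, s, t):
    """Deterministic allele frequency after t generations (assumed logistic form)."""
    growth = math.exp(s * t)
    return x0 * growth / (1.0 - x0 + x0 * growth)


if __name__ == "__main__":
    pop_size = 10**4
    for s in (0.001, 0.005, 0.01):
        # Report the scaled selection strength Ns and the frequency at generation 1000.
        print(round(pop_size * s), expected_frequency(0.05, s, 1000))
```

Under this assumed form, the allele with s = 0.001 has only risen to roughly 0.13 by generation 1000, while the allele with s = 0.01 exceeds 0.99, consistent with the caption's point that late sampling points become uninformative under strong selection.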

**Figure 4**
Distributions of test statistics under the neutral null hypothesis, when allele frequencies are sampled with noise. Histograms show the probabilities that the value of a test statistic generated under the neutral null hypothesis falls within each of the vigintiles (quantiles of size 0.05) of another, approximate, distribution. Notations are as in Figure 1. Parameter values were N = 10³, T = 100, Δ = 20, L = 5, ν₀ = 0.5, and n = 500; the number of Wright–Fisher simulations was 2 × 10⁵.

**Figure 5**
Power of the ELRT and the FIT to detect selection of different strengths, under various sampling regimes. Parameter values were N = 10⁴, T = 1000, Δ = 100, L = 10, and ν₀ = 0.5; the number of Wright–Fisher simulations was 10³.

**Figure 6**
Application of the ELRT and the FIT to allele-frequency time series from Lang *et al.* (2011, 2013). Each panel shows the estimated frequency of a mutant allele in the long-term evolution lines described in Lang *et al.* (2011, 2013): the left panel shows the frequency of mutation D579Y in gene *STE11* in population RMB2-F01; the center panel shows the frequency of mutation Y822* in gene *IRA1* in population RMS1-D12; and the right panel shows the frequency of mutation A2698T in gene *IRA2* in population BYS2-D06. Shading highlights the data points for which the FIT and the ELRT identify selection.
