Genetics. 2016 Oct;204(2):723-735.

doi: 10.1534/genetics.116.191197. Epub 2016 Aug 19.

Estimating the Effective Population Size from Temporal Allele Frequency Changes in Experimental Evolution

Ágnes Jónás¹, Thomas Taus¹, Carolin Kosiol², Christian Schlötterer², Andreas Futschik³

Affiliations

¹ Vienna Graduate School of Population Genetics, 1210 Vienna, Austria; Institut für Populationsgenetik, Vetmeduni Vienna, 1210 Vienna, Austria.
² Institut für Populationsgenetik, Vetmeduni Vienna, 1210 Vienna, Austria.
³ Institut für Populationsgenetik, Vetmeduni Vienna, 1210 Vienna, Austria; Department of Applied Statistics, Johannes Kepler Universität Linz, 4040 Linz, Austria. andreas.futschik@jku.at

PMID: 27542959
PMCID: PMC5068858
DOI: 10.1534/genetics.116.191197

Ágnes Jónás et al. Genetics. 2016 Oct.

Abstract

The effective population size ($N_{e}$) is a major factor determining allele frequency changes in natural and experimental populations. Temporal methods provide a powerful and simple approach to estimate short-term $N_{e}$. They use allele frequency shifts between temporal samples to calculate the standardized variance, which is directly related to $N_{e}$. Here we focus on experimental evolution studies that often rely on repeated sequencing of samples in pools (Pool-seq). Pool-seq is cost-effective and often outperforms individual-based sequencing in estimating allele frequencies, but it is associated with atypical sampling properties: in addition to sampling individuals, sequencing DNA in pools leads to a second round of sampling, which increases the variance of allele frequency estimates. We propose a new estimator of $N_{e}$, which relies on allele frequency changes in temporal data and corrects for the variance in both sampling steps. In simulations, we obtain accurate $N_{e}$ estimates, as long as the drift variance is not too small compared to the sampling and sequencing variance. In addition to genome-wide $N_{e}$ estimates, we extend our method using a recursive partitioning approach to estimate $N_{e}$ locally along the chromosome. Since the type I error is controlled, our method permits the identification of genomic regions that differ significantly in their $N_{e}$ estimates. We present an application to Pool-seq data from experimental evolution with *Drosophila* and provide recommendations for whole-genome data. The estimator is computationally efficient and available as an R package at https://github.com/ThomasTaus/Nest.
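The idea of correcting the standardized variance for both sampling steps can be illustrated with a minimal Python sketch. It is not the $N_{e}(P)$ estimator of the paper or the API of the Nest package. It is a classical moment estimator in the style of Waples (1989) under plan II, and it assumes that both sampling steps are binomial, so that sampling $2S$ gene copies and then $R$ reads inflates the variance by $1/(2S) + 1/R - 1/(2SR)$ in units of $p(1-p)$. The function names `pooled_sampling_term` and `temporal_ne` are hypothetical.

```python
import math


def pooled_sampling_term(pool_size, coverage=None):
    """Variance added by sampling, in units of p(1-p).

    Assumes binomial sampling of 2*pool_size gene copies and, if a
    coverage is given, binomial sampling of `coverage` reads from the pool.
    """
    individuals = 1.0 / (2.0 * pool_size)
    if coverage is None:
        return individuals
    return individuals + 1.0 / coverage - individuals / coverage


def temporal_ne(x0, xt, t, s0, st, r0=None, rt=None):
    """Waples-style plan II moment estimate of Ne (illustrative only).

    x0, xt: allele frequency estimates at generation 0 and t, one per SNP.
    s0, st: pool sizes (diploid individuals); r0, rt: optional coverages.
    """
    f_values = []
    for x, y in zip(x0, xt):
        denom = (x + y) / 2.0 - x * y
        if denom > 0:  # skip loci fixed for the same allele in both samples
            f_values.append((x - y) ** 2 / denom)
    if not f_values:
        raise ValueError("no informative loci")
    f_c = sum(f_values) / len(f_values)  # standardized variance
    # Remove the part of the variance explained by the two sampling steps.
    drift = f_c - pooled_sampling_term(s0, r0) - pooled_sampling_term(st, rt)
    if drift <= 0:
        return math.inf  # sampling noise swamps the drift signal
    return t / (2.0 * drift)
```

For a single SNP moving from 0.5 to 0.6 over 10 generations with pools of 50 individuals and no sequencing step, `temporal_ne([0.5], [0.6], t=10, s0=50, st=50)` gives approximately 250. Supplying `r0` and `rt` subtracts a larger sampling term and therefore raises the estimate, and the function returns infinity once the sampling terms exceed the observed variance, which mirrors the abstract's caveat about small drift variance.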

Keywords: Pool-seq; effective population size; experimental evolution; genetic drift.

Figures

**Figure 1**
Two-step sampling in experimental evolution with *Drosophila*. In evolve-and-resequence (E&R) studies, populations are propagated at a census size $N$ defined by the experimenter, which is in general larger than the effective population size $N_{e}$. Using temporal methods, $N_{e}$ can be estimated from the variance in allele frequency between samples taken $t$ generations apart. To get an accurate representation of allele frequencies in population genetic studies, a large number of individuals $S_{j}$ ($j \in \{0, t\}$) are sampled and pooled. Sampling can take place according to sampling plan I or II based on the mode of reproduction. Pooled samples are then subjected to high-throughput sequencing. Sequenced reads are subsequently aligned to the reference genome (shown at the bottom). We represent pool sequencing by an additional sampling step (called sampling step 2). We correct for both sampling steps when estimating $N_{e}$ in pooled samples. Additionally, we take into account variable coverage levels across the genome (coverage $R_{ij}$ for site $i$ at $T = j$, $j \in \{0, t\}$) when correcting for the variance coming from sequencing.

**Figure 2**
Effective population size estimated with different methods. Sixty generations of Wright–Fisher neutral evolution with $N_{e} = 100$ diploid individuals were simulated for $n = 2000$ unlinked loci (SNPs). Prior to sampling, the population was increased to a census size of $N = 500$ individuals at each generation. At the starting population and at each indicated time point a sample was taken to create a pool of $S = 100$ individuals. The pool was sequenced to an average coverage of $R = 50$ and $N_{e}$ was estimated on the resulting data set by separately contrasting allele frequencies at generation 0 to each of the evolved generations denoted on the x-axis, using $N_{e}(P)$, $N_{e}(W)$ (Waples 1989), and $N_{e}(JR)$ (Jorde and Ryman 2007). Each box represents results from 100 simulations with identical parameters. The dashed gray line shows the true value of $N_{e}$. Data are simulated under plan I assumptions and the results of plan I and II estimators are shown in the left and right panels, respectively.
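A simulation of this kind can be sketched in a few lines of Python. The sketch below is a simplified stand-in, not the simulation code used for the figure: it treats every sampling step as binomial and independent of the reproducing population (closer to plan II than to the plan I setup described above), omits the expansion to census size $N$, uses a fixed coverage instead of variable coverage, and draws starting frequencies uniformly from (0.1, 0.9). All function names are hypothetical.

```python
import random


def binomial_fraction(n, p, rng):
    """Draw Binomial(n, p) as a sum of Bernoulli trials; return successes / n."""
    return sum(rng.random() < p for _ in range(n)) / n


def observe_pool(p, pool_size, coverage, rng):
    """Two-step sampling: individuals into a pool, then reads from the pool."""
    pool_freq = binomial_fraction(2 * pool_size, p, rng)  # sampling step 1
    return binomial_fraction(coverage, pool_freq, rng)    # sampling step 2


def simulate_locus(p0, ne, t, pool_size, coverage, rng):
    """Return read-based frequency estimates at generation 0 and generation t."""
    x0 = observe_pool(p0, pool_size, coverage, rng)
    p = p0
    for _ in range(t):
        # Wright-Fisher drift: resample 2*Ne gene copies each generation.
        p = binomial_fraction(2 * ne, p, rng)
    xt = observe_pool(p, pool_size, coverage, rng)
    return x0, xt


def simulate_experiment(n_loci, ne, t, pool_size, coverage, seed=0):
    """Simulate unlinked neutral SNPs; returns two lists (x0, xt)."""
    rng = random.Random(seed)
    x0, xt = [], []
    for _ in range(n_loci):
        start = rng.uniform(0.1, 0.9)  # assumed starting frequency distribution
        a, b = simulate_locus(start, ne, t, pool_size, coverage, rng)
        x0.append(a)
        xt.append(b)
    return x0, xt
```

Calling `simulate_experiment(n_loci=2000, ne=100, t=10, pool_size=100, coverage=50)` yields paired frequency estimates that can be passed to any temporal estimator. Comparing the result with a run at very high coverage gives a feel for how much variance the sequencing step alone contributes.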

**Figure 3**
Coefficient of variation of $N_{e}(P)$ under plan I for various parameter values. Neutral Wright–Fisher simulations were performed with various combinations of the parameters: effective population size ($N_{e} = 100, 500, 1000$ diploid individuals), pool size ($S = 100, 50$), and coverage ($R = 150, 100, 50$). $N_{e}$ was estimated with $N_{e}(P)$ under plan I, using $n = 2000$ SNPs. $S = N$ indicates scenarios when the whole population is sequenced as a single pool. For all simulations, we assumed $N = N_{e}$. Each value is calculated over 100 simulations. When the coefficient of variation exceeds one, the inset shows the actual value.

**Figure 4**
Effect of the number of SNPs used for estimating $N_{e}$. The effective population size is estimated using $N_{e}(P)$ under plan I on simulated data with $N_{e} = N = 100$. A total of $S = 100$ individuals are pooled and sequenced at a mean coverage of $R = 50$. Based on 100 simulation runs, $N_{e}$ is estimated using different numbers of SNPs at multiple time points.

**Figure 5**
Influence of the starting allele frequency distribution on $N_{e}(P)$ under plan I. A comparison between uniform and Beta(0.2, 0.2)-distributed (neutral) starting allele frequencies is shown. The simulation parameters match those of the genome-wide simulations in Figure 6.

**Figure 6**
Effect of linkage disequilibrium on $\hat{N}_{e}$. The effect of linkage disequilibrium on our estimator was evaluated based on a whole-genome forward simulation with recombination using the software MimicrEE (Kofler and Schlötterer 2014). Three sets of simulations were performed with different rates of recombination: high, normal, and no recombination. For each parameter setup, a genome-wide simulation is replicated 10 times. The effective population size was estimated with $N_{e}(P)$ (plan I) in nonoverlapping windows of $n = 10{,}000$ SNPs for each replicate. The box plots show the distribution of $N_{e}$ estimates across replicates and windows.

**Figure 7**
Genome-wide $\hat{N}_{e}$ from an E&R study with *D. melanogaster*. $N_{e}$ is estimated based on the allele frequency changes between founder and evolved populations at generation 59 (Franssen *et al.* 2015). In the top panel, we show genome-wide estimates calculated with $N_{e}(P)$ (plan I), using $N = 1000$ as census size and $S = 500$ as pool size (Orozco-terWengel *et al.* 2012) and nonoverlapping windows of 10,000 SNPs. Chromosome-wide mean estimates across replicates are shown by the dashed lines and also calculated separately for each replicate in Table 1. DNA stretches with significantly different $\hat{N}_{e}$ are determined using the *stepR* software package (Frick *et al.* 2014) (bottom panel). Lower and upper $1 - \alpha$ confidence bands are shown as shaded areas. $\alpha$ controls the error, *i.e.*, the probability of overestimating the number of change points, and is calculated automatically as described in Frick *et al.* (2014). The colors indicate different biological replicates.

Cited by

Ryman N, Laikre L, Hössjer O. Do estimates of contemporary effective population size tell us what we want to know? Mol Ecol. 2019 Apr;28(8):1904-1918. doi: 10.1111/mec.15027. Epub 2019 Apr 26. PMID: 30663828.
Rêgo A, Baur J, Girard-Tercieux C, de la Paz Celorio-Mancera M, Stelkens R, Berger D. Repeatability of evolution and genomic predictions of temperature adaptation in seed beetles. Nat Ecol Evol. 2025 Jun;9(6):1061-1074. doi: 10.1038/s41559-025-02716-5. Epub 2025 May 16. PMID: 40379980.
Galeano E, Bousquet J, Thomas BR. SNP-based analysis reveals unexpected features of genetic diversity, parental contributions and pollen contamination in a white spruce breeding program. Sci Rep. 2021 Mar 2;11(1):4990. doi: 10.1038/s41598-021-84566-2. PMID: 33654140.
Kelly JK. The genomic scale of fluctuating selection in a natural plant population. Evol Lett. 2022 Dec 11;6(6):506-521. doi: 10.1002/evl3.308. eCollection 2022 Dec. PMID: 36579169.
Langmüller AM, Schlötterer C. Low concordance of short-term and long-term selection responses in experimental Drosophila populations. Mol Ecol. 2020 Sep;29(18):3466-3475. doi: 10.1111/mec.15579. Epub 2020 Aug 26. PMID: 32762052.

Grants and funding

W 1225/FWF_/Austrian Science Fund FWF/Austria
