Genetics. 2015 Nov;201(3):1133-41.
doi: 10.1534/genetics.115.179606. Epub 2015 Aug 26.

Inference Under a Wright-Fisher Model Using an Accurate Beta Approximation

Paula Tataru et al. Genetics. 2015 Nov.

Abstract

The large amount and high quality of genomic data available today enable, in principle, accurate inference of evolutionary histories of observed populations. The Wright-Fisher model is one of the most widely used models for this purpose. It describes the stochastic behavior in time of allele frequencies and the influence of evolutionary pressures, such as mutation and selection. Despite its simple mathematical formulation, exact results for the distribution of allele frequency (DAF) as a function of time are not available in closed analytical form. Existing approximations build on the computationally intensive diffusion limit or rely on matching moments of the DAF. One of the moment-based approximations relies on the beta distribution, which can accurately describe the DAF when the allele frequency is not close to the boundaries (0 and 1). Nonetheless, under a Wright-Fisher model, the probability of being on the boundary can be positive, corresponding to the allele being either lost or fixed. Here we introduce the beta with spikes, an extension of the beta approximation that explicitly models the loss and fixation probabilities as two spikes at the boundaries. We show that the addition of spikes greatly improves the quality of the approximation. We additionally illustrate, using both simulated and real data, how the beta with spikes can be used for inference of divergence times between populations with comparable performance to an existing state-of-the-art method.
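The quantities named in the abstract can be made concrete for a small population. The following Python sketch is an illustration only, not the authors' implementation. It assumes pure genetic drift (no mutation or selection), builds the exact Wright-Fisher DAF by repeated binomial resampling, reads the loss and fixation probabilities off the two boundary entries, and moment-matches a plain beta distribution using the standard drift variance x0(1 - x0)[1 - (1 - 1/2N)^t]. All function names and the parameter values (2N = 200, x0 = 0.2, t = 39, borrowed from the Figure 1 caption) are choices made for this sketch.

```python
from math import comb


def wf_transition_matrix(two_n):
    """Row i holds Binomial(2N, i/2N): copy-number probabilities one generation later.

    Assumes pure drift. Uses exact binomial coefficients, so it is only
    practical for small 2N (a few hundred).
    """
    rows = []
    for i in range(two_n + 1):
        p = i / two_n
        rows.append([comb(two_n, j) * p**j * (1 - p) ** (two_n - j)
                     for j in range(two_n + 1)])
    return rows


def wf_daf(two_n, x0, t):
    """Exact discrete DAF after t generations, started at frequency x0."""
    matrix = wf_transition_matrix(two_n)
    size = two_n + 1
    dist = [0.0] * size
    dist[round(x0 * two_n)] = 1.0
    for _ in range(t):
        dist = [sum(dist[i] * matrix[i][j] for i in range(size))
                for j in range(size)]
    return dist


def drift_moments(two_n, x0, t):
    """Mean and variance of the allele frequency under pure drift."""
    mean = x0
    var = x0 * (1 - x0) * (1 - (1 - 1 / two_n) ** t)
    return mean, var


def beta_shapes(mean, var):
    """Shape parameters of the beta distribution with the given mean and variance."""
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


two_n, x0, t = 200, 0.2, 39  # t/2N = 0.195, as in Figure 1C and 1F
daf = wf_daf(two_n, x0, t)
p_loss, p_fix = daf[0], daf[-1]  # mass on the boundaries 0 and 1
mean, var = drift_moments(two_n, x0, t)
alpha_shape, beta_shape = beta_shapes(mean, var)
print(f"P(loss)={p_loss:.3e}  P(fixation)={p_fix:.3e}")
print(f"beta shapes: alpha={alpha_shape:.4f}, beta={beta_shape:.4f}")
```

The boundary masses `p_loss` and `p_fix` are exactly what a beta density on (0, 1) cannot represent. How the paper's beta with spikes computes them is not reproduced here.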

Keywords: Wright-Fisher; beta; divergence times; linear evolutionary pressures; pure genetic drift.

Figures

Figure 1
Fit of the beta and beta with spikes approximations for a population of size 2N=200. (A–F) The true discrete DAF as given by the Wright-Fisher model under pure genetic drift and the corresponding discretized beta (A–C) and beta with spikes (D–F) approximations. The distributions are conditional on an initial frequency x0=0.2 and for different time points t/2N=0.035 (A, D), t/2N=0.115 (B, E), and t/2N=0.195 (C, F), where t is the number of discrete generations that the population has evolved. (G, H) The Hellinger distance between the true DAF and the beta (G) and beta with spikes (H) as a function of x0 and t/2N. Each row and column corresponds to specific values of the scaled parameters 4Na and 4Nb. The distances corresponding to the distributions from A–F are marked with arrows. The discretization procedure and Hellinger distance are detailed in File S1.
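Panels G and H compare distributions through the Hellinger distance. The exact procedure used in the paper is given in File S1 and is not reproduced here. As a hedged sketch, the textbook Hellinger distance between two discrete distributions on the same support can be computed as follows (the function name `hellinger` is an assumption of this example):

```python
from math import sqrt


def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same support.

    Returns a value in [0, 1]: 0 for identical distributions, 1 for
    distributions with disjoint support. Both inputs are assumed to be
    non-negative and to sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sqrt(sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)


# Toy distributions on three frequency classes, for illustration only.
true_daf = [0.10, 0.80, 0.10]
approx = [0.00, 1.00, 0.00]   # an approximation with no boundary mass
print(hellinger(true_daf, approx))
```

In this toy case the approximation puts no mass on the two boundary classes, so the distance is strictly positive even though the interior class dominates.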
Figure 2
History of three populations in the present. Ancestral population 5 splits into populations 3 and 4; population 4 then splits into populations 2 and 1. For each SNP i and present population j ∈ {1, 2, 3}, the data consist of the sample size n_ij and allele count z_ij. The branch length between populations k and j is given as (t/2N)_kj and represents the scaled number of generations that population j evolved since the split from the ancestral population k. The unknown allele frequencies of each population are denoted as X_ij, with 1 ≤ j ≤ 5.
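The tree and data layout described in this caption can be written down directly. The sketch below is illustrative only: the child-to-parent map encodes the topology from the caption, while the binomial sampling model for the allele count given the population frequency is an assumption of this example and is not stated in the caption. The names `PARENT`, `path_to_root`, and `binomial_loglik` are hypothetical.

```python
from math import comb, log

# Topology from the caption, stored as child -> parent. Population 5 is the root.
PARENT = {1: 4, 2: 4, 3: 5, 4: 5}
LEAVES = (1, 2, 3)  # present-day populations with observed data


def path_to_root(pop):
    """Populations visited when walking from `pop` up to the root."""
    path = [pop]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def binomial_loglik(z, n, x):
    """log Pr(z | n, x) for allele count z in a sample of size n at frequency x.

    Binomial sampling is assumed here. The boundaries x = 0 and x = 1 are
    handled explicitly because the allele may be lost or fixed.
    """
    if x == 0.0:
        return 0.0 if z == 0 else float("-inf")
    if x == 1.0:
        return 0.0 if z == n else float("-inf")
    return log(comb(n, z)) + z * log(x) + (n - z) * log(1 - x)


print(path_to_root(1))               # populations on the path from leaf 1 to the root
print(binomial_loglik(3, 10, 0.25))  # one SNP in one leaf population
```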
Figure 3
Inference of divergence times for simulation scenarios I (A) and II (B). The figure shows box plots summarizing the inferred lengths for the four branches of the tree, indicated at the top of each column in black. The inferred lengths are plotted for beta (B), beta with spikes (BS), and Kim Tree (KT) (Gautier and Vitalis 2013). The (true) simulated length of each branch is plotted as a horizontal line. Each plot is scaled relative to the corresponding simulated branch length τ, with the limits of the y-axis set to [0.1τ, 1.5τ].
Figure 4
Inference of divergence times for the chimpanzee exome data. The figure shows box plots summarizing the inferred lengths using 50 data sets with 10,000 SNPs that were randomly sampled from the full data set. The corresponding tree branches are indicated at the top of each plot in black. The inferred lengths are plotted for beta (B), beta with spikes (BS), and Kim Tree (KT) (Gautier and Vitalis 2013). The nonsolid lines indicate the inferred lengths when running the methods on the full data set of 42,064 SNPs. The populations at the leaves are Eastern (E), Central (C), and Western (W). Each plot is scaled relative to the corresponding branch length τ inferred by beta with spikes on the full data set. The limits of the y-axis are set to [0.05τ, 1.5τ].

References

    1. Abramowitz M., Stegun I. A., 1964. Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables. Dover Publications, Mineola, NY.
    2. Balding D. J., Nichols R. A., 1995. A method for quantifying differentiation between populations at multiallelic loci and its implications for investigating identity and paternity. Genetica 96: 3–12.
    3. Balding D. J., Nichols R. A., 1997. Significant genetic correlations among Caucasians at forensic DNA loci. Heredity 78: 583–589.
    4. Bank C., Ewing G. B., Ferrer-Admettla A., Foll M., Jensen J. D., 2014. Thinking too positive? Revisiting current methods of population genetic selection inference. Trends Genet. 30: 540–546.
    5. Bataillon T., Duan J., Hvilsom C., Jin X., Li Y., et al., 2015. Inference of purifying and positive selection in three subspecies of chimpanzees (Pan troglodytes) from exome sequencing. Genome Biol. Evol. 7: 1122–1132.
