Genetics. 2015 Jul;200(3):921-34.
doi: 10.1534/genetics.115.176818. Epub 2015 May 7.

Genetic Variability Under the Seedbank Coalescent

Jochen Blath et al. Genetics. 2015 Jul.

Abstract

We analyze patterns of genetic variability of populations in the presence of a large seedbank with the help of a new coalescent structure called the seedbank coalescent. This ancestral process appears naturally as a scaling limit of the genealogy of large populations that sustain seedbanks, if the seedbank size and individual dormancy times are of the same order as those of the active population. Mutations appear as Poisson processes on the active lineages and potentially at a reduced rate also on the dormant lineages. The presence of "dormant" lineages leads to qualitatively altered times to the most recent common ancestor and nonclassical patterns of genetic diversity. To illustrate this we provide a Wright-Fisher model with a seedbank component and mutation, motivated by recent models of microbial dormancy, whose genealogy can be described by the seedbank coalescent. Based on our coalescent model, we derive recursions for the expectation and variance of the time to most recent common ancestor, number of segregating sites, pairwise differences, and singletons. Estimates (obtained by simulations) of the distributions of commonly employed distance statistics, in the presence and absence of a seedbank, are compared. The effect of a seedbank on the expected site-frequency spectrum is also investigated using simulations. Our results indicate that the presence of a large seedbank considerably alters the distribution of some distance statistics, as well as the site-frequency spectrum. Thus, one should be able to detect from genetic data the presence of a large seedbank in natural populations.

Keywords: Wright–Fisher model; distance statistics; dormancy; seedbank coalescent; site-frequency spectrum.
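
As a rough illustration of the genealogical process summarized in the abstract, the following Python sketch simulates the time to the most recent common ancestor by tracking only the numbers of active and dormant ancestral lineages. The abstract does not state the transition rates, so the sketch assumes them: each pair of active lineages coalesces at rate 1, each active lineage becomes dormant at rate c, and each dormant lineage reactivates at rate cK, with c and K read as the parameters named in the figure captions. The function name `simulate_tmrca` and its defaults are hypothetical.

```python
import random


def simulate_tmrca(n_active, n_dormant=0, c=1.0, K=1.0, rng=None):
    """Simulate one time to the most recent common ancestor.

    Assumed rates (not given in the abstract):
      coalescence of any active pair: 1
      active -> dormant, per lineage: c
      dormant -> active, per lineage: c * K
    Only active lineages may coalesce.
    """
    if rng is None:
        rng = random.Random()
    if n_dormant > 0 and (c <= 0 or K <= 0):
        raise ValueError("dormant lineages need c > 0 and K > 0 to reactivate")

    n, m = n_active, n_dormant  # active and dormant lineage counts
    t = 0.0
    while n + m > 1:
        coal = n * (n - 1) / 2.0   # total coalescence rate
        to_dormant = c * n         # total rate of entering the seedbank
        to_active = c * K * m      # total rate of leaving the seedbank
        total = coal + to_dormant + to_active

        t += rng.expovariate(total)  # waiting time to the next event
        u = rng.random() * total     # choose the event proportionally to its rate
        if u < coal:
            n -= 1                   # two active lineages merge
        elif u < coal + to_dormant:
            n -= 1                   # one active lineage falls dormant
            m += 1
        else:
            n += 1                   # one dormant lineage reactivates
            m -= 1
    return t


if __name__ == "__main__":
    rng = random.Random(2015)
    reps = 10_000
    for label, c in (("no seedbank (c=0)", 0.0), ("seedbank (c=1, K=1)", 1.0)):
        mean = sum(simulate_tmrca(2, c=c, K=1.0, rng=rng) for _ in range(reps)) / reps
        print(f"{label}: mean TMRCA for n=2 is about {mean:.2f}")
```

With c = 0 and no dormant lineages the loop reduces to the standard Kingman coalescent. Under the assumed rates, a first-step analysis for two active lineages with c = K = 1 gives an expected time of 4, compared with 1 without a seedbank, which is in line with the abstract's statement that dormant lineages alter the time to the most recent common ancestor.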

Figures

Figure 1
Realization of a seedbank coalescent with all n=8 sampled lineages assumed active. Mutations are, in this example, allowed to occur only on active ancestral lineages represented by solid lines; ancestral lineages residing in the seedbank are represented by dotted lines. Coalescences are allowed only among active lineages.
Figure 2
Dynamics of reversible microbial dormancy, according to Jones and Lennon (2010).
Figure 3
Estimates of the normalized site-frequency spectrum $\phi_i^{(n)} = E[\xi_i^{(n)}] / E[|\xi^{(n)}|]$ with all $n = 100$ sampled lines assumed active and values of $c$ and $K$ as shown ($d = 0$). The mutation rate in the active population is fixed, $\theta_1 = 2$, and there is no mutation in the dormant states ($\theta_2 = 0$). All estimates are based on $10^5$ replicates.
Figure 4
Estimates of the distribution of the 2(n) statistic (Equation 23), with all $n = 100$ sampled lines assumed active, $c$ and $K$ as shown, $\theta_1 = 2$, and $\theta_2 = 0$. The vertical dashed lines are the 5%, 25%, 50%, 75%, and 95% quantiles and the solid square (■) denotes the mean. The entries are normalized to have unit mass. All estimates are based on $10^5$ replicates.
Figure 5
Estimates of the distribution of Tajima’s $D_T$ (Equation 24) with all $n = 100$ sampled lines assumed active, $\theta_1 = 2$, and $\theta_2 = 0$. The vertical dashed lines are the 5%, 25%, 50%, 75%, and 95% quantiles and the solid square (■) denotes the mean. The entries are normalized to have unit mass. The histograms are drawn on the same horizontal scale. Estimates are based on $10^5$ replicates.
Figure 6
Estimates of the distribution of Fu and Li’s $D_{FL}$ (Equation 25) with all $n = 100$ sampled lines assumed active, $\theta_1 = 2$, and $\theta_2 = 0$. The vertical dashed lines are the 5%, 25%, 50%, 75%, and 95% quantiles and the solid square (■) denotes the mean. The entries are normalized to have unit mass. The histograms are drawn on the same horizontal scale. Estimates are based on $10^5$ replicates.
Figure 7
Estimates of the distribution of Fay and Wu’s $D_{FW}$ (Equation 26) with all $n = 100$ sampled lines assumed active, $\theta_1 = 2$, and $\theta_2 = 0$. The vertical dashed lines are the 5%, 25%, 50%, 75%, and 95% quantiles and the solid square (■) denotes the mean. The entries are normalized to have unit mass. The histograms are drawn on the same horizontal scale. Estimates are based on $10^5$ replicates.
