Genetics. 2012 Feb;190(2):753-79.
doi: 10.1534/genetics.111.134544. Epub 2011 Nov 30.

The structure of genealogies in the presence of purifying selection: a fitness-class coalescent

Aleksandra M Walczak et al. Genetics. 2012 Feb.

Abstract

Compared to a neutral model, purifying selection distorts the structure of genealogies and hence alters the patterns of sampled genetic variation. Although these distortions may be common in nature, our understanding of how we expect purifying selection to affect patterns of molecular variation remains incomplete. Genealogical approaches such as coalescent theory have proven difficult to generalize to situations involving selection at many linked sites, unless selection pressures are extremely strong. Here, we introduce an effective coalescent theory (a "fitness-class coalescent") to describe the structure of genealogies in the presence of purifying selection at many linked sites. We use this effective theory to calculate several simple statistics describing the expected patterns of variation in sequence data, both at the sites under selection and at linked neutral sites. Our analysis combines a description of the allele frequency spectrum in the presence of purifying selection with the structured coalescent approach of Kaplan et al. (1988), to trace the ancestry of individuals through the distribution of fitnesses within the population. We also derive our results using a more direct extension of the structured coalescent approach of Hudson and Kaplan (1994). We find that purifying selection leads to patterns of genetic variation that are related but not identical to a neutrally evolving population in which population size has varied in a specific way in the past.

Figures

Figure 1
The distribution of the fraction of the population in each fitness class. (A) The distribution of the number of individuals as a function of fitness, where the most beneficial class is arbitrarily defined to have fitness 1, and each deleterious mutation introduces a fitness disadvantage of s. Mutations move individuals to less-fit classes, and selection balances this by favoring the classes more fit than average. The shape of the depicted steady-state distribution is a result of this mutation–selection balance. The inset (B) shows the processes that lead to this balance within a given fitness class.
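The steady-state distribution described in this caption can be sketched numerically. The caption does not give a functional form, so the block below assumes the classical Poisson result for mutation–selection balance, with mean Ud/s (consistent with the mean class k = Ud/s mentioned in the Figure 7 caption). The function name and the truncation point `k_max` are illustrative choices, not taken from the paper.

```python
import math

def fitness_class_fractions(ud_over_s, k_max):
    """Fraction h_k of the population carrying k deleterious mutations.

    Assumption: h_k is Poisson with mean Ud/s (classical
    mutation-selection balance); classes above k_max are ignored.
    """
    lam = ud_over_s
    return [math.exp(-lam) * lam**k / math.factorial(k)
            for k in range(k_max + 1)]

# Example with Ud/s = 8, the value used in Figure 3.
h = fitness_class_fractions(8.0, 60)
total = sum(h)                                   # close to 1 if k_max is large enough
mean_k = sum(k * hk for k, hk in enumerate(h))   # close to Ud/s
```

Truncating at a `k_max` well above Ud/s keeps the neglected tail negligible; for larger Ud/s the cutoff should be raised accordingly.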
Figure 2
Each fitness class in the population is composed of many lineages, each of which was created by a single mutation and is (in our infinite-sites model) genetically unique. In the scheme each lineage is depicted in a different color. The arrows denote an example of the fitness-class coalescence process for two individuals sampled from classes 8 and 9. These individuals came from different lineages, and these lineages were created by mutations from different lineages within the next most-fit class (as shown by the arrows). The arrows trace the ancestry of the two individuals back through the different lineages that successively founded each other, until they finally coalesce in the class third from right.
Figure 3
Examples of the coalescence probabilities P_c^{k,k′→k−ℓ} for two individuals sampled from fitness classes k and k′ to coalesce in class k − ℓ, shown as a function of ℓ. Here Ud/s = 8, s = 10⁻³, and results are shown for Ns = 10 (dotted lines), Ns = 50 (dashed lines), and Ns = 100 (solid lines).
Figure 4
Characteristic examples of the distribution of πd. Here N = 5 × 10⁴, s = 10⁻³: (A) Ud/s = 2; (B) Ud/s = 4. Theoretical predictions are shown as a solid line, simulation results as a dashed line. Simulation results are averaged across at least 300 independent simulations for each parameter set; shaded regions show one standard error in the simulation results. The fit to simulations is good, but we tend to slightly underestimate πd, and this tendency is worse for larger Ud/s. This is consistent with the effects of Muller’s ratchet, which becomes more problematic as we increase Ud/s. This systematic underestimate becomes less severe (for all values of Ud/s) as N increases, as expected, but comprehensive simulations for much larger N are computationally prohibitive.
Figure 5
Characteristic examples of the distributions of πn and the real coalescence times. (A) Theoretical predictions for the distribution of πn for Ud/s = 2, compared to simulation results. (B) Theoretical predictions for the distribution of πn for Ud/s = 4, compared to simulation results. Simulation results are averaged across at least 300 independent simulations for each parameter set; shaded regions show one standard error in the simulation results. (C) Theoretical predictions for the distribution of real coalescence times for Ud/s = 2; note that these simply mirror the distribution of πn, as expected. (D) Theoretical predictions for the distribution of real coalescence times for Ud/s = 4. In A–D, N = 5 × 10⁴ and s = 10⁻³. Our theory agrees well with the simulations, but note that, as with πd, we tend to systematically underestimate πn, and this tendency is worse for larger Ud/s. This is consistent with Muller’s ratchet and as expected becomes more problematic for larger Ud/s. This systematic underestimate becomes less severe (for all values of Ud/s) as we increase N, as expected, but comprehensive simulations for much larger N are computationally prohibitive.
Figure 6
Characteristic examples of the distribution of total heterozygosity π. Here N = 5 × 10⁴, s = 10⁻³: (A) Ud/s = 2; (B) Ud/s = 4. Theoretical predictions are shown as a solid line, simulation results as a dashed line. Simulation results are averaged across at least 300 independent simulations for each parameter set; shaded regions show one standard error in the simulation results. The fit to simulations is good, but we tend to slightly underestimate π, and this tendency is worse for larger Ud/s. This is for the same reasons as in the distributions of πn and πd.
Figure 7
Theoretical predictions for the mean pairwise heterozygosity at negatively selected sites, 〈πd〉, as a function of the parameters. (A) 〈πd〉 as a function of Ud/s for several values of Ns. In the mutation–time approximation we expect this to be linear with a slope of 2, since on average individuals are sampled from the mean class at k = Ud/s and coalesce in the 0-class, and hence we have πd = 2Ud/s. We see that as expected this approximation becomes more and more accurate as Ns increases. For smaller N, there is substantial probability of coalescence in the bulk of the fitness distribution, which is greater for larger Ud/s. Thus the slope of 〈πd〉 as a function of Ud/s decreases as Ns decreases and has a downward curvature. (B) 〈πd〉 as a function of Ns for several values of Ud/s. We see that as Ns becomes large, 〈πd〉 approaches 2Ud/s, again consistent with the mutation–time approximation. As Ns decreases, coalescence within the bulk of the fitness distribution becomes more likely, and hence 〈πd〉 decreases.
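The mutation–time approximation in this caption can be written out as a short worked step. The sketch below assumes, as the caption states, that both individuals are sampled on average from the mean class k = Ud/s and coalesce in the 0-class, so that in the infinite-sites model they differ at k + k′ deleterious sites. The symbols k and k′ for the two sampled classes follow the Figure 3 caption.

```latex
\langle \pi_d \rangle
  \approx \langle k \rangle + \langle k' \rangle
  = \frac{U_d}{s} + \frac{U_d}{s}
  = \frac{2 U_d}{s}
```

For example, U_d/s = 4 would give ⟨π_d⟩ ≈ 8 in this limit; the caption notes that the true value falls below this as Ns decreases.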
Figure 8
Theoretical predictions for the mean real coalescence time 〈t〉. In this figure we fix s = 10⁻³ and show the dependence of the mean pairwise coalescence time on N and on Ud/s. The mean pairwise heterozygosity at neutral sites, 〈πn〉, is simply 〈πn〉 = 2Un〈t〉. (A) Mean coalescence time as a function of N for various values of Ud/s. We see that 〈t〉 increases slowly with N until for large enough N the EPS approximation applies and 〈t〉 becomes linear in N. (B) Mean coalescence time as a function of Ud/s for several values of N. For large N, the dependence is roughly linear, consistent with the EPS approximation. For smaller N, coalescence can occur in the bulk of the fitness distribution, reducing the mean coalescence time.
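The relation between mean coalescence time and neutral heterozygosity in this caption is a one-line conversion. As a minimal sketch, the block below applies it to a purely neutral reference case, assuming a mean pairwise coalescence time of N generations (the haploid constant-size result implied by a per-generation coalescence probability of 1/N). The neutral mutation rate `u_n = 1e-6` is a hypothetical value chosen for illustration, not a parameter from the paper.

```python
def mean_neutral_heterozygosity(u_n, mean_t):
    """Mean pairwise heterozygosity at neutral sites: <pi_n> = 2 * Un * <t>."""
    return 2.0 * u_n * mean_t

# Neutral reference case: <t> = N generations (assumption stated above).
n = 5e4        # population size used in the figures
u_n = 1e-6     # hypothetical neutral mutation rate
pi_neutral = mean_neutral_heterozygosity(u_n, n)
```

Under purifying selection the same conversion would apply to the reduced 〈t〉 shown in the figure, so the neutral value serves only as an upper reference point.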
Figure 9
The fitness-class coalescence process for three individuals, A, B, and C, where A and B coalesced τ3 steptimes ago and C coalesced with the other two τ2 steptimes ago.
Figure 10
Relationship between our results and an effective population-size approximation. (A) A typical coalescent tree in a neutral population of constant size. The coalescent probability per generation between a random pair of individuals is the inverse population size. Time runs from the past at the top to the present at the bottom. (B) An example of a neutral coalescent tree in a population that was smaller in the past than the present. The population size is shown as the width in green. Coalescence events are more likely to occur when the population size is smaller. (C) The effective population-size history for an individual experiencing purifying selection according to our model. The individual spends on average 1/(sk) generations in class k, which has a total size N h_k. Note that pairs of individuals are sampled from different classes k (i.e., they are not all sampled from the bottom of this picture). Further, the coalescence probabilities also include a factor of A/2, which reflects the probability that two lineages are in the same class at the same time. (D) The historically varying effective population size Ne(t) for a pair of individuals sampled from classes k and k′, as defined in the text, for several values of k and k′. The Ne(t) for two individuals sampled at random from the whole population is also shown. Here N = 5 × 10⁴, Ud/s = 6, and s = 10⁻³.
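Panel C can be illustrated by tabulating the history of a single lineage as it is traced back from class k toward the 0-class. The sketch below uses the two quantities the caption gives, the mean residence time 1/(sk) and the class size N h_k, and assumes a Poisson form for h_k with mean Ud/s, which the caption itself does not state. It omits the factor A/2 and the full definition of Ne(t), which are given in the paper's text rather than here; the function name is illustrative.

```python
import math

def class_history(k, n, s, ud_over_s):
    """Backward-in-time walk of one lineage from class k down to class 1.

    Returns a list of (class j, mean generations spent in j, size of j).
    Assumption: h_j is Poisson with mean Ud/s.
    """
    lam = ud_over_s
    history = []
    for j in range(k, 0, -1):
        h_j = math.exp(-lam) * lam**j / math.factorial(j)
        history.append((j, 1.0 / (s * j), n * h_j))
    return history

# Parameters from the caption: N = 5e4, Ud/s = 6, s = 1e-3; start in class 6.
history = class_history(k=6, n=5e4, s=1e-3, ud_over_s=6.0)
total_time = sum(t for _, t, _ in history)  # mean generations to reach the 0-class
```

Each step back in time moves the lineage into a less loaded class where it lingers longer, which is the qualitative picture behind the time-varying effective population size in panel D.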

