The structure of genealogies in the presence of purifying selection: a fitness-class coalescent

Aleksandra M Walczak¹, Lauren E Nicolaisen, Joshua B Plotkin, Michael M Desai

PMID: 22135349
PMCID: PMC3276618
DOI: 10.1534/genetics.111.134544

Aleksandra M Walczak et al. Genetics. 2012 Feb.

Genetics. 2012 Feb;190(2):753-79.

doi: 10.1534/genetics.111.134544. Epub 2011 Nov 30.

Affiliation

¹ CNRS-Laboratoire de Physique Théorique de l'École Normale Supérieure, 75231 Paris Cedex 05, France.

Abstract

Compared to a neutral model, purifying selection distorts the structure of genealogies and hence alters the patterns of sampled genetic variation. Although these distortions may be common in nature, our understanding of how we expect purifying selection to affect patterns of molecular variation remains incomplete. Genealogical approaches such as coalescent theory have proven difficult to generalize to situations involving selection at many linked sites, unless selection pressures are extremely strong. Here, we introduce an effective coalescent theory (a "fitness-class coalescent") to describe the structure of genealogies in the presence of purifying selection at many linked sites. We use this effective theory to calculate several simple statistics describing the expected patterns of variation in sequence data, both at the sites under selection and at linked neutral sites. Our analysis combines a description of the allele frequency spectrum in the presence of purifying selection with the structured coalescent approach of Kaplan et al. (1988), to trace the ancestry of individuals through the distribution of fitnesses within the population. We also derive our results using a more direct extension of the structured coalescent approach of Hudson and Kaplan (1994). We find that purifying selection leads to patterns of genetic variation that are related but not identical to a neutrally evolving population in which population size has varied in a specific way in the past.

Figures

**Figure 1**
The distribution of the fraction of the population in each fitness class. (A) The distribution of the number of individuals as a function of fitness, where the most beneficial class is arbitrarily defined to have fitness 1, and each deleterious mutation introduces a fitness disadvantage of s. Mutations move individuals to less-fit classes, and selection balances this by favoring the classes more fit than average. The shape of the depicted steady-state distribution is a result of this mutation–selection balance. The inset (B) shows the processes that lead to this balance within a given fitness class.
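As a rough illustration of the steady state described in this caption, the sketch below assumes the classical mutation–selection balance result in which class k (individuals carrying k deleterious mutations) holds a Poisson-distributed fraction h_k with mean U_d/s. That closed form, the continuous-time balance equation, the function names, and the parameter values are assumptions made for illustration; they are not taken from the caption itself.

```python
import math

def fitness_class_fractions(ud_over_s, k_max):
    """Return [h_0, ..., h_k_max], assuming h_k is Poisson with mean U_d/s."""
    lam = ud_over_s
    fractions = []
    h = math.exp(-lam)  # h_0: fraction in the mutation-free class
    for k in range(k_max + 1):
        fractions.append(h)
        h *= lam / (k + 1)  # Poisson recursion: h_{k+1} = h_k * lam / (k + 1)
    return fractions

def balance_residual(h, k, ud, s):
    """Net rate of change of h_k under the assumed balance (panel B processes).

    Mutation moves individuals in from class k-1 and out to class k+1;
    selection favors classes with fewer mutations than the mean U_d/s.
    """
    mean_k = ud / s
    mutation_in = ud * h[k - 1] if k > 0 else 0.0
    mutation_out = ud * h[k]
    selection = s * (mean_k - k) * h[k]
    return mutation_in - mutation_out + selection

s = 1e-3          # illustrative selection coefficient
ud = 8 * s        # illustrative deleterious mutation rate, so U_d/s is about 8
h = fitness_class_fractions(ud / s, k_max=60)
mean_class = sum(k * hk for k, hk in enumerate(h))
max_residual = max(abs(balance_residual(h, k, ud, s)) for k in range(len(h)))
print(f"h_0 = {h[0]:.3e}, total = {sum(h):.6f}, mean class = {mean_class:.3f}")
print(f"largest balance residual = {max_residual:.1e}")
```

Under these assumptions the residual vanishes in every class, which is the sense in which mutation and selection balance in the steady state.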

**Figure 2**
Each fitness class in the population is composed of many lineages, each of which was created by a single mutation and is (in our infinite-sites model) genetically unique. In the schematic, each lineage is depicted in a different color. The arrows denote an example of the fitness-class coalescence process for two individuals sampled from classes 8 and 9. These individuals came from different lineages, and these lineages were created by mutations from different lineages within the next most-fit class (as shown by the arrows). The arrows trace the ancestry of the two individuals back through the different lineages that successively founded each other, until they finally coalesce in the class third from right.

**Figure 3**
Examples of the coalescence probabilities $P_{c}^{k,k' \to k-\ell}$ for two individuals sampled from fitness classes k and k′ to coalesce in class k − ℓ, shown as a function of ℓ. Here U_d/s = 8, s = 10⁻³, and results are shown for Ns = 10 (dotted lines), Ns = 50 (dashed lines), and Ns = 100 (solid lines).

**Figure 4**
Characteristic examples of the distribution of π_d. Here N = 5 × 10⁴, s = 10⁻³: (A) U_d/s = 2; (B) U_d/s = 4. Theoretical predictions are shown as a solid line, simulation results as a dashed line. Simulation results are averaged across at least 300 independent simulations for each parameter set; shaded regions show one standard error in the simulation results. The fit to simulations is good, but we tend to slightly underestimate π_d, and this tendency is worse for larger U_d/s. This is consistent with the effects of Muller’s ratchet, which becomes more problematic as we increase U_d/s. This systematic underestimate becomes less severe (for all values of U_d/s) as N increases, as expected, but comprehensive simulations for much larger N are computationally prohibitive.

**Figure 5**
Characteristic examples of the distributions of π_n and the real coalescence times. (A) Theoretical predictions for the distribution of π_n for U_d/s = 2, compared to simulation results. (B) Theoretical predictions for the distribution of π_n for U_d/s = 4, compared to simulation results. Simulation results are averaged across at least 300 independent simulations for each parameter set; shaded regions show one standard error in the simulation results. (C) Theoretical predictions for the distribution of real coalescence times for U_d/s = 2; note that these simply mirror the distribution of π_n, as expected. (D) Theoretical predictions for the distribution of real coalescence times for U_d/s = 4. In A–D, N = 5 × 10⁴ and s = 10⁻³. Our theory agrees well with the simulations, but note that, as with π_d, we tend to systematically underestimate π_n, and this tendency is worse for larger U_d/s. This is consistent with the effects of Muller’s ratchet, which becomes more problematic for larger U_d/s. This systematic underestimate becomes less severe (for all values of U_d/s) as we increase N, as expected, but comprehensive simulations for much larger N are computationally prohibitive.

**Figure 6**
Characteristic examples of the distribution of total heterozygosity π. Here N = 5 × 10⁴, s = 10⁻³: (A) U_d/s = 2; (B) U_d/s = 4. Theoretical predictions are shown as a solid line, simulation results as a dashed line. Simulation results are averaged across at least 300 independent simulations for each parameter set; shaded regions show one standard error in the simulation results. The fit to simulations is good, but we tend to slightly underestimate π, and this tendency is worse for larger U_d/s. The underestimate arises for the same reasons as in the distributions of π_n and π_d.

**Figure 7**
Theoretical predictions for the mean pairwise heterozygosity at negatively selected sites, 〈π_d〉, as a function of the parameters. (A) 〈π_d〉 as a function of U_d/s for several values of Ns. In the mutation–time approximation we expect this to be linear with a slope of 2, since on average individuals are sampled from the mean class at k = U_d/s and coalesce in the 0-class, and hence we have π_d = 2U_d/s. We see that as expected this approximation becomes more and more accurate as Ns increases. For smaller N, there is substantial probability of coalescence in the bulk of the fitness distribution, which is greater for larger U_d/s. Thus the slope of 〈π_d〉 as a function of U_d/s decreases as Ns decreases and has a downward curvature. (B) 〈π_d〉 as a function of Ns for several values of U_d/s. We see that as Ns becomes large, 〈π_d〉 approaches 2U_d/s, again consistent with the mutation–time approximation. As Ns decreases, coalescence within the bulk of the fitness distribution becomes more likely, and hence 〈π_d〉 decreases.
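The mutation–time approximation quoted in panel A can be written out as a short worked step. This is a sketch under two assumptions that go slightly beyond the caption: both sampled lineages coalesce only in the 0-class, and under the infinite-sites model every deleterious mutation carried by either individual counts as one pairwise difference. Here $h_k$ denotes the fraction of the population in class $k$, and $k$, $k'$ are the classes of the two sampled individuals.

```latex
\begin{align}
\pi_d(k, k') &= k + k' \\
\langle \pi_d \rangle &\approx \sum_{k,k'} h_k \, h_{k'} \, (k + k')
  = 2\langle k \rangle = \frac{2U_d}{s}
\end{align}
```

Coalescence in the bulk of the distribution (in some class above 0) means the two individuals share part of their mutation load, so fewer sites differ. This is consistent with the curves falling below $2U_d/s$ at smaller Ns.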

**Figure 8**
Theoretical predictions for the mean real coalescence time 〈t〉. In this figure we fix s = 10⁻³ and show the dependence of the mean pairwise coalescence time on N and on U_d/s. The mean pairwise heterozygosity at neutral sites, 〈π_n〉, is simply 〈π_n〉 = 2U_n〈t〉. (A) Mean coalescence time as a function of N for various values of U_d/s. We see that 〈t〉 increases slowly with N until for large enough N the EPS approximation applies and 〈t〉 becomes linear in N. (B) Mean coalescence time as a function of U_d/s for several values of N. For large N, the dependence is roughly linear, consistent with the EPS approximation. For smaller N, coalescence can occur in the bulk of the fitness distribution, reducing the mean coalescence time.

**Figure 9**
The fitness-class coalescence process for three individuals, A, B, and C, where A and B coalesced τ₃ steptimes ago and C coalesced with the other two τ₂ steptimes ago.

**Figure 10**
Relationship between our results and an effective population-size approximation. (A) A typical coalescent tree in a neutral population of constant size. The coalescent probability per generation between a random pair of individuals is the inverse population size. Time runs from the past at the top to the present at the bottom. (B) An example of a neutral coalescent tree in a population that was smaller in the past than the present. The population size is shown as the width in green. Coalescence events are more likely to occur when the population size is smaller. (C) The effective population-size history for an individual experiencing purifying selection according to our model. The individual spends on average $1/(sk)$ generations in class k, which has a total size $N h_k$. Note that pairs of individuals are sampled from different classes k (*i.e.*, they are not all sampled from the bottom of this picture). Further, the coalescence probabilities also include a factor of A/2, which reflects the probability that two lineages are in the same class at the same time. (D) The historically varying effective population size N_e(t) for a pair of individuals sampled from classes k and k′, as defined in the text, for several values of k and k′. The N_e(t) for two individuals sampled at random from the whole population is also shown. Here N = 5 × 10⁴, U_d/s = 6, and s = 10⁻³.
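The neutral picture in panels A and B can be sketched numerically. The code below assumes only what those panels state: a pair of lineages coalesces in a given generation with probability equal to the inverse of the population size at that time. The function name, the population sizes, and the time horizon are hypothetical choices for illustration; this is not the N_e(t) of panel D, which also involves the class structure and the factor of A/2.

```python
def coalescence_time_pmf(pop_sizes):
    """Pairwise coalescence-time distribution for a given size history.

    pop_sizes[i] is the population size i + 1 generations before the present.
    Returns (pmf, not_coalesced): pmf[i] is the probability that the pair
    first coalesces in that generation, and not_coalesced is the probability
    that the pair has still not coalesced at the end of the history.
    """
    pmf = []
    not_coalesced = 1.0
    for n in pop_sizes:
        p = 1.0 / n  # per-generation coalescence probability
        pmf.append(not_coalesced * p)
        not_coalesced *= 1.0 - p
    return pmf, not_coalesced

horizon = 500
constant_history = [100] * horizon                  # panel A: constant size
shrinking_history = [100] * 50 + [20] * (horizon - 50)  # panel B: smaller in the past

pmf_const, rem_const = coalescence_time_pmf(constant_history)
pmf_var, rem_var = coalescence_time_pmf(shrinking_history)

print(f"constant size: P(not coalesced after {horizon} generations) = {rem_const:.4f}")
print(f"smaller past:  P(not coalesced after {horizon} generations) = {rem_var:.2e}")
```

With a smaller population in the past, more of the probability mass falls in those earlier generations, which matches the statement in panel B that coalescence events are more likely when the population size is smaller.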
