Mol Biol Evol. 2024 Jul 3;41(7):msae138.
doi: 10.1093/molbev/msae138.

Polymorphism-Aware Models in RevBayes: Species Trees, Disentangling Balancing Selection, and GC-Biased Gene Conversion

Svitlana Braichenko et al. Mol Biol Evol. 2024.

Abstract

The role of balancing selection is a long-standing evolutionary puzzle. Balancing selection maintains genetic variation (polymorphism) over extended periods of time; however, detecting it is a significant challenge. Building upon the Polymorphism-aware phylogenetic Models (PoMos) framework, which is rooted in the Moran model, we introduce the PoMoBalance model. This approach is designed to disentangle the interplay of mutation, genetic drift, directional selection (GC-biased gene conversion), and the previously unexplored balancing selection pressures on ultra-long timescales, comparable with species divergence times, by analyzing multi-individual genomic and phylogenetic divergence data. Implemented in the open-source Bayesian framework RevBayes, PoMoBalance is a versatile tool for inferring phylogenetic trees and quantifying various selective pressures. The novelty of our approach to studying balancing selection lies in the ability of polymorphism-aware phylogenetic models to account for ancestral polymorphisms and to incorporate parameters that measure frequency-dependent selection, allowing us to determine both the strength of the effect and the exact frequencies under selection. We implemented validation tests and assessed the model on data simulated with SLiM and with a custom Moran model simulator. Analysis of real sequence data from Drosophila populations provides insights into the evolutionary dynamics of regions subject to frequency-dependent balancing selection, particularly in the context of sex-limited color dimorphism in Drosophila erecta.

Keywords: Bayesian inference with MCMC; GC-biased gene conversion; balancing selection; polymorphism-aware phylogenetic models; site frequency spectrum; species trees.

Conflict of interest statement

None declared.

Figures

Fig. 1.
a) PoMoBalance model, represented as a Moran-based Markov chain. The boundary (monomorphic) states are denoted by larger circles: each contains N individuals, all carrying allele a_i (left) or all carrying allele a_j (right). All intermediate (polymorphic) states are shown as smaller circles. Transition rates out of the monomorphic states are determined by mutation rates, whereas transition rates out of the polymorphic states are governed by the multiplicative fitness given in Equation (1). The multiplicative fitness captures not only the effect of directional selection (DS) but also that of balancing selection (BS), which exerts a force toward the state with the preferred allele frequency, B_{a_i a_j}, represented by dark arrows. Transitions away from this preferred state experience no such attracting force, as signified by the light crossed arrows. b) A specific instance of the PoMoBalance model with population size N=4.
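The chain in Fig. 1 can be sketched as a rate matrix over the N+1 states of a single allele pair. The code below is a minimal, hypothetical illustration, not the paper's Equation (1) or the RevBayes implementation: the function name `pomo_rate_matrix`, the unnormalized drift term k(N-k)/N, the default B = N // 2, and the specific way σ (`sigma`) and β (`beta`) enter the fitness are all assumptions chosen only to mirror the caption. Mutation moves the chain out of the boundary states, fitness-weighted drift moves it between polymorphic states, and a step toward the preferred count B receives an extra factor (1 + β).

```python
# Hypothetical sketch of a two-allele PoMo-style rate matrix (generator).
# State k in 0..N is the number of a_i copies among N virtual individuals:
# k = 0 and k = N are monomorphic boundary states; 1..N-1 are polymorphic.
# ASSUMPTIONS (not the paper's Equation (1)): sigma multiplies the fitness of
# a_i (e.g. the GC allele under gBGC); a step toward the preferred count B is
# boosted by (1 + beta); steps away from B get no such boost.


def pomo_rate_matrix(N, mu_ij, mu_ji, sigma=0.0, beta=0.0, B=None):
    """Return the (N+1) x (N+1) generator Q as a list of lists."""
    if N < 2:
        raise ValueError("need at least two virtual individuals")
    if B is None:
        B = N // 2  # hypothetical default: preferred frequency at one half
    size = N + 1
    Q = [[0.0] * size for _ in range(size)]

    # Boundary states: leaving them is driven only by mutation.
    Q[0][1] = mu_ji        # all a_j -> one a_i (mutation a_j -> a_i)
    Q[N][N - 1] = mu_ij    # all a_i -> one a_j (mutation a_i -> a_j)

    # Polymorphic states: Moran drift weighted by multiplicative fitness.
    for k in range(1, N):
        phi_i = 1.0 + sigma  # directional selection favouring a_i
        phi_j = 1.0
        if k < B:
            phi_i *= 1.0 + beta  # gaining an a_i moves toward B
        elif k > B:
            phi_j *= 1.0 + beta  # gaining an a_j moves toward B
        drift = k * (N - k) / N
        Q[k][k + 1] = drift * phi_i
        Q[k][k - 1] = drift * phi_j

    # Diagonal entries make every row sum to zero.
    for k in range(size):
        Q[k][k] = -sum(Q[k][m] for m in range(size) if m != k)
    return Q


if __name__ == "__main__":
    # N = 4 gives the five states of Fig. 1b.
    for row in pomo_rate_matrix(N=4, mu_ij=0.01, mu_ji=0.01, sigma=0.1, beta=0.5, B=2):
        print([round(x, 4) for x in row])
```

Setting `sigma=0` and `beta=0` makes the up and down rates of every polymorphic state equal, i.e. pure neutral drift; the boosted direction always points toward B, which is the attracting force drawn as dark arrows in the figure.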
Fig. 2.
Testing scenarios for PoMoBalance, including the types of trees, tree topologies, PoMo parameters used in the tests, sequence lengths, and numbers of MCMC steps. Simulation-based calibration uses data simulated under 1,000 parameter sets sampled from the priors, while the Moran and SLiM tests use data simulated for several values of σ and β. Additionally, we use experimental data from several subspecies of Drosophila.
Fig. 3.
Coverage probabilities from validation analyses in RevBayes, using distinct computational routines for the reversible scenarios, a) PoMoSelect and b) PoMoBalance, and for the nonreversible scenarios, c) PoMoSelect and d) PoMoBalance. The dashed lines indicate 90% CIs; the virtual population size was fixed at N=4 in all cases.
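The coverage probabilities in Fig. 3 summarize how often a credible interval contains the true parameter value across replicates whose parameters were drawn from the prior. A minimal sketch of that bookkeeping follows, assuming central (equal-tailed) 90% intervals computed from posterior samples. The function names and the toy normal-normal model used to exercise them are hypothetical; this is not RevBayes' validation routine.

```python
import math
import random


def empirical_quantile(sorted_xs, q):
    """Linearly interpolated quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    frac = pos - lo
    return sorted_xs[lo] * (1.0 - frac) + sorted_xs[hi] * frac


def central_interval(samples, level=0.9):
    """Equal-tailed credible interval containing `level` of the samples."""
    xs = sorted(samples)
    tail = (1.0 - level) / 2.0
    return empirical_quantile(xs, tail), empirical_quantile(xs, 1.0 - tail)


def coverage_probability(replicates, level=0.9):
    """Fraction of (true_value, posterior_samples) pairs whose central
    credible interval at `level` contains the true value."""
    hits = 0
    for true_value, samples in replicates:
        lo, hi = central_interval(samples, level)
        if lo <= true_value <= hi:
            hits += 1
    return hits / len(replicates)


def toy_calibration(n_reps=1000, n_samples=200, level=0.9, seed=1):
    """Hypothetical check on a normal-normal model with an exact posterior:
    theta ~ N(0, 1), y | theta ~ N(theta, 1), so theta | y ~ N(y/2, 1/2)."""
    rng = random.Random(seed)
    post_sd = math.sqrt(0.5)
    replicates = []
    for _ in range(n_reps):
        theta = rng.gauss(0.0, 1.0)  # true value drawn from the prior
        y = rng.gauss(theta, 1.0)    # one simulated observation
        samples = [rng.gauss(y / 2.0, post_sd) for _ in range(n_samples)]
        replicates.append((theta, samples))
    return coverage_probability(replicates, level)


if __name__ == "__main__":
    print(toy_calibration())
```

Because the toy posterior is exact, the returned coverage should sit close to 0.9 (up to Monte Carlo noise); in a real model, a systematic shortfall or excess relative to the nominal level would point to miscalibration.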
Fig. 4.
a) Phylogenetic tree simulated using the Moran simulator within RevBayes; branch lengths are expressed in numbers of generations, and the tree is fixed in these analyses. b) SFS of data simulated with BS under the Moran model with N=6 on the tree from (a) (stars), which agrees well with the SFS obtained from inference with PoMoBalance (diamonds); the inset magnifies the BS peak. c) Phylogenetic tree of great apes simulated with SLiM and subsequently inferred with RevBayes; branch lengths are expressed in substitutions per site. Posterior probabilities are indicated at the nodes. Images are distributed under a Creative Commons license from Wikimedia and Microsoft. d) Comparison of SFS with N=10, as in (b), for data simulated with SLiM on the tree from (c). The SFS is shown for each allele pair a_i a_j (AC, AG, AT, CG, CT, and GT); simulated and inferred spectra are similar in all cases.
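The SFS comparisons in Fig. 4b and d tally, for an allele pair a_i a_j, how many sites carry k copies of a_i among N sampled individuals. A minimal sketch for one pair follows, assuming each site is supplied as a column of N bases. The helper names and the total variation distance used to summarize agreement between two spectra are illustrative choices, not taken from the paper.

```python
from collections import Counter


def count_first_allele(column, pair):
    """Return the number of pair[0] bases in a site column, or None if the
    column contains any base outside the pair (e.g. a triallelic site)."""
    a_i, a_j = pair
    if any(base not in (a_i, a_j) for base in column):
        return None
    return column.count(a_i)


def site_frequency_spectrum(counts, N, normalize=True):
    """Histogram of a_i counts (0..N) across sites; k=0 and k=N are the
    monomorphic classes, 1..N-1 the polymorphic ones."""
    tally = Counter(counts)
    if any(not 0 <= k <= N for k in tally):
        raise ValueError("every count must lie between 0 and N")
    spectrum = [float(tally.get(k, 0)) for k in range(N + 1)]
    if normalize:
        total = sum(spectrum)
        if total == 0:
            raise ValueError("no sites to summarize")
        spectrum = [x / total for x in spectrum]
    return spectrum


def total_variation(p, q):
    """Half the L1 distance between two normalized spectra (0 = identical)."""
    if len(p) != len(q):
        raise ValueError("spectra must have the same length")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    # Hypothetical columns of N=4 bases for the allele pair (A, T).
    columns = ["AAAA", "AAAT", "AATT", "AATT", "TTTT", "ACGT"]
    counts = [c for c in (count_first_allele(col, ("A", "T")) for col in columns)
              if c is not None]
    print(site_frequency_spectrum(counts, N=4))
```

The last example column is dropped because it is not biallelic for (A, T). In a four-allele PoMo state space a monomorphic site (for example, all A) borders several pairs, so how such sites are assigned to a pair is a choice this sketch leaves to the caller.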
Fig. 5.
Posterior distributions of inferred parameters compared with their expected values. Subplots a), b), and c) use the Moran model simulator, as in Fig. 4a and b, whereas subplots d), e), and f) use the SLiM simulator, as in Fig. 4c and d. Data simulations encompass four regimes: D for drift, GC for gBGC, BS for balancing selection, and GC + BS for the combination of gBGC and BS. Inference methods include BalFB, denoting inference with PoMoBalance with the preferred frequencies B fixed, and Bal, denoting regular inference with PoMoBalance. True values are indicated by dashed and dot-dashed lines. a) Posterior plots for the GC-bias rate σ: the two boxplots on the left correspond to data simulated in regime D and inferred with BalFB, and in regime BS and inferred with Bal; the two boxplots on the right correspond to regime GC inferred with BalFB and regime GC + BS inferred with Bal. b) Estimates of mutation rates and c) strengths of BS in the simulation scenario GC + BS. d) Posterior plots of the GC-bias rate σ for SLiM data in three simulation regimes, D, BS, and GC, analogous to (a). e) Estimates of mutation rates and f) strengths of BS for the BS simulation scenario in SLiM.
Fig. 6.
Testing PoMoBalance across a range of GC-bias rates σ and BS strengths β on data generated with the Moran model. Large open markers represent true values; smaller closed markers with error bars show the posterior means inferred by PoMoBalance and their 95% CIs.
Fig. 7.
Phylogenetic trees inferred from sequencing data for the tMSE region across six (left) and four (right) subspecies of Drosophila. Posterior probabilities are indicated at the nodes. Images of D. santomea, yakuba, melanogaster, and simulans are credited to Darren Obbard, while those of D. erecta are reproduced from Yassin et al. (2016) under a Creative Commons 4.0 licence.
Fig. 8.
Posterior distributions derived from experimental data for the tMSE region: six subspecies (as shown in Fig. 7) for PoMoSelect inference, and four Drosophila subspecies, namely D. erecta dark and light, D. melanogaster, and D. simulans, for PoMoBalance inference. The corresponding SFS for the PoMoBalance analysis is presented in Fig. 9. a) Estimated rates of gBGC with PoMoSelect (left) and PoMoBalance (right). b) Mutation rates, c) strength of BS, and d) preferred frequencies for BS, all inferred using PoMoBalance.
Fig. 9.
SFS for the tMSE region from the PoMoBalance analysis in Fig. 8 for four subspecies of Drosophila: the SFS of the data is shown as stars and the inferred SFS as diamonds.

References

    1. Andrés AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, et al. Targets of balancing selection in the human genome. Mol Biol Evol. 2009:26(12):2755–2764. 10.1093/molbev/msp190.
    2. Bakker EG, Toomajian C, Kreitman M, Bergelson J. A genome-wide survey of R gene polymorphisms in Arabidopsis. Plant Cell. 2006:18(8):1803–1818. 10.1105/tpc.106.042614.
    3. Barata C, Borges R, Kosiol C. Bait-ER: a Bayesian method to detect targets of selection in evolve-and-resequence experiments. J Evol Biol. 2023:36(1):29–44. 10.1111/jeb.v36.1.
    4. Begun DJ, Holloway AK, Stevens K, Hillier LW, Poh Y-P, Hahn MW, Nista PM, Jones CD, Kern AD, Dewey CN, et al. Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila simulans. PLoS Biol. 2007:5(11):e310. 10.1371/journal.pbio.0050310.
    5. Bitarello BD, Brandt DYC, Meyer D, Andrés AM. Inferring balancing selection from genome-scale data. Genome Biol Evol. 2023:15(3):evad032. 10.1093/gbe/evad032.
