Proc Natl Acad Sci U S A. 2019 Mar 12;116(11):5027-5036.
doi: 10.1073/pnas.1813836116. Epub 2019 Feb 26.

Simultaneous Bayesian inference of phylogeny and molecular coevolution

Xavier Meyer et al. Proc Natl Acad Sci U S A.

Abstract

Patterns of molecular coevolution can reveal structural and functional constraints within or among organic molecules. These patterns are better understood when considering the underlying evolutionary process, which enables us to disentangle the signal of the dependent evolution of sites (coevolution) from the effects of shared ancestry of genes. Conversely, disregarding the dependent evolution of sites when studying the history of genes negatively impacts the accuracy of the inferred phylogenetic trees. Although molecular coevolution and phylogenetic history are interdependent, analyses of the two processes are conducted separately, a choice dictated by computational convenience, but at the expense of accuracy. We present a Bayesian method and associated software to infer how many and which sites of an alignment evolve according to an independent or a pairwise dependent evolutionary process, and to simultaneously estimate the phylogenetic relationships among sequences. We validate our method on synthetic datasets and challenge our predictions of coevolution on the 16S rRNA molecule by comparing them with its known molecular structure. Finally, we assess the accuracy of phylogenetic trees inferred under the assumption of independence among sites using synthetic datasets, the 16S rRNA molecule and 10 additional alignments of protein-coding genes of eukaryotes. Our results demonstrate that inferring phylogenetic trees while accounting for dependent site evolution significantly impacts the estimates of the phylogeny and the evolutionary process.

Keywords: Bayesian inference; molecular coevolution; phylogeny; tree of life.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Fig. 1.
CoevRJ analysis flow. (1) A multiple sequence alignment containing nucleotides must be provided as input for CoevRJ. (2) After the analysis of the dataset, CoevRJ produces log files containing samples from the joint posterior distribution. These samples enable the estimation of the posterior probability of (i) the parameters of the evolutionary processes (GTR+Γ and Coev), (ii) the tree topologies and the branch lengths, and (iii) the pairs of sites and their profile. Further postanalyses with CoevRJ define the significance threshold for the coevolving pairs and provide easily readable summary statistics.
Fig. 2.
Validation of CoevRJ and comparison with a model assuming independence among sites (GTR+Γ) on synthetic datasets. Relative errors on (A) the rate heterogeneity and (B) the total branch length when inferred by CoevRJ and the GTR+Γ model in proportion to the amount of coevolution simulated. Box-plot whiskers extend to 1.5× the interquartile range; outliers are not shown. (C) Number of bipartitions, or internal nodes, exclusively misidentified by CoevRJ or GTR+Γ (bipartitions misidentified with both models are not reported). Misidentified bipartitions can be either bipartitions not present in the simulated phylogeny but inferred with P>0.95 or bipartitions present in the simulated phylogeny but inferred with P<0.5. Error bars represent the SD.
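The misidentification rule in Fig. 2C can be sketched in a few lines of Python. This is an illustrative reading of the caption, not the authors' code: the function names, the representation of a bipartition as a frozenset of taxon names on one side of the split, and the dictionary of posterior probabilities are all assumptions made for the example.

```python
def misidentified_bipartitions(true_splits, posterior, hi=0.95, lo=0.5):
    """Classify bipartitions following the rule stated in the Fig. 2 caption.

    true_splits: set of bipartitions in the simulated (true) phylogeny.
    posterior:   dict mapping each inferred bipartition to its posterior
                 probability (hypothetical input format).
    Returns (false_positives, false_negatives).
    """
    # Absent from the simulated tree but inferred with P > hi.
    false_pos = {s for s, p in posterior.items()
                 if p > hi and s not in true_splits}
    # Present in the simulated tree but inferred with P < lo
    # (a split that was never sampled counts as P = 0).
    false_neg = {s for s in true_splits if posterior.get(s, 0.0) < lo}
    return false_pos, false_neg


def exclusive_errors(errors_a, errors_b):
    """Keep only errors made by one model and not the other."""
    return errors_a - errors_b, errors_b - errors_a


# Toy data, invented for illustration only.
truth = {frozenset({"A", "B"}), frozenset({"C", "D"})}
post = {
    frozenset({"A", "B"}): 0.99,  # true split, well supported
    frozenset({"A", "C"}): 0.97,  # wrong split, strongly supported
    frozenset({"C", "D"}): 0.30,  # true split, poorly supported
}
fp, fn = misidentified_bipartitions(truth, post)
```

With the toy data, `fp` holds the split {A, C} and `fn` holds the split {C, D}. Applying `exclusive_errors` to the error sets of two models drops the errors they share, which mirrors the caption's note that bipartitions misidentified by both models are not reported.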
Fig. 3.
(A) Mapping of CoevRJ predictions of coevolving pairs on the structure of Escherichia coli 16S rRNA. All predicted coevolving pairs with P>0.5 are reported on the 2D structure of E. coli (22). Pairs highlighted in red are at most 6.5 Å distant on the 3D structure (Materials and Methods). Pairs in blue are more distantly located than this threshold; for that reason, the second position of the pair is indicated within the pair highlight. (B) Distance between positions of pairs ranked by their posterior probability of being coevolving. Only pairs with P>0.05, corresponding to strongly significant pairs compared with the prior expectation (Materials and Methods), are reported in this figure.
Fig. 4.
Impact of accounting for dependent sites on the dating of the tree of life. Ultrametric trees resulting from an analysis with the penalized likelihood method (27) configured to accommodate for large rate variation (λ=0) of the majority rule consensus trees inferred by CoevRJ (left) and a purely independent sites model (GTR+Γ, right). The root age is arbitrarily placed at 1.
Fig. 5.
Differences between analyses conducted with CoevRJ and a model assuming independence among sites (GTR+Γ). Datasets are ranked by the percentage of coevolving pairs predicted with P>0.5. The percentage is defined with respect to the maximum number of coevolving pairs observable at once (defined as the alignment length divided by 2). (A) Percentage of predicted coevolving pairs with P>0.5 (bar length) and P>0.95 (white stripe). (B–D) Divergences between parameters inferred with the purely independent sites model (GTR+Γ) and CoevRJ. The relative differences using CoevRJ as reference are reported for (B) the rate heterogeneity, (C) the overall branch length, and (D) the branch lengths shared in both consensus trees. Box-plot whiskers extend to 1.5× the interquartile range; outliers are not shown. (E) Percentage of inconsistently placed internal nodes between both consensus trees as defined by the normalized Robinson–Foulds distance (29).
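The summary quantities named in Fig. 5 can be illustrated with a short Python sketch. The function names and input formats are hypothetical, and two conventions are assumptions rather than statements from the source: the relative difference is taken as (value − reference) / reference with CoevRJ as the reference, and the Robinson–Foulds distance is normalized by the total number of bipartitions in the two trees, which is one common convention that may differ from the one used in the paper.

```python
def percent_coevolving_pairs(pair_probs, alignment_length, threshold=0.5):
    """Percentage of pairs with posterior probability above `threshold`,
    relative to the maximum number of pairs observable at once
    (alignment length divided by 2, as stated in the caption)."""
    max_pairs = alignment_length / 2
    n_pairs = sum(1 for p in pair_probs if p > threshold)
    return 100.0 * n_pairs / max_pairs


def relative_difference(value, reference):
    """Relative difference with respect to a reference estimate.
    Sign convention (value - reference) is an assumption."""
    return (value - reference) / reference


def normalized_rf(splits_a, splits_b):
    """Normalized Robinson-Foulds distance between two trees, each given
    as a set of bipartitions (frozensets of taxon names).
    Assumed normalization: symmetric difference / total number of splits."""
    total = len(splits_a) + len(splits_b)
    if total == 0:
        return 0.0
    return len(splits_a ^ splits_b) / total


# Toy inputs, invented for illustration only.
probs = [0.99, 0.6, 0.4, 0.2]
pct_50 = percent_coevolving_pairs(probs, alignment_length=20)
pct_95 = percent_coevolving_pairs(probs, alignment_length=20, threshold=0.95)

tree_a = {frozenset({"A", "B"}), frozenset({"A", "B", "C"})}
tree_b = {frozenset({"A", "C"}), frozenset({"A", "B", "C"})}
rf = normalized_rf(tree_a, tree_b)
```

In this toy case two of four pairs exceed P>0.5 out of a maximum of 10, giving 20%, and the two trees disagree on one split each out of four in total, giving a normalized distance of 0.5.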
