Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2019 Sep 5;5(2):vez036.
doi: 10.1093/ve/vez036. eCollection 2019 Jul.

Divergence dating using mixed effects clock modelling: An application to HIV-1

Affiliations

Divergence dating using mixed effects clock modelling: An application to HIV-1

Magda Bletsa et al. Virus Evol. .

Abstract

The need to estimate divergence times in evolutionary histories in the presence of various sources of substitution rate variation has stimulated a rich development of relaxed molecular clock models. Viral evolutionary studies frequently adopt an uncorrelated clock model as a generic relaxed molecular clock process, but this may impose considerable estimation bias if discrete rate variation exists among clades or lineages. For HIV-1 group M, rate variation among subtypes has been shown to result in inconsistencies in time to the most recent common ancestor estimation. Although this calls into question the adequacy of available molecular dating methods, no solution to this problem has been offered so far. Here, we investigate the use of mixed effects molecular clock models, which combine both fixed and random effects in the evolutionary rate, to estimate divergence times. Using simulation, we demonstrate that this model outperforms existing molecular clock models in a Bayesian framework for estimating time-measured phylogenies in the presence of mixed sources of rate variation, while also maintaining good performance in simpler scenarios. By analysing a comprehensive HIV-1 group M complete genome data set we confirm considerable rate variation among subtypes that is not adequately modelled by uncorrelated relaxed clock models. The mixed effects clock model can accommodate this rate variation and results in a time to the most recent common ancestor of HIV-1 group M of 1920 (1915-25), which is only slightly earlier than the uncorrelated relaxed clock estimate for the same data set. The use of complete genome data appears to have a more profound impact than the molecular clock model because it reduces the credible intervals by 50 per cent relative to similar estimates based on short envelope gene sequences.

Keywords: Bayesian inference; HIV; divergence time; mixed effects; molecular clock.

PubMed Disclaimer

Figures

Figure 1.
Figure 1.
Simulation analyses for a different mean rate in one of four identical clades of a 40 taxa tree with dated tips. The scenario of rate variation is depicted on the left; the resulting mean relative errors and coverage proportions are shown in the middle and on the right, respectively, for the five different models (SC, strict clock; UC, uncorrelated relaxed; FL, fixed local; RL, random local; ME, mixed effects). The rate distributions are depicted on a log scale. In the simulations, rates are drawn from the grey rate distribution (with a mean rate of 0.001 substitutions per site per year) for all branches except for one of the clades (‘clade A’), for which rates are drawn from the pink distributions with means (μA) indicated on a natural scale in the plots. The relative rate bar plots summarize mean relative errors across 20 replicates, and for the age estimates for five nodes of interest in each replicate. The whiskers represent standard errors. Coverage bar plots summarize coverage proportions across 20 replicates and for the same age estimates. In the relative rate and coverage bar plots, we use a star to indicate the model with the lowest error and highest coverage, respectively (if there is a single best value).
Figure 2.
Figure 2.
Simulation analyses for mean rate differences in two different clades and for fixed local clock scenarios. The scenario of rate variation is depicted on the left; the resulting mean relative errors and coverage proportions are shown in the middle and on the right, respectively, for the five different models (SC, strict clock; UC, uncorrelated relaxed; FL, fixed local; RL, random local; ME, mixed effects). The rate distributions are depicted on a log scale. In the first four simulations, rates are drawn from the grey rate distribution (with a mean rate of 0.001 substitutions per site per year) for all branches except for two of the clades (clades A and B or C), for which rates are drawn from the pink distributions with means (μA, μB, or μC) indicated on a natural scale in the plots. In the next four simulations, rates are fixed to the value indicated by the grey line (a rate of 0.001 substitutions per site per year) for all branches except for one or two of the clades for which rates are fixed to the value indicated by the pink line. The relative rate and coverage bar plots can be described as in Fig. 1.
Figure 3.
Figure 3.
Phylogenetic reconstruction and temporal signal for HIV-1 group M subtypes. All subtype clusters indicated in the ML tree reconstruction yielded maximum support according to the approximate likelihood ratio test. For the HIV group M (including subtypes A, B, C, D, and F) and for each separate subtype, except for subtype K (represented by only two genomes), we show root-to-tip divergence as a function of sampling year. Coefficients of determination (R2) and slope estimates are indicated in the upper left corner of the regression plots. Subtype clades and their respective data points are coloured according to a diverging colour scheme ordered by slope: fast (red) to slow (blue).
Figure 4.
Figure 4.
Separate and joint estimates of HIV-1 subtype rates and tMRCAs. The separate estimates are obtained under an UC clock model while the joint estimates represent average branch-specific rates under an UC and ME clock model. The boxplot colouring by subtype follows the colours in Fig. 3. Similar UC clock model estimates are obtained using a non-parametric coalescent prior (Supplementary Fig. S4). Separate and joint estimates rate estimates under a strict clock model can be found in Supplementary Fig. S5.
Figure 5.
Figure 5.
Bayesian estimates of synonymous and nonsynonymous substitution rates. The first and second panel summarize absolute rates of synonymous (rS) and nonsynonymous (rN) substitution, respectively. The third panel summarizes rN/3*rS; an ML estimate of dN/dS is also included as a grey cross (cf. Section 2). The boxplot colors follow the subtype colouring in Fig. 3.
Figure 6.
Figure 6.
HIV-1 group M maximum clade credibility tree estimated using the mixed effects clock model. All subtype clades, except for subtype A, are represented as a single cartoon clade. The boxplot colouring by subtype follows the colors in Fig. 3. Because 1960A falls in subtype A as a sister clade to sub-subtype A4, we split up this subtype into a sub-subtype A1 clade (which also contains A3 and A6 strains) and sub-subtype A2 and A4. The marginal posterior age distribution for the 1959, 1960, and 1966 sequences are represented by a violin density (yellow, red, and orange, respectively) that is cut at the credible intervals. The sampling years of these sequences are represented by a dotted line within these violin densities. For the subtype clade tMRCAs and the nodes ancestral to these clades, blue node bars represent the node height credible intervals. The node circle sizes and colouring for these nodes are proportional to the respective posterior probability support values.
Figure 7.
Figure 7.
tMRCA estimates for HIV-1 group M obtained by different studies. We summarize the estimates obtained by Korber et al. (2000), Salemi et al. (2001), Yusim et al. (2001), Worobey et al. (2008), Faria et al. (2014) and compare them to the uncorrelated (UC) and mixed effects (ME) model estimates for our complete genome data. For Yusim et al. (2001), we include the estimates based on the approach by Korber et al. (2000) (1) and those based on a relaxed clock model (2).

Similar articles

Cited by

References

    1. Abecasis A. B. et al. (2007) ‘Recombination Confounds the Early Evolutionary History of Human Immunodeficiency Virus Type 1: Subtype G Is a Circulating Recombinant Form’, Journal of Virology, 81: 8543. - PMC - PubMed
    1. Ayres D. L. et al. (2012) ‘BEAGLE: An Application Programming Interface and High-Performance Computing Library for Statistical Phylogenetics’, Systematic Biology, 61: 170. - PMC - PubMed
    1. Baele G. et al. (2016) ‘Bayesian Codon Substitution Modelling to Identify Sources of Pathogen Evolutionary Rate Variation’, Microbial Genomics, 2: e000057. - PMC - PubMed
    1. Bielejec F. et al. (2014) ‘πBUSS: A Parallel BEAST/BEAGLE Utility for Sequence Simulation under Complex Evolutionary Scenarios’, BMC Bioinformatics, 15: 133. - PMC - PubMed
    1. Bouckaert R. et al. (2014) ‘BEAST 2: A Software Platform for Bayesian Evolutionary Analysis’, PLoS Computational Biology, 10: e1003537. - PMC - PubMed