PLoS One. 2014 Feb 7;9(2):e87655.
doi: 10.1371/journal.pone.0087655. eCollection 2014.

A framework including recombination for analyzing the dynamics of within-host HIV genetic diversity


Ori Sargsyan. PLoS One. 2014.

Abstract

This paper presents a novel population genetic model and a computationally and statistically tractable framework for analyzing within-host HIV diversity based on serial samples of HIV DNA sequences. The model considers within-host HIV evolution during the chronic phase of infection. It assumes that the HIV population is homogeneous at the beginning, corresponding to the time of seroconversion, and evolves according to the Wright-Fisher reproduction model with recombination and a variable mutation rate across nucleotide sites. In addition, the population size and generation time vary over time as piecewise constant functions. Under this model I approximate the genealogical and mutational processes for serial samples of DNA sequences by a continuous coalescent-recombination process and an inhomogeneous Poisson process, respectively. Based on these derivations, an efficient algorithm is described for generating polymorphisms in serial samples of DNA sequences under the model, including under various substitution models. Extensions of the algorithm are also described for other demographic scenarios that can be more suitable for analyzing the dynamics of genetic diversity of other pathogens in vitro and in vivo. For the case of the infinite-sites model, I derive analytical formulas for the expected number of polymorphic sites in a sample of DNA sequences, and apply the developed simulation and analytical methods to explore the fit of the model to HIV genetic diversity based on serial samples of HIV DNA sequences from 9 HIV-infected individuals. In particular, the results show that the estimates of the ratio of the recombination rate to the mutation rate can vary over time between very high and very low values, which can be interpreted as a consequence of the impact of selection forces.
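The forward-time picture behind the model can be illustrated with a minimal sketch. It is not the coalescent-based algorithm described in the paper: it assumes a constant population size, no recombination, a single mutation rate shared by all sites, and Jukes-Cantor-style substitutions. The function name `simulate_wright_fisher` and all of its parameters are hypothetical and chosen only for illustration.

```python
import random


def simulate_wright_fisher(pop_size, seq_len, mu, generations,
                           sample_times, sample_size, seed=0):
    """Forward Wright-Fisher simulation from a homogeneous founder population.

    Simplifying assumptions (not the paper's full model): constant population
    size, no recombination, one per-site mutation rate `mu` per generation.
    Returns a dict mapping each sampling generation to a list of sequences.
    """
    rng = random.Random(seed)
    bases = "ACGT"
    # Homogeneous start: every individual carries the same founder sequence.
    founder = "".join(rng.choice(bases) for _ in range(seq_len))
    population = [founder] * pop_size
    samples = {}
    if 0 in sample_times:
        samples[0] = rng.sample(population, sample_size)

    for gen in range(1, generations + 1):
        next_population = []
        for _ in range(pop_size):
            # Wright-Fisher reproduction: each offspring picks a parent
            # uniformly at random, with replacement.
            seq = rng.choice(population)
            chars = None
            for pos in range(seq_len):
                if rng.random() < mu:
                    if chars is None:
                        chars = list(seq)
                    # Jukes-Cantor-style change: any other base, equally likely.
                    others = [b for b in bases if b != chars[pos]]
                    chars[pos] = rng.choice(others)
            next_population.append("".join(chars) if chars else seq)
        population = next_population
        if gen in sample_times:
            # Serial sample: draw sequences without removing them.
            samples[gen] = rng.sample(population, sample_size)
    return samples


if __name__ == "__main__":
    serial = simulate_wright_fisher(pop_size=100, seq_len=50, mu=0.001,
                                    generations=200,
                                    sample_times={0, 100, 200},
                                    sample_size=10, seed=42)
    for time in sorted(serial):
        print(time, len(set(serial[time])), "distinct sequences in sample")
```

A forward simulation like this tracks every individual in every generation, which is why a coalescent approximation that follows only the sampled lineages is the more efficient route for realistic population sizes.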

Conflict of interest statement

Competing Interests: The author has declared that no competing interests exist.

Figures

Figure 1. The observed and expected numbers of polymorphic sites and average pairwise differences in serial samples.
The horizontal axis of each panel indicates sampling time since seroconversion. (A) The observed numbers of polymorphic sites and the observed average numbers of pairwise differences in serial samples are plotted against the sampling times; the data points for the two statistics are connected by blue and red lines, respectively. (B) and (C) show the expected numbers of polymorphic sites in serial samples under the Wright-Fisher model with constant population size, combined with the finite-sites Jukes-Cantor model and with the infinite-sites model, respectively. Under each substitution model, the expected value of this statistic depends on the sample size and the sampling time. The expected average numbers of pairwise differences for the serial samples in each individual's case are not shown since they are the same for the samples.
Figure 2. Normalized observed and expected values of the two summary statistics.
(A) The observed values of the two statistics in serial samples (Figure 1) are normalized using transformation (1). (B) and (C) show the normalized expected numbers of polymorphic sites in serial samples (Figure 1) under the finite-sites Jukes-Cantor model and the infinite-sites model, respectively. The normalized values of the expected average numbers of pairwise differences in serial samples are equal to 0 and are not plotted.
Figure 3. The fit of the model to the data in the finite-sites model case.
In this case the population genetic model is fitted to the data by matching the observed numbers of polymorphic sites and the observed divergences in the serial samples to their expected values. (A) shows the observed and fitted (expected) numbers of polymorphic sites in serial samples. The observed and expected data points are connected by red and blue lines, respectively. (B) shows the observed and fitted (expected) divergences in serial samples. Two vectors are estimated from this fit, and the predicted (expected) average numbers of pairwise differences in serial samples are computed under the fitted model. (C) shows the observed and predicted values of this statistic in the serial samples.
Figure 4. Observed and expected average numbers of pairwise differences between sequences at different sampling time points.
The statistic is the average number of pairwise differences between sequences in samples taken at two sampling times. Its observed values are compared with its expected values under the infinite-sites model and the finite-sites model. (A) shows the observed values of this statistic in the serial samples for each individual's case. (B) and (C) show its predicted (expected) values in the serial samples for each individual's case, computed under the fitted infinite-sites and finite-sites models, respectively.
Figure 5. The fit of the model to the data in the infinite-sites model case.
In this case the population genetic model is fitted to the data by matching the observed numbers of polymorphic sites and the observed divergences in the serial samples to their expected values. (A) shows the observed and fitted (expected) numbers of polymorphic sites in serial samples. The observed and expected data points are connected by red and blue lines, respectively. (B) shows the observed and fitted (expected) divergences in serial samples. Two vectors are estimated from this fit, and the predicted (expected) average numbers of pairwise differences in serial samples are computed under the fitted model. (C) shows the observed and predicted values of this statistic in the serial samples.
Figure 6. The dynamics of two statistics for the serial samples in each individual's case.
For the serial samples from each individual, the observed values of the two statistics and their expected values are plotted against the sampling times. The observed data points are connected by red lines. For each value of the varied parameter (0, 1, 10, 50, 100, and 200), the expected values of the two statistics are computed with a Monte Carlo approach and Algorithm 1, using the vectors estimated for the finite-sites case.
Figure 7. The 95% probability intervals for the two statistics in the case of individual Pt1.
The observed values of the two statistics at the sampling time points are connected by red lines. For each value of the varied parameter and at each sampling time point, the 95% probability interval is inferred by estimating the 2.5% and 97.5% quantiles of the statistics under the estimated model in the finite-sites case. The vertical intervals at the sampling time points show the 95% probability intervals in green, black, and orange when the parameter is 0, 1, or 200, respectively. In each case the same colors are used to connect the expected values of the statistics.
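The captions use two ideas that can be sketched in code: the summary statistics of Figure 1 (number of polymorphic sites, average number of pairwise differences) and the quantile-based 95% probability intervals of Figure 7. The sketch below assumes aligned sequences of equal length. The simulator `toy_sample` is a hypothetical stand-in and is not Algorithm 1 from the paper; the nearest-rank quantile convention and all function names are also assumptions made for illustration.

```python
import random
from itertools import combinations


def count_polymorphic_sites(seqs):
    """Number of alignment columns in which at least two sequences differ."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def average_pairwise_differences(seqs):
    """Mean number of differing sites over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def empirical_quantile(sorted_values, q):
    """Nearest-rank style quantile of a pre-sorted list (assumed convention)."""
    index = min(len(sorted_values) - 1, max(0, int(q * len(sorted_values))))
    return sorted_values[index]


def probability_interval(simulate_sample, statistic, replicates=1000, seed=0):
    """Monte Carlo 95% interval: the 2.5% and 97.5% quantiles of `statistic`
    over samples drawn from `simulate_sample(rng)`."""
    rng = random.Random(seed)
    values = sorted(statistic(simulate_sample(rng)) for _ in range(replicates))
    return empirical_quantile(values, 0.025), empirical_quantile(values, 0.975)


def toy_sample(rng, n=10, length=200, p=0.01):
    """Hypothetical stand-in simulator (NOT the paper's Algorithm 1): each site
    of each sequence differs from an all-'A' reference with probability p."""
    return ["".join("C" if rng.random() < p else "A" for _ in range(length))
            for _ in range(n)]


if __name__ == "__main__":
    aligned = ["ACGT", "ACGA", "TCGA"]
    print(count_polymorphic_sites(aligned))       # two columns vary
    print(average_pairwise_differences(aligned))  # (1 + 2 + 1) / 3
    print(probability_interval(toy_sample, count_polymorphic_sites,
                               replicates=500, seed=1))
```

To build intervals under a fitted model, `toy_sample` would be replaced by a simulator for that model, and the observed statistic at each sampling time would be compared against the resulting interval.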
