Comparative Study
PLoS Comput Biol. 2012;8(6):e1002573.
doi: 10.1371/journal.pcbi.1002573. Epub 2012 Jun 28.

A model-based Bayesian estimation of the rate of evolution of VNTR loci in Mycobacterium tuberculosis

R Zachariah Aandahl et al. PLoS Comput Biol. 2012.

Abstract

Variable number of tandem repeats (VNTR) typing is widely used for studying the bacterial cause of tuberculosis. Knowledge of the rate of mutation of VNTR loci facilitates the study of the evolution and epidemiology of Mycobacterium tuberculosis. Previous studies have applied population genetic models to estimate the mutation rate, leading to estimates varying widely from around 10⁻⁵ to 10⁻² per locus per year. Resolving this issue using more detailed models and statistical methods would lead to improved inference in the molecular epidemiology of tuberculosis. Here, we use a model-based approach that incorporates two alternative forms of a stepwise mutation process for VNTR evolution within an epidemiological model of disease transmission. Using this model in a Bayesian framework, we estimate the mutation rate of VNTR in M. tuberculosis from four published data sets of VNTR profiles from Albania, Iran, Morocco and Venezuela. In the first variant, the mutation rate increases linearly with respect to repeat numbers (linear model); in the second, the mutation rate is constant across repeat numbers (constant model). We find that under the constant model, the mean mutation rate per locus is 10⁻²·⁰⁶ (95% CI: 10⁻²·⁶¹, 10⁻¹·⁵⁸), and under the linear model, the mean mutation rate per locus per repeat unit is 10⁻²·⁴⁵ (95% CI: 10⁻³·⁰⁷, 10⁻¹·⁹⁴). These new estimates represent a high rate of mutation at VNTR loci compared to previous estimates. To compare the two models, we use posterior predictive checks to ascertain which of them is better able to reproduce the observed data. From this procedure we find that the linear model performs better than the constant model. The general framework we use allows the possibility of extending the analysis to more complex models in the future.
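The difference between the constant and linear stepwise mutation variants can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the yearly discrete time step, the symmetric ±1 step and the floor at one repeat are all assumptions made for illustration, and the epidemiological transmission model and Bayesian inference described in the abstract are not reproduced.

```python
import random


def mutation_rate(repeats, rate, model):
    """Per-locus mutation rate for a locus carrying `repeats` repeat units.

    `rate` is the per-locus rate (constant model) or the per-locus,
    per-repeat-unit rate (linear model).
    """
    if model == "constant":
        return rate
    if model == "linear":
        return rate * repeats
    raise ValueError(f"unknown model: {model!r}")


def simulate_locus(repeats, rate, model, years, rng):
    """Evolve one locus for `years` discrete steps (assumed: one step per year).

    Assumptions for illustration only: at most one mutation per step, each
    mutation adds or removes exactly one repeat with equal probability, and
    the repeat number never drops below one.
    """
    for _ in range(years):
        p = min(1.0, mutation_rate(repeats, rate, model))
        if rng.random() < p:
            repeats = max(1, repeats + rng.choice((-1, 1)))
    return repeats


if __name__ == "__main__":
    rng = random.Random(0)
    # Posterior mean rates quoted in the abstract.
    constant_rate = 10 ** -2.06
    linear_rate = 10 ** -2.45
    print(simulate_locus(4, constant_rate, "constant", 500, rng))
    print(simulate_locus(4, linear_rate, "linear", 500, rng))
```

Under the linear variant a locus with many repeats mutates more often than one with few, whereas under the constant variant the repeat number has no effect on the rate.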


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Marginal posterior distributions for the constant-model and linear-model mutation rate parameters using simulated data.
Plots show the marginal posterior distribution of the constant-model parameter (left) and the linear-model parameter (right) using four simulated data sets generated from the constant (left) and linear (right) VNTR models. The known parameter values used to generate the data are indicated by vertical lines.
Figure 2. Genetic diversity of VNTR loci for each published dataset.
Left plots: Empirical cumulative distribution function of gene diversity across loci. The gene diversity is computed at each locus from the number of isolates carrying each repeat size at that locus. Right plots: Heat-map of diversity, following Aminian et al. (2009), illustrating the proportion of tandem repeats for each locus (ordered according to the original study).
Figure 3. Marginal posterior estimates for the three mutation rate parameters.
The three quantities are: the per-locus mutation rate for a locus with a single repeat under the linear model; the same quantity scaled by the mean number of repeats observed in the sample; and the per-locus mutation rate for any repeat number under the constant model.
Figure 4. Posterior predictive model checks.
Scatterplots of the posterior predictive distributions of two summary statistics: the difference between the maximum and minimum range of repeat numbers over loci, versus the same quantity with variance substituted for range. Columns represent constant (left) and linear (right) models. Rows represent the Albanian dataset (top), artificially generated data from the constant model (middle) and artificially generated data from the linear model (bottom). A marker indicates the statistics derived from the observed dataset.
Figure 5. Further posterior predictive model checks.
Scatterplots of the posterior predictive distributions of the maximum range of repeat numbers over loci versus the intercept at one repeat under the linear model, for each observed dataset. A marker indicates the statistics derived from the observed dataset.
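The Figure 2 caption computes a gene diversity at each locus from counts of isolates by repeat size. The exact formula is not recoverable from this page, so the sketch below assumes the standard form H = 1 − Σ p², where p is the proportion of isolates carrying each repeat size at the locus. The function name and the input layout are hypothetical.

```python
from collections import Counter


def gene_diversity(repeat_sizes):
    """Gene diversity at one locus, assuming H = 1 - sum(p_i ** 2).

    `repeat_sizes` holds one repeat number per isolate at that locus.
    """
    counts = Counter(repeat_sizes)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one isolate")
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


if __name__ == "__main__":
    # Hypothetical repeat numbers for four isolates at two loci.
    print(gene_diversity([2, 2, 2, 2]))  # no variation at this locus
    print(gene_diversity([3, 3, 4, 5]))  # three distinct repeat sizes
```

A locus where every isolate shares the same repeat number scores zero, and the score rises as isolates spread more evenly over more repeat sizes.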
