Comparative Study
Genetics. 2007 Mar;175(3):1251-66.
doi: 10.1534/genetics.106.063305. Epub 2006 Dec 6.

Inference of bacterial microevolution using multilocus sequence data

Xavier Didelot et al. Genetics. 2007 Mar.

Abstract

We describe a model-based method for using multilocus sequence data to infer the clonal relationships of bacteria and the chromosomal position of homologous recombination events that disrupt a clonal pattern of inheritance. The key assumption of our model is that recombination events introduce a constant rate of substitutions to a contiguous region of sequence. The method is applicable both to multilocus sequence typing (MLST) data from a few loci and to alignments of multiple bacterial genomes. It can be used to decide whether a subset of isolates share common ancestry, to estimate the age of the common ancestor, and hence to address a variety of epidemiological and ecological questions that hinge on the pattern of bacterial spread. It should also be useful in associating particular genetic events with the changes in phenotype that they cause. We show that the model outperforms existing methods of subdividing recombinogenic bacteria using MLST data and provide examples from Salmonella and Bacillus. The software used in this article, ClonalFrame, is available from http://bacteria.stats.ox.ac.uk/.
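The key assumption stated above, that a recombination event introduces substitutions at a constant rate across a contiguous region, can be illustrated with a small simulation of one branch of a genealogy. This is only a sketch and not the ClonalFrame implementation: the function name, the parameter names, the per-site independence, and the geometric distribution of import lengths are all assumptions made for illustration.

```python
import random


def simulate_branch(length, mutation_rate, import_start_rate,
                    mean_import_length, import_sub_rate, seed=0):
    """Simulate substitutions on one branch (illustrative sketch only).

    All parameter names are hypothetical:
      mutation_rate       per-site probability of a point mutation
      import_start_rate   per-site probability that an import starts here
      mean_import_length  mean length of an imported region (>= 1)
      import_sub_rate     per-site substitution probability inside an import
    Returns (point_mutations, imports, import_substitutions).
    """
    rng = random.Random(seed)

    # Point mutations: each site mutates independently.
    point_mutations = {i for i in range(length)
                       if rng.random() < mutation_rate}

    imports = []                  # half-open (start, end) regions
    import_substitutions = set()  # sites changed by recombination
    pos = 0
    while pos < length:
        if rng.random() < import_start_rate:
            # Assumed geometric import length with the requested mean.
            span = 1
            while rng.random() > 1.0 / mean_import_length:
                span += 1
            end = min(pos + span, length)
            imports.append((pos, end))
            # Constant substitution rate across the contiguous region.
            for i in range(pos, end):
                if rng.random() < import_sub_rate:
                    import_substitutions.add(i)
            pos = end
        else:
            pos += 1
    return point_mutations, imports, import_substitutions


# Example run with made-up parameter values.
muts, regions, rec_subs = simulate_branch(
    length=3000, mutation_rate=0.001, import_start_rate=0.0005,
    mean_import_length=200, import_sub_rate=0.05, seed=42)
print(len(muts), "point mutations;", len(regions), "imports;",
      len(rec_subs), "substitutions from recombination")
```

With a higher `import_sub_rate` than `mutation_rate`, imported regions show up as local clusters of substitutions. That clustering is the kind of signal a model of this sort could use to separate recombination from point mutation.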

Figures

Figure 1.
Illustration of the model. Two blocks (horizontal lines) evolve by point mutation (black crosses) and by recombination from an unmodeled origin (colored arrows, inducing the substitutions marked by colored crosses). [formula image] corresponds to the observed sequences and [formula image] corresponds to the sequences at internal nodes.
Figure 2.
Application to whole genomes of Salmonella enterica serovar Typhimurium. (A) A neighbor-joining tree; (B) a UPGMA tree; (C) a majority-rule consensus tree based on the output of BEAST (Drummond and Rambaut 2003); (D) a majority-rule consensus tree based on the posterior distribution of genealogies inferred by our method. Black numbers above each branch indicate observed/expected numbers of mutations, while red numbers below the branch indicate the equivalent values for recombination events followed by the total number of substitutions caused. The scale is the same for all three trees and is proportional to the expected number of mutations in each branch in D given the inferred values of θ and 𝒯. (E) Highlights the events on each branch of D. Each row represents 300,000 bp, with recombined regions in red and point mutations in green. (F) Three regions containing imports, with crosses indicating substitutions and the red line indicating probability for each nucleotide to have recombined. The location of the beginning and end of each region is indicated in kilobase pairs.
Figure 3.
Application to simulated MLST data. (A) The true clonal genealogy; (B) the genealogy inferred by our program; (C) a UPGMA tree. The genealogies were drawn using the radial tree option of Mega (Kumar et al. 2001) and share a common scale. STs of internal nodes are indicated in italics in A and B, with x indicating an ST that does not occur in the sample. STs are indicated in regular type, with the size of the font approximately proportional to the number of strains represented. (D) The output of eBURST; (E) a network representation of our output using Graphviz (Gansner and North 2000). The network shows inferred ancestral nodes in black and the location of isolates in red, with each red line indicating a single isolate. Each isolate has the genotype of the node it is connected to, unless otherwise indicated. Nodes whose ST is not found among isolates are shown as an empty circle. The ancestral node of each network component is indicated by a thicker circle.
Figure 4.
Application to MLST data of Bacillus. (A) The output of our program with R fixed at 0; (B) the output of our program with R inferred; (C) the eBURST output. Each row of D corresponds to the inferred events on a branch of B as labeled. The columns correspond to the seven housekeeping gene fragments of the Bacillus MLST scheme. Black crosses indicate inferred substitutions, with intensity proportional to their probability, and the height of the red lines represents the inferred probability of recombination on a scale from 0 to 1.
