Proc Natl Acad Sci U S A. 2004 Sep 14;101(37):13536-41.
doi: 10.1073/pnas.0403844101. Epub 2004 Sep 3.

Phylogenetic discovery bias in Bacillus anthracis using single-nucleotide polymorphisms from whole-genome sequencing

Talima Pearson et al. Proc Natl Acad Sci U S A. 2004.

Abstract

Phylogenetic reconstruction using molecular data is often subject to homoplasy, leading to inaccurate conclusions about phylogenetic relationships among operational taxonomic units. Compared with other molecular markers, single-nucleotide polymorphisms (SNPs) exhibit extremely low mutation rates, making them rare in recently emerged pathogens, but they are less prone to homoplasy and thus extremely valuable for phylogenetic analyses. Despite their phylogenetic potential, ascertainment bias occurs when SNP characters are discovered through biased taxonomic sampling; by using whole-genome comparisons of five diverse strains of Bacillus anthracis to facilitate SNP discovery, we show that only polymorphisms lying along the evolutionary pathway between reference strains will be observed. We illustrate this in theoretical and simulated data sets in which complex phylogenetic topologies are reduced to linear evolutionary models. Using a set of 990 SNP markers, we also show how divergent branches in our topologies collapse to single points but provide accurate information on internodal distances and points of origin for ancestral clades. These data allowed us to determine the ancestral root of B. anthracis, showing that it lies closer to a newly described "C" branch than to either of two previously described "A" or "B" branches. In addition, subclade rooting of the C branch revealed unequal evolutionary rates that seem to be correlated with ecological parameters and strain attributes. Our use of nonhomoplastic whole-genome SNP characters allows branch points and clade membership to be estimated with great precision, providing greater insight into epidemiological, ecological, and forensic questions.

Figures

Fig. 1.
Evolutionary model showing the consequences of biased character discovery for nonhomoplastic molecular markers. (Left) The “true” path structure of OTUs A–F is shown. (A) When OTUs A and F are used for comparative character (i.e., SNPs) discovery, only mutations on the connecting evolutionary path (red) will be discovered, resulting in the disappearance of all secondary branches but showing accurate node positions of all other OTUs. (B) Similarly, if C and E are used for character discovery, only mutations on the connecting path will be discovered, causing A and B to collapse at a single point. Again, accurate node positions are retained.
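The collapse described in this caption can be reproduced on a toy tree. The sketch below is illustrative only: the topology, the internal node names (n1 to n4), and the branch lengths are hypothetical and are not taken from Fig. 1. It assumes no homoplasy, so each mutation maps to exactly one branch, and it reports where each OTU attaches to the path connecting the two discovery strains.

```python
from collections import deque

# Hypothetical unrooted tree for OTUs A-F. Keys are edges, values are the
# number of mutations on that branch. Internal nodes n1..n4 are made up.
EDGES = {
    ("A", "n1"): 3, ("B", "n1"): 2, ("n1", "n2"): 4,
    ("C", "n2"): 1, ("n2", "n3"): 2, ("D", "n3"): 5,
    ("n3", "n4"): 3, ("E", "n4"): 2, ("F", "n4"): 4,
}


def build_adjacency(edges):
    """Turn the edge dictionary into a symmetric adjacency map."""
    adj = {}
    for (u, v), length in edges.items():
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj


def path_between(adj, start, end):
    """Return the unique node path between two nodes of a tree (BFS)."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def discovered_positions(edges, ref1, ref2, otus):
    """Position of each OTU along the ref1-ref2 discovery path.

    Only mutations on the connecting path are discovered, so every OTU is
    placed at the point where its own lineage joins that path.
    """
    adj = build_adjacency(edges)
    path = path_between(adj, ref1, ref2)
    dist = {path[0]: 0}
    for a, b in zip(path, path[1:]):
        dist[b] = dist[a] + adj[a][b]
    on_path = set(path)
    positions = {}
    for otu in otus:
        route = path_between(adj, otu, ref1)
        attach = next(n for n in route if n in on_path)
        positions[otu] = dist[attach]
    return positions


OTUS = ["A", "B", "C", "D", "E", "F"]
print(discovered_positions(EDGES, "A", "F", OTUS))
print(discovered_positions(EDGES, "C", "E", OTUS))
```

With A and F as discovery strains, every other OTU keeps a distinct node position along the path while its private branch vanishes. With C and E as discovery strains, A and B receive the same position, which mirrors the collapse described in panel B.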
Fig. 2.
Unrooted phylogenetic tree of 26 diverse B. anthracis strains based on variation at 15 VNTR loci (M.N.V.E., unpublished data). No near relatives are used because the VNTR primers are specific to B. anthracis. The homoplasy index = 0.3721, and branch lengths are proportional to VNTR mutational steps. Numbers refer to different strains that are grouped by color into major clade designations (for more strain information, see Table 1, which is published as supporting information on the PNAS web site). Reference strains used for whole-genome comparisons are indicated by a yellow circle. Strain 1, branch C (A1055); strain 2, Kruger B1; strain 5, CNEVA 9066 B2; strain 16, North America 1; strain 26, Ames.
Fig. 3.
Binary modeling of a four-taxon phylogeny. Conversion of a multiple character-state (but unbiased) phylogeny involves the following steps. First, converting a phylogenetic tree into binary data requires that the branch lengths be known (A). Next, a “1” is assigned to a terminal OTU, and a “0” is assigned to all other OTUs. Data then are created for multiple identical “markers” corresponding to branch length in the unbiased phylogeny. For example, the branch length leading to OTU “A” is 2, therefore data from two markers should assign a “1” to the terminal OTU and a “0” to all others. For internal branches that lead to multiple OTUs, one character state was assigned to each OTU within the group and the other state to each OTU outside the group. Once again, the number of markers created should correspond to branch lengths (B). To simulate biased character discovery (i.e., between OTUs A and D), (C) markers with character states that show no difference between reference strains (circled) would not be discovered and thus were deleted (highlighted in red) (D) before phylogenetic analysis. Phylogenetic reconstruction on simulated and, subsequently, bias-sorted data sets shows branch collapse caused by biased character discovery (E).
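The conversion and filtering steps in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: only the branch length of 2 for OTU A comes from the caption, while the other branch lengths, the grouping of A with B, and the function names are assumptions made for the example.

```python
OTUS = ["A", "B", "C", "D"]

# Each entry is (group of OTUs given state 1, branch length).
# Length 2 for the branch to A follows the caption; all other values
# and the internal {A, B} grouping are hypothetical.
SPLITS = [
    ({"A"}, 2),
    ({"B"}, 1),
    ({"C"}, 3),
    ({"D"}, 1),
    ({"A", "B"}, 2),  # internal branch separating A+B from C+D
]


def tree_to_markers(otus, splits):
    """Create one identical binary marker per unit of branch length."""
    markers = []
    for group, length in splits:
        column = {otu: int(otu in group) for otu in otus}
        markers.extend(dict(column) for _ in range(length))
    return markers


def biased_discovery(markers, ref1, ref2):
    """Keep only markers whose state differs between the reference OTUs."""
    return [m for m in markers if m[ref1] != m[ref2]]


def distance(markers, x, y):
    """Count markers at which two OTUs differ."""
    return sum(m[x] != m[y] for m in markers)


all_markers = tree_to_markers(OTUS, SPLITS)
found = biased_discovery(all_markers, "A", "D")
print(len(all_markers), len(found))
print(distance(all_markers, "B", "A"), distance(found, "B", "A"))
```

In this toy data set, 9 markers are generated and 5 survive discovery between A and D, which equals the path length between the two reference OTUs. The distance from B to A drops from 3 to 2 after filtering because the branch private to B is never discovered, so B collapses onto its node on the A to D path.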
Fig. 4.
Consequences of biased character discovery on a model phylogeny of B. anthracis with no homoplasy (see Table 2). Phylogenetic results using strains “2” and “26” for discovery (A) and “16” and “26” as discovery strains (B) exhibit the expected collapse of secondary branching and the retention of node positions (Right). (Left) The original tree is shown, with the discovery pathways designated in red. Reference strains are denoted with a yellow circle, and major clade designations are indicated with color as for Fig. 2.
Fig. 5.
Five diverse strains selected for whole-genome sequencing (yellow circles). (A) Phylogenetic locations of these five strains are shown with 21 other diverse B. anthracis strains (see Fig. 2). (B) High-quality sequences of four strains were compared to strain 26, the Ames strain. (C) The number of SNPs detected from these comparisons and the number of SNPs shared between comparisons (circled) are shown and can be used to accurately estimate evolutionary distances and phylogenetic topology. The approximate position of the phylogenetic root (from Fig. 6E) is denoted with a red circle.
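Panel C relies on the fact that, without homoplasy, SNPs shared between two comparisons against the same reference lie on the stretch of path the two comparisons have in common. The following sketch shows that bookkeeping with set operations. The SNP positions and the function name are hypothetical and are used only to illustrate the idea, not to reproduce the counts in the figure.

```python
def split_shared_path(snps_x, snps_y):
    """Partition SNPs from two comparisons against one common reference.

    snps_x, snps_y: sets of SNP positions found when strain X (or Y) is
    compared with the reference. Assuming no homoplasy, shared SNPs fall
    on the common part of the two paths; the remainder fall on the
    branches leading to X only or to Y only.
    """
    shared = snps_x & snps_y
    return {
        "shared": len(shared),
        "only_x": len(snps_x - shared),
        "only_y": len(snps_y - shared),
    }


# Hypothetical SNP positions, for illustration only.
strain_x_vs_ref = {101, 250, 377, 812, 990}
strain_y_vs_ref = {101, 250, 377, 1500}
print(split_shared_path(strain_x_vs_ref, strain_y_vs_ref))
```

Here three SNPs are shared, so the two strains would share a path of length 3 from the reference before diverging, with 2 and 1 further SNPs on their own branches.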
Fig. 6.
Four linear phylogenies and a composite dendrogram. Phylogenetic reconstructions using SNP loci discovered between strains 2 and 26 (A), 5 and 26 (B), 1 and 26 (C), and 16 and 26 (D) are shown. As predicted, only characters lying on the connecting evolutionary pathway between reference strains (Inset; denoted by yellow circles) were discovered. (E) A combined tree made by merging the four previous trees and standardizing branch lengths by using weighted average lengths of shared branches (branch lengths were weighted by the percentage of SNPs assayed out of the total number of SNPs discovered). Each phylogenetic tree is rooted with the outgroup (OTUs 27–29). Note the consistent grouping of strains and relative internodal distances across trees. Only one character in this study was homoplastic with one character state that was incompatible with all other loci.
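The weighting used for shared branches in panel E can be illustrated as follows. This is a sketch under stated assumptions: the caption says branch lengths were weighted by the percentage of SNPs assayed out of the total discovered, but it does not give the exact normalization, so dividing by the sum of the weights is an assumption here, and all numbers are hypothetical.

```python
def weighted_branch_length(estimates):
    """Weighted average length of a branch shared by several trees.

    estimates: list of (branch_length, snps_assayed, snps_discovered),
    one tuple per tree that contains the branch. Each weight is the
    fraction of discovered SNPs that were actually assayed.
    """
    weights = [assayed / discovered for _, assayed, discovered in estimates]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one tree must have assayed SNPs")
    weighted_sum = sum(
        length * w for (length, _, _), w in zip(estimates, weights)
    )
    return weighted_sum / total


# Hypothetical values: the same branch measured in two trees.
shared_branch = [
    (10.0, 50, 100),   # half of the discovered SNPs were assayed
    (20.0, 100, 100),  # all discovered SNPs were assayed
]
print(weighted_branch_length(shared_branch))
```

The tree in which more of the discovered SNPs were assayed contributes more to the merged branch length, so the result lies closer to 20 than to 10.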
