PLoS One. 2010 Oct 25;5(10):e13619.
doi: 10.1371/journal.pone.0013619.

The threshold bootstrap clustering: a new approach to find families or transmission clusters within molecular quasispecies


Mattia C F Prosperi et al. PLoS One.

Abstract

Background: Phylogenetic methods produce hierarchies of molecular species, inferring knowledge about taxonomy and evolution. However, there is not yet a consensus methodology that provides a crisp partition of taxa, which is desirable when classifying intra/inter-patient quasispecies or identifying infection transmission events. We introduce the threshold bootstrap clustering (TBC), a new methodology for partitioning molecular sequences that does not require phylogenetic tree estimation.

Methodology/principal findings: The TBC is an incremental partition algorithm, inspired by the stochastic Chinese restaurant process, that takes advantage of resampling techniques and models of sequence evolution. TBC takes as input a multiple alignment of molecular sequences, and its output is a crisp partition of the taxa into an automatically determined number of clusters. By varying initial conditions, the algorithm can produce different partitions. We describe a procedure that selects a prime partition from a set of candidates and calculates a measure of cluster reliability. TBC was successfully tested for the identification of human immunodeficiency virus type 1 (HIV-1) and hepatitis C virus (HCV) subtypes, and compared with previously established methodologies. It was also evaluated on the problem of HIV-1 intra-patient quasispecies clustering, and for transmission cluster identification, using a set of sequences from patients with known transmission event histories.
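
The abstract does not spell out the algorithm's steps, so the following is only an illustrative sketch of the general idea (incremental, threshold-based partitioning with bootstrap resampling of alignment columns), not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the use of a simple p-distance in place of a model of sequence evolution, the median over bootstrap replicates, and the rule that a sequence joins the first cluster whose members all lie within the threshold.

```python
import random
from typing import List, Sequence


def p_distance(a: str, b: str, columns: Sequence[int]) -> float:
    """Proportion of differing sites over the given alignment columns.

    Assumption: a plain p-distance stands in for a model-based distance.
    """
    diffs = sum(1 for c in columns if a[c] != b[c])
    return diffs / len(columns)


def bootstrap_distance(a: str, b: str, n_boot: int, rng: random.Random) -> float:
    """Median p-distance over bootstrap resamples of alignment columns."""
    n_sites = len(a)
    values = []
    for _ in range(n_boot):
        # Resample columns with replacement (one bootstrap replicate).
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]
        values.append(p_distance(a, b, columns))
    values.sort()
    return values[len(values) // 2]


def threshold_partition(
    seqs: List[str], threshold: float, n_boot: int = 25, seed: int = 0
) -> List[List[int]]:
    """Hypothetical incremental partition of aligned sequences.

    Sequences are visited in a random order; each one joins the first
    existing cluster whose members are all within `threshold`, otherwise
    it opens a new cluster. A different seed changes the visiting order,
    which can change the resulting partition.
    """
    rng = random.Random(seed)
    order = list(range(len(seqs)))
    rng.shuffle(order)
    clusters: List[List[int]] = []
    for i in order:
        for cluster in clusters:
            if all(
                bootstrap_distance(seqs[i], seqs[j], n_boot, rng) <= threshold
                for j in cluster
            ):
                cluster.append(i)
                break
        else:
            # No existing cluster accepted the sequence: start a new one.
            clusters.append([i])
    return [sorted(c) for c in clusters]
```

In this sketch each pair of sequences is compared at most once, so the number of pairwise comparisons grows quadratically with the number of sequences. Running `threshold_partition` with several seeds yields a set of candidate partitions of the kind from which a single partition would then have to be selected.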

Conclusion: TBC has been shown to be effective for the subtyping of HIV and HCV, and for identifying intra-patient quasispecies. To some extent, the algorithm was also able to infer clusters corresponding to infection transmission events. The computational complexity of TBC is quadratic in the number of taxa, lower than that of other established methods; in addition, TBC has been enhanced with a measure of cluster reliability. TBC can be useful for characterising molecular quasispecies in a broad context.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. The threshold bootstrap clustering (TBC) algorithm.
Figure 2. Pairwise distance distributions.
Distributions of pairwise distances using the LogDet estimator for the data sets (i) – (vi) analysed in this study.
Figure 3. Phylogeny and TBC of HIV/HCV subtypes.
Phylogenetic trees constructed for the HCV genotype (panel A) and group M HIV-1 subtype (panel B) reference sets of Los Alamos repositories (neighbour-joining, LogDet distance). Coloured branches represent clusters retrieved by the CTree algorithm, whilst circles represent clusters retrieved by the TBC algorithm.
Figure 4. Intra-patient phylogeny and TBC.
Bayesian phylogenetic tree for one patient (#7) from the Shankarappa data set. The tree is rooted on the earliest sequence, and node labels represent posterior probabilities. Coloured tips correspond to different clusters retrieved by the TBC using a threshold of 12 (black tips are singletons). X4-tropic populations are enclosed in red boxes.

