Sci Rep. 2013 Oct 3;3:2837.
doi: 10.1038/srep02837.

Empirical validation of viral quasispecies assembly algorithms: state-of-the-art and challenges

Mattia C F Prosperi et al. Sci Rep.

Abstract

Next-generation sequencing (NGS) is superseding Sanger technology for analysing intra-host viral populations, in terms of genome length and resolution. We introduce two new empirical validation data sets and test the available viral population assembly software. Two intra-host viral population ('quasispecies') samples, one of human immunodeficiency virus type 1 (HIV-1) and one of hepatitis C virus (HCV), were Sanger-sequenced, and plasmid clone mixtures at controlled proportions were shotgun-sequenced using Roche's 454 sequencing platform. The performance of different assemblers was compared in terms of phylogenetic clustering and recombination with the Sanger clones. Phylogenetic clustering showed that all assemblers captured a proportion of the most divergent lineages, but none was able to provide a high precision/recall tradeoff. Estimated variant frequencies correlated mildly with the original frequencies. Given the limitations of currently available algorithms identified by our empirical validation, additional data sets need to be developed and exploited in order to establish an efficient framework for viral population reconstruction using NGS.
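
The precision/recall tradeoff mentioned in the abstract can be made concrete with a small sketch. This is an illustration only, not the study's method: the study compared assemblers by phylogenetic clustering, whereas the sketch below assumes a simpler rule in which a reconstructed variant "matches" a Sanger clone when their aligned sequences differ at no more than a chosen number of positions. The function names, the `max_mismatches` parameter, and the toy sequences are all hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def precision_recall(reconstructed, reference, max_mismatches=0):
    """Return (precision, recall) under a distance-threshold matching rule.

    precision: fraction of reconstructed variants close to at least one reference clone
    recall:    fraction of reference clones close to at least one reconstructed variant
    NOTE: threshold matching is an assumption of this sketch, not the paper's procedure.
    """
    def matched(seq, pool):
        return any(hamming(seq, other) <= max_mismatches for other in pool)

    hits_reconstructed = sum(matched(r, reference) for r in reconstructed)
    hits_reference = sum(matched(c, reconstructed) for c in reference)
    precision = hits_reconstructed / len(reconstructed) if reconstructed else 0.0
    recall = hits_reference / len(reference) if reference else 0.0
    return precision, recall


# Toy data (hypothetical): three reference clones, two reconstructed variants.
reference_clones = ["ACGT", "ACGA", "TCGT"]
reconstructed_variants = ["ACGT", "GGGG"]
strict = precision_recall(reconstructed_variants, reference_clones)
relaxed = precision_recall(reconstructed_variants, reference_clones, max_mismatches=1)
```

With exact matching, only one of the three toy clones is recovered; allowing one mismatch raises recall without changing precision, which shows how the choice of matching criterion shifts the tradeoff.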


Figures

Figure 1. Phylogenetic trees (upper panels) and networks (lower panels) of the original Sanger sequences for the HCV (left panels) and HIV-1 (right panels) data sets.
Neighbor-joining and neighbor-net algorithms were run on optimized models of evolution, over 500 bootstrap runs. Node labels show bootstrap percentages. Numbers after the labels represent variant prevalence (%).
Figure 2. Evolutionary history inferred by neighbor-joining, using an optimized nucleotide substitution model, that compares HCV variants reconstructed by each quasispecies assembler with the original Sanger clones; trees are rooted using the mapping reference sequence.
Panels (a), (b), (c), and (d) show Geneious™ de novo, PredictHaplo, QuRe and ShoRAH, respectively. Node numbers give the percentage of bootstrap replicates (of 500) supporting the node, shown where support is ≥ 75%. Bullets represent variants at a frequency ≥ 5%, and triangles those < 5% (frequencies are not available for PredictHaplo, whose variants are shown by squares). Blue indicates Sanger isolates, and red indicates reconstructed variants.
Figure 3. Evolutionary history inferred by neighbor-joining, using an optimized nucleotide substitution model, that compares HIV-1 variants reconstructed by each quasispecies assembler with the original Sanger clones; trees are rooted using the mapping reference sequence.
Panels (a), (b), (c), and (d) show Geneious™ de novo, PredictHaplo, QuRe and ShoRAH, respectively. Node numbers give the percentage of bootstrap replicates (of 500) supporting the node, shown where support is ≥ 75%. Bullets represent variants at a frequency ≥ 5%, and triangles those < 5% (frequencies are not available for PredictHaplo, whose variants are shown by squares). Blue indicates Sanger isolates, and red indicates reconstructed variants.
Figure 4. Phylogenetic trees comparing together all reconstructed variants from different assemblers with the original Sanger sequences for the HCV and HIV-1 experiments (left and right panels, respectively).
Trees were inferred using neighbor-joining on an optimized model of evolution, rooted on the mapping reference sequence, with 500 bootstrap replicates (nodes with ≥ 75% bootstrap support are shown). For readability, only the 30 highest-frequency variants from ShoRAH are included.
