Accurate viral population assembly from ultra-deep sequencing data
- PMID: 24932001
- PMCID: PMC4058922
- DOI: 10.1093/bioinformatics/btu295
Abstract
Motivation: Next-generation sequencing technologies sequence viruses with ultra-deep coverage, thus promising to revolutionize our understanding of the underlying diversity of viral populations. While the sequencing coverage is high enough that even rare viral variants are sequenced, the presence of sequencing errors makes it difficult to distinguish between rare variants and sequencing errors.
Results: In this article, we present a method to overcome the limitations of sequencing technologies and assemble a diverse viral population that allows for the detection of previously undiscovered rare variants. The proposed method consists of a high-fidelity sequencing protocol and an accurate viral population assembly method, referred to as Viral Genome Assembler (VGA). The proposed protocol is able to eliminate sequencing errors by using individual barcodes attached to the sequencing fragments. Highly accurate data in combination with deep coverage allow VGA to assemble rare variants. VGA uses an expectation-maximization algorithm to estimate abundances of the assembled viral variants in the population. Results on both synthetic and real datasets show that our method is able to accurately assemble an HIV viral population and detect rare variants previously undetectable due to sequencing errors. VGA outperforms state-of-the-art methods for genome-wide viral assembly. Furthermore, our method is the first viral assembly method that scales to millions of sequencing reads.
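The two computational ideas in the Results paragraph, collapsing reads that share a barcode into a consensus read and estimating variant abundances by expectation-maximization (EM), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not VGA's implementation: the function names `barcode_consensus`, `read_likelihood` and `em_abundances` are hypothetical; reads within a barcode group are assumed to be pre-aligned and of equal length; each read is assumed to be an `(offset, sequence)` pair that fits inside every candidate variant; and the independent per-base error model with a fixed `error_rate` is an assumption borrowed from common quasispecies-estimation models.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def barcode_consensus(reads: Sequence[str]) -> str:
    """Majority-vote consensus of reads that share one barcode (hypothetical helper).

    Assumes the reads are already aligned and have equal length, so a random
    sequencing error in one copy is outvoted by the other copies.
    """
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def read_likelihood(read: str, offset: int, variant: str, error_rate: float) -> float:
    """P(read | variant) under an assumed independent per-base error model."""
    window = variant[offset:offset + len(read)]
    if len(window) != len(read):
        raise ValueError("read does not fit inside the variant at this offset")
    mismatches = sum(a != b for a, b in zip(read, window))
    matches = len(read) - mismatches
    return (error_rate ** mismatches) * ((1.0 - error_rate) ** matches)


def em_abundances(
    reads: List[Tuple[int, str]],
    variants: List[str],
    error_rate: float = 0.01,
    max_iter: int = 500,
    tol: float = 1e-10,
) -> List[float]:
    """Estimate the relative frequency of each variant by EM (sketch)."""
    k = len(variants)
    freqs = [1.0 / k] * k  # uniform starting point
    # Read/variant likelihoods do not change between iterations: compute once.
    lik = [[read_likelihood(seq, off, v, error_rate) for v in variants]
           for off, seq in reads]
    for _ in range(max_iter):
        # E-step: each read's posterior responsibility per variant, summed over reads.
        totals = [0.0] * k
        for row in lik:
            weights = [f * l for f, l in zip(freqs, row)]
            z = sum(weights)
            if z == 0.0:
                continue  # read unexplained by every variant; ignore it
            for i, w in enumerate(weights):
                totals[i] += w / z
        n = sum(totals)
        if n == 0.0:
            break
        # M-step: new frequencies are the normalized expected read counts.
        new_freqs = [t / n for t in totals]
        delta = max(abs(a - b) for a, b in zip(new_freqs, freqs))
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


if __name__ == "__main__":
    # Toy population: two variants that differ only at position 4.
    variants = ["AAAAAAAA", "AAAACAAA"]
    # Copies of four barcoded fragments, all aligned at offset 2;
    # the stray errors in the first and third groups are outvoted.
    groups = [
        ["AAAA", "AAAA", "AACA"],
        ["AAAA", "AAAA"],
        ["AAAA", "ATAA", "AAAA"],
        ["AACA", "AACA"],
    ]
    reads = [(2, barcode_consensus(g)) for g in groups]
    print(em_abundances(reads, variants))  # approximately [0.755, 0.245]
```

Reads that do not overlap a position where the variants differ have the same likelihood under every variant, so the E-step splits them in proportion to the current frequencies and they do not move the estimate; only discriminating reads drive the frequencies. This is why low residual error after barcode collapsing matters: with a small `error_rate`, a single discriminating read is strong evidence for a rare variant.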
Availability: Our tool VGA is freely available at http://genetics.cs.ucla.edu/vga/
© The Author 2014. Published by Oxford University Press.
Similar articles
- Viral deep sequencing needs an adaptive approach: IRMA, the iterative refinement meta-assembler. BMC Genomics. 2016 Sep 5;17(1):708. doi: 10.1186/s12864-016-3030-6. PMID: 27595578. Free PMC article.
- Inferring viral quasispecies spectra from 454 pyrosequencing reads. BMC Bioinformatics. 2011;12 Suppl 6(Suppl 6):S1. doi: 10.1186/1471-2105-12-S6-S1. Epub 2011 Jul 28. PMID: 21989211. Free PMC article.
- Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. J Comput Biol. 2010 Nov;17(11):1519-33. doi: 10.1089/cmb.2009.0238. Epub 2010 Oct 20. PMID: 20958248. Free PMC article.
- De novo meta-assembly of ultra-deep sequencing data. Bioinformatics. 2015 Jun 15;31(12):i9-16. doi: 10.1093/bioinformatics/btv226. PMID: 26072514. Free PMC article.
- aBayesQR: A Bayesian Method for Reconstruction of Viral Populations Characterized by Low Diversity. J Comput Biol. 2018 Jul;25(7):637-648. doi: 10.1089/cmb.2017.0249. Epub 2018 Feb 26. PMID: 29480740.
Cited by
- Streamlined Subpopulation, Subtype, and Recombination Analysis of HIV-1 Half-Genome Sequences Generated by High-Throughput Sequencing. mSphere. 2020 Oct 14;5(5):e00551-20. doi: 10.1128/mSphere.00551-20. PMID: 33055255. Free PMC article.
- Viral quasispecies reconstruction via tensor factorization with successive read removal. Bioinformatics. 2018 Jul 1;34(13):i23-i31. doi: 10.1093/bioinformatics/bty291. PMID: 29949976. Free PMC article.
- Distinguishing low frequency mutations from RT-PCR and sequence errors in viral deep sequencing data. BMC Genomics. 2015 Mar 24;16(1):229. doi: 10.1186/s12864-015-1456-x. PMID: 25886445. Free PMC article.
- Epidemiological data analysis of viral quasispecies in the next-generation sequencing era. Brief Bioinform. 2021 Jan 18;22(1):96-108. doi: 10.1093/bib/bbaa101. PMID: 32568371. Free PMC article. Review.
- Mutational pathway maps and founder effects define the within-host spectrum of hepatitis C virus mutants resistant to drugs. PLoS Pathog. 2019 Apr 1;15(4):e1007701. doi: 10.1371/journal.ppat.1007701. eCollection 2019 Apr. PMID: 30934020. Free PMC article.
Grants and funding
- R01 GM083198/GM/NIGMS NIH HHS/United States
- R01 MH101782/MH/NIMH NIH HHS/United States
- P01 HL030568/HL/NHLBI NIH HHS/United States
- U01 DA024417/DA/NIDA NIH HHS/United States
- K25 HL080079/HL/NHLBI NIH HHS/United States
- P01 HL028481/HL/NHLBI NIH HHS/United States
- R01 ES022282/ES/NIEHS NIH HHS/United States
- P30 CA016042/CA/NCI NIH HHS/United States
- P30 AI028697/AI/NIAID NIH HHS/United States