De novo assembly of viral quasispecies using overlap graphs
- PMID: 28396522
- PMCID: PMC5411778
- DOI: 10.1101/gr.215038.116
Abstract
A viral quasispecies, the ensemble of viral strains populating an infected person, can be highly diverse. For optimal assessment of virulence, pathogenesis, and therapy selection, determining the haplotypes of the individual strains can play a key role. As many viruses are subject to high mutation and recombination rates, high-quality reference genomes are often not available at the time of a new disease outbreak. We present SAVAGE, a computational tool for reconstructing individual haplotypes of intra-host virus strains without the need for a high-quality reference genome. SAVAGE uses either FM-index-based data structures or an ad hoc consensus reference sequence to construct overlap graphs from patient sample data. In this overlap graph, nodes represent reads and/or contigs, while edges indicate that two reads/contigs, based on sound statistical considerations, represent identical haplotypic sequence. Following an iterative scheme, a new overlap assembly algorithm, based on the enumeration of statistically well-calibrated groups of reads/contigs, then efficiently reconstructs the individual haplotypes from this overlap graph. In benchmark experiments on simulated and on real deep-coverage data, SAVAGE drastically outperforms generic de novo assemblers as well as the only specialized de novo viral quasispecies assembler available so far. When run on an ad hoc consensus reference sequence, SAVAGE performs very favorably in comparison with state-of-the-art reference genome-guided tools. We also apply SAVAGE to two deep-coverage samples from patients infected with the Zika and the hepatitis C virus, respectively, which sheds light on the genetic structures of the respective viral quasispecies.
© 2017 Baaijens et al.; Published by Cold Spring Harbor Laboratory Press.
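The overlap-graph idea in the abstract can be illustrated with a small toy. The Python sketch below is not SAVAGE's implementation; it rests on several stated assumptions: overlaps are found by plain pairwise suffix/prefix comparison (instead of FM-index-based data structures), a fixed mismatch-rate cutoff stands in for SAVAGE's statistical edge criterion, and undirected maximal cliques (Bron-Kerbosch enumeration) play the role of the "groups" of reads. The function names, the `min_overlap` and `max_mismatch_rate` parameters, and the two toy haplotypes are all hypothetical.

```python
"""Toy overlap graph + clique enumeration (illustrative sketch, not SAVAGE's code)."""
from itertools import combinations


def overlap_length(a: str, b: str, min_overlap: int, max_mismatch_rate: float = 0.0) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b` with at most
    max_mismatch_rate * length mismatches; 0 if no overlap >= min_overlap exists.
    ASSUMPTION: a fixed mismatch cutoff replaces SAVAGE's statistical test."""
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-length:], b[:length]))
        if mismatches <= max_mismatch_rate * length:
            return length
    return 0


def build_overlap_graph(reads, min_overlap, max_mismatch_rate=0.0):
    """Undirected graph: node i is reads[i]; an edge joins two reads whose
    suffix/prefix overlap passes the cutoff in either orientation."""
    graph = {i: set() for i in range(len(reads))}
    for i, j in combinations(range(len(reads)), 2):
        if (overlap_length(reads[i], reads[j], min_overlap, max_mismatch_rate)
                or overlap_length(reads[j], reads[i], min_overlap, max_mismatch_rate)):
            graph[i].add(j)
            graph[j].add(i)
    return graph


def maximal_cliques(graph):
    """Enumerate all maximal cliques with Bron-Kerbosch (with pivoting).
    ASSUMPTION: cliques stand in for SAVAGE's 'groups' of reads/contigs."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))  # r cannot be extended: it is maximal
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


# Hypothetical toy data: 10-bp reads at offsets 0, 3, 6 of two made-up 16-bp
# haplotypes that differ only at position 8 (A in HAP_A, C in HAP_B).
HAP_A = "ACGTTGCAAGGCTTAC"
HAP_B = HAP_A[:8] + "C" + HAP_A[9:]
READS = [hap[k:k + 10] for hap in (HAP_A, HAP_B) for k in (0, 3, 6)]

if __name__ == "__main__":
    graph = build_overlap_graph(READS, min_overlap=4)
    for clique in maximal_cliques(graph):
        print(clique, [READS[i] for i in clique])
```

On this toy input the reads split into two maximal cliques, `[0, 1, 2]` and `[3, 4, 5]`, one per haplotype: every read covers position 8, so exact overlaps (`max_mismatch_rate=0.0`) never join reads from different haplotypes. Raising the mismatch tolerance would absorb sequencing errors but can also connect reads from different haplotypes, which illustrates why a well-calibrated edge criterion matters.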
Similar articles
- Full-length de novo viral quasispecies assembly through variation graph construction. Bioinformatics. 2019 Dec 15;35(24):5086-5094. doi: 10.1093/bioinformatics/btz443. PMID: 31147688.
- Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches. BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):507. doi: 10.1186/s12864-016-2895-8. PMID: 27556636. Free PMC article.
- Viral quasispecies assembly via maximal clique enumeration. PLoS Comput Biol. 2014 Mar 27;10(3):e1003515. doi: 10.1371/journal.pcbi.1003515. eCollection 2014 Mar. PMID: 24675810. Free PMC article.
- ViQUF: De Novo Viral Quasispecies Reconstruction Using Unitig-Based Flow Networks. IEEE/ACM Trans Comput Biol Bioinform. 2023 Mar-Apr;20(2):1550-1562. doi: 10.1109/TCBB.2022.3190282. Epub 2023 Apr 3. PMID: 35853050.
- Strainline: full-length de novo viral haplotype reconstruction from noisy long reads. Genome Biol. 2022 Jan 20;23(1):29. doi: 10.1186/s13059-021-02587-6. PMID: 35057847. Free PMC article.
Cited by
- Benchmarking State-of-the-Art Approaches for Norovirus Genome Assembly in Metagenome Sample. Biology (Basel). 2023 Jul 29;12(8):1066. doi: 10.3390/biology12081066. PMID: 37626951. Free PMC article.
- Strainberry: automated strain separation in low-complexity metagenomes using long reads. Nat Commun. 2021 Jul 23;12(1):4485. doi: 10.1038/s41467-021-24515-9. PMID: 34301928. Free PMC article.
- From Alpha to Zeta: Identifying Variants and Subtypes of SARS-CoV-2 Via Clustering. J Comput Biol. 2021 Nov;28(11):1113-1129. doi: 10.1089/cmb.2021.0302. Epub 2021 Oct 25. PMID: 34698508. Free PMC article.
- Evaluating assembly and variant calling software for strain-resolved analysis of large DNA viruses. Brief Bioinform. 2021 May 20;22(3):bbaa123. doi: 10.1093/bib/bbaa123. PMID: 34020538. Free PMC article.
- phasebook: haplotype-aware de novo assembly of diploid genomes from long reads. Genome Biol. 2021 Oct 27;22(1):299. doi: 10.1186/s13059-021-02512-x. PMID: 34706745. Free PMC article.