Bioinformatics. 2018 Jul 1;34(13):i23-i31.
doi: 10.1093/bioinformatics/bty291.

Viral quasispecies reconstruction via tensor factorization with successive read removal

Soyeon Ahn et al. Bioinformatics.

Abstract

Motivation: As RNA viruses mutate and adapt to environmental changes, often developing resistance to antiviral vaccines and drugs, they form an ensemble of viral strains known as a viral quasispecies. While high-throughput sequencing (HTS) has enabled in-depth studies of viral quasispecies, sequencing errors and limited read lengths make it challenging to reconstruct the strains and estimate their spectrum. Inference of viral quasispecies is difficult because strain frequencies are generally non-uniform, and it becomes harder still when the genetic distances between the strains are small.

Results: This paper presents TenSQR, an algorithm that uses a tensor factorization framework to analyze HTS data and reconstruct viral quasispecies characterized by highly uneven frequencies of their components. Fundamentally, TenSQR performs clustering with successive data removal to infer the strains in a quasispecies in order from the most to the least abundant; every time a strain is inferred, the sequencing reads generated from that strain are removed from the dataset. The proposed successive strain reconstruction and data removal enable discovery of rare strains in a population and facilitate detection of deletions in such strains. Results on simulated datasets demonstrate that TenSQR can reconstruct full-length strains having widely different abundances, generally outperforming state-of-the-art methods at diversities of 1-10% and detecting long deletions even in rare strains. A study on a real HIV-1 dataset demonstrates that TenSQR outperforms competing methods in experimental settings as well. Finally, we apply TenSQR to analyze a Zika virus sample and reconstruct the full-length strains it contains.

Availability and implementation: TenSQR is available at https://github.com/SoYeonA/TenSQR.

Supplementary information: Supplementary data are available at Bioinformatics online.
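
The successive reconstruction and read removal described under Results can be illustrated with a toy sketch. This is not the TenSQR implementation: the clustering step here is a plain majority-vote consensus rather than tensor factorization, it only behaves sensibly when the most abundant remaining strain holds a majority at every position, and all names (`consensus`, `mismatches`, `successive_reconstruction`, `max_mismatch`, `tile`) are hypothetical. Reads are assumed to be already aligned and are given as `(start, sequence)` pairs.

```python
from collections import Counter


def consensus(reads, length):
    """Majority-vote base at each position; '-' where no read covers it."""
    out = []
    for pos in range(length):
        counts = Counter(
            seq[pos - start]
            for start, seq in reads
            if start <= pos < start + len(seq)
        )
        out.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(out)


def mismatches(read, strain):
    """Number of positions where an aligned read disagrees with a strain."""
    start, seq = read
    return sum(1 for i, base in enumerate(seq) if strain[start + i] != base)


def successive_reconstruction(reads, length, max_mismatch=0, max_strains=10):
    """Infer strains from most to least abundant, removing explained reads.

    Stand-in for the clustering step: the consensus of the remaining reads
    is taken as the next strain (an assumption made for this sketch).
    """
    remaining = list(reads)
    total = len(remaining)
    strains = []
    while remaining and len(strains) < max_strains:
        strain = consensus(remaining, length)
        matched = [r for r in remaining if mismatches(r, strain) <= max_mismatch]
        if not matched:
            break  # no read supports the candidate strain; stop
        # Estimated frequency is the fraction of all reads assigned to it.
        strains.append((strain, len(matched) / total))
        # Successive read removal: drop the reads explained by this strain.
        remaining = [r for r in remaining if mismatches(r, strain) > max_mismatch]
    return strains


def tile(strain, copies, window=4, step=2):
    """Error-free reads tiling a strain, repeated `copies` times."""
    starts = range(0, len(strain) - window + 1, step)
    return [(s, strain[s:s + window]) for s in starts for _ in range(copies)]


# Three made-up strains mixed at a 6:3:1 ratio.
reads = tile("ACGTACGT", 6) + tile("ACCTACGA", 3) + tile("TCGTTCGT", 1)
result = successive_reconstruction(reads, length=8)
for strain, freq in result:
    print(strain, round(freq, 2))
```

In this example the rarest strain is only recovered after the reads of the two more abundant strains have been removed, which is the intuition behind inferring strains in order of abundance.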

Figures

Fig. 1.
An illustration of the tensor factorization representation of the viral quasispecies assembly problem
Fig. 2.
Performance comparison of TenSQR, aBayesQR, ShoRAH, ViQuaS and PredictHaplo in terms of Recall, Precision, Predicted Proportion (PredProp), Reconstruction Rate (ReconRate) and JSD on the simulated data with ε = 2×10⁻³ for a mixture of (a) 5 viral strains and (b) 10 viral strains. (For the plots that include error bars, please see the corresponding Supplementary Fig. S2 in Supplementary Material B.)
Fig. 3.
Performance comparison of TenSQR, aBayesQR, ShoRAH, ViQuaS and PredictHaplo in terms of Recall, Precision, Predicted Proportion (PredProp), Reconstruction Rate (ReconRate) and JSD on the simulated data with ε = 7×10⁻³ for a mixture of (a) 5 viral strains and (b) 10 viral strains. (For the plots that include error bars, please see the corresponding Supplementary Fig. S3 in Supplementary Material B.)
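
The captions above do not expand JSD. Assuming it stands for the Jensen-Shannon divergence between the true and the estimated strain frequency distributions, a minimal sketch of the metric might look as follows. The function name `jsd`, the base-2 logarithm, and the example frequencies are choices made for this illustration, not details taken from the paper.

```python
import math


def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    p and q are equal-length sequences of probabilities that each sum to 1.
    With base-2 logarithms the value lies between 0 and 1.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution used as the common reference.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; zero-probability terms contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Made-up frequencies for a three-strain mixture.
true_freq = [0.6, 0.3, 0.1]
estimated_freq = [0.55, 0.35, 0.10]
print(jsd(true_freq, estimated_freq))
```

A value of 0 means the estimated frequencies match the true ones exactly; larger values indicate a worse estimate of the spectrum.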
