Brief Bioinform. 2019 Jan 18;20(1):15-25.
doi: 10.1093/bib/bbx079.

VirGenA: a reference-based assembler for variable viral genomes

Gennady G Fedonin et al. Brief Bioinform.

Abstract

Characterizing the within-host genetic diversity of viral pathogens is required to select effective treatment for some important viral infections, e.g. HIV, HBV and HCV. Although low-frequency variants can be detected technically, data on their clinical significance conflict, partly because such variants are difficult to distinguish from experimental artifacts. Cross-contamination is a concern for all highly sensitive techniques, including deep sequencing: even trace contamination significantly increases the number of false positives among identified single-nucleotide variants (SNVs). Detecting infection by multiple genotypes of a virus, which can be frequent, especially in risk groups, is also clinically significant in some cases. We developed a new reference-guided viral assembler, VirGenA, that can separate mixtures of strains belonging to different intraspecies genetic groups (genotypes, subtypes, clades, etc.) and assemble a separate consensus sequence for each group in a mixture. It produces long assemblies for mixture components of extremely low frequency (<1%), allowing detection of cross-contamination of samples by divergent genotypes. We tested VirGenA on both clinical and simulated data. On both types of data, VirGenA gives results better than or similar to those of existing de novo assemblers. A cross-platform implementation (including source code) is freely available at https://github.com/gFedonin/VirGenA/releases.

Figures

Figure 1
SNV interpretation problems caused by mapping reads to the major consensus sequence. The major consensus sequence corresponds to the dominant genetic subpopulation of strains in a sample. Ignoring divergent minor subpopulations seriously complicates SNV interpretation: some SNVs correspond to differences between subpopulations (genotyping SNVs, marked with ‘*’), whereas artifact SNVs caused by mapping reads to a divergent sequence are marked with arrows at positions 8383, 8385, 8391 and 8394. The G→A mutation at conserved position 8380 results in an artifactual SNV at position 8391; at the other positions, the artifacts are caused by ambiguity in read alignments around the tandem repeat ‘CCAGCAGCAGAG’. A colour version of this figure is available at BIB online: https://academic.oup.com/bib.
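
The caption's central point, that reads from a divergent minor subpopulation surface as low-frequency SNVs when all reads are mapped to the major consensus, can be illustrated with a toy pileup. This is a minimal sketch and not VirGenA's method: the function name `call_snvs`, the ungapped `(start, sequence)` read representation and the 1% frequency threshold are assumptions made for illustration, and the alignment ambiguity around tandem repeats described above is not modelled.

```python
from collections import Counter


def call_snvs(consensus, aligned_reads, min_frequency=0.01):
    """Report non-consensus bases seen at or above min_frequency.

    consensus      -- major consensus sequence (string)
    aligned_reads  -- iterable of (start, sequence) pairs; hypothetical
                      ungapped alignments with 0-based start positions
    min_frequency  -- assumed reporting threshold (fraction of depth)

    Returns a list of (position, consensus_base, variant_base, frequency).
    """
    # One base counter per consensus position (a toy pileup).
    columns = [Counter() for _ in consensus]
    for start, sequence in aligned_reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < len(consensus):
                columns[position][base] += 1

    snvs = []
    for position, counts in enumerate(columns):
        depth = sum(counts.values())
        if depth == 0:
            continue  # no read covers this position
        for base, count in sorted(counts.items()):
            frequency = count / depth
            if base != consensus[position] and frequency >= min_frequency:
                snvs.append((position, consensus[position], base, frequency))
    return snvs


if __name__ == "__main__":
    major = "ACGTACGT"
    # Nine reads from the dominant subpopulation, one from a divergent one.
    reads = [(0, "ACGTACGT")] * 9 + [(0, "ACATACGA")]
    for snv in call_snvs(major, reads):
        print(snv)
```

In this toy input the single divergent read yields two SNVs, each at 10% frequency. Frequency alone cannot tell whether they are within-population variants or genotyping differences from a second subpopulation, which is the interpretation problem the figure describes.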
Figure 2
Comparison of assemblers on the clinical dataset. For each sample, a minimal reference set was constructed, and all contigs were assigned to their respective references from that set. The same reference occurring in different samples was counted separately. (a) Length of the longest contig among all contigs assigned to each reference. The Y-axis shows the total number of references in the union of the minimal reference sets of all samples. (b) Coverage, for each sample in the dataset, of the reference with the highest coverage by contigs averaged over all methods. (c) Average identity between all pairs of assemblies of each sample produced by different methods. (d) Panel (c) at an enlarged scale for the high-value range. A colour version of this figure is available at BIB online: https://academic.oup.com/bib.
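
The quantities in panels (a) and (c) can be sketched as follows. This is an illustrative reading of the caption, not the authors' evaluation code: the function names, the `(reference_id, contig)` input format and the assumption that assemblies have already been aligned to equal length with `-` as the gap character are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def longest_contig_per_reference(assignments):
    """Panel (a): length of the longest contig assigned to each reference.

    assignments -- iterable of (reference_id, contig_sequence) pairs
    """
    longest = {}
    for reference_id, contig in assignments:
        longest[reference_id] = max(longest.get(reference_id, 0), len(contig))
    return longest


def identity(aligned_a, aligned_b):
    """Fraction of matching columns between two pre-aligned sequences.

    Columns where both sequences carry a gap are skipped (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else 0.0


def average_pairwise_identity(aligned_assemblies):
    """Panel (c): mean identity over all pairs of assemblies of one sample.

    aligned_assemblies -- dict mapping method name to aligned sequence
    """
    pairs = list(combinations(aligned_assemblies.values(), 2))
    if not pairs:
        raise ValueError("at least two assemblies are required")
    return mean(identity(a, b) for a, b in pairs)


if __name__ == "__main__":
    contigs = [("refA", "ACGT"), ("refA", "ACGTACGT"), ("refB", "AC")]
    print(longest_contig_per_reference(contigs))
    assemblies = {"m1": "ACGT", "m2": "ACGA", "m3": "ACGT"}
    print(average_pairwise_identity(assemblies))
```

Real assemblies would first need a pairwise or multiple alignment step, which is left out here to keep the metric itself visible.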
Figure 3
Comparison of assemblers on artificial two-component mixtures of divergent HIV strains. Lengths of the longest contigs among all contigs assigned to each reference, for each sample from the data sets PIRS_min (A), PIRS_median (C) and MIX (E). Identities of the assemblies of components to the corresponding references, for each sample from the data sets PIRS_min (B), PIRS_median (D) and MIX (F). On panels A, C and E, the Y-axes show the total number of components in all samples together. On panels B, D and F, the Y-axes show the number of components, over all samples together, that have nonzero coverage by contigs assembled by each program, normalized to one. A colour version of this figure is available at BIB online: https://academic.oup.com/bib.
