Nat Commun. 2018 Aug 10;9(1):3205.
doi: 10.1038/s41467-018-05658-8.

Maximal viral information recovery from sequence data using VirMAP

Nadim J Ajami et al. Nat Commun.

Abstract

Accurate classification of the human virome is critical to a full understanding of the role viruses play in health and disease. This implies the need for sensitive, specific, and practical pipelines that return precise outputs while still enabling case-specific post hoc analysis. Viral taxonomic characterization from metagenomic data suffers from high background noise and signal crosstalk that confound current methods. Here we develop VirMAP, which overcomes these limitations using techniques that merge nucleotide and protein information to taxonomically classify viral reconstructions independent of genome coverage or read overlap. We validate VirMAP using published data sets and viral mock communities containing RNA and DNA viruses and bacteriophages. VirMAP offers opportunities to enhance metagenomic studies seeking to define virome-host interactions, improve biosurveillance capabilities, and strengthen molecular epidemiology reporting.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
A schematic overview of VirMAP. Data processing with VirMAP is achieved through four main stages (shaded colors) divided into nine major steps (top left corner). A putative list of viral genomes and protein pseudo-scaffolds is constructed from clustered nucleotide and translated alignments to the GenBank viral and phage divisions (gbvrl and gbphage). Nucleotide and amino acid pseudo-scaffolds are “built” and merged into a single super-scaffold per genome. A merged de novo assembly is constructed and merged in, resulting in contigs that are then refined using an iterative rebuild process. The improved dual assembly is filtered against a comprehensive GenBank database and taxonomically classified using a novel per-base contig scoring system.
Fig. 2
Viral Mock Community (VMC) calculated genome coverage depth and span from remapping source reads to VirMAP reconstructed genomes. The VMC consists of purified preparations of seven different viruses: (a) human poliovirus type 1 [strain Mahoney], (b) echovirus E13 [strain Del Carmen], (c) coxsackievirus B4 [strain Tuscany], (d) human adenovirus B, (e) human adenovirus C, (f) murine gammaherpesvirus 4, and (g) rotavirus, combined at different concentrations in phosphate-buffered saline. Coverage depth and span are shown per nucleotide position for each virus in the VMC. For coverage span, a value of 1 indicates that the nucleotide position is covered with respect to the source genome. The VMC is available at BioProject ID PRJNA431646.
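The per-position coverage depth and span described in the Fig. 2 caption can be illustrated with a minimal Python sketch. This is not VirMAP code: the function names, the 0-based half-open alignment intervals, and the toy input are assumptions made only for illustration. Depth counts the reads overlapping each position, and span is 1 wherever depth is nonzero.

```python
from typing import List, Tuple


def coverage_depth(genome_length: int,
                   alignments: List[Tuple[int, int]]) -> List[int]:
    """Per-position read depth.

    Assumes each alignment is a 0-based, half-open (start, end) interval
    on the source genome (an illustrative convention, not VirMAP's).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, genome_length)
        if start >= end:
            continue  # alignment lies outside the genome
        diff[start] += 1
        diff[end] -= 1

    # A running sum of the difference array gives the depth at each position.
    depth = []
    running = 0
    for delta in diff[:genome_length]:
        running += delta
        depth.append(running)
    return depth


def coverage_span(depth: List[int]) -> List[int]:
    """1 where a position is covered by at least one read, else 0."""
    return [1 if d > 0 else 0 for d in depth]


# Hypothetical toy example: three reads on a 10-nt genome.
toy_depth = coverage_depth(10, [(0, 4), (2, 6), (8, 10)])
toy_span = coverage_span(toy_depth)
fraction_covered = sum(toy_span) / len(toy_span)
```

In practice the alignment intervals would come from remapping the source reads to each reconstructed genome with a read aligner; that step is outside this sketch.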
Fig. 3
VirMAP analysis of an external mock community. A recently reported mock virome control sample (SRR3458562) was processed with VirMAP. A total of 5,969,272 reads (32.96%) were classified as being of viral origin across 10 distinct viral lineages, which included the nine viral constituents of the mock community. Additionally, one putative contaminant virus was identified: southern tomato virus.
Fig. 4
A comparison of VirMAP and the Standard Approach using the influenza virus dataset (BioProject ID PRJEB7888). The total length of reconstructed influenza virus segments was calculated at different levels of subsampling by adding the total number of base pairs found across segments. An average N50 was calculated at each subsampling level by averaging the N50 values for all trials (20 at 100%, 200 at 10, 1, and 0.1%). The percentage of positive trials corresponds to the ratio of trials with >1 identifiable influenza contig over the total number of trials (20 at 100%, 200 at 10, 1, and 0.1%). Tukey plots (bar: statistical median; edges: 25% and 75% quartiles; whiskers: 1.5 × interquartile range; dots: outliers).
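The three summary statistics named in the Fig. 4 caption (total reconstructed length, average N50, and percentage of positive trials) can be sketched in Python as follows. This is an illustrative sketch rather than the authors' analysis code. The function names, the `more_than` parameter, the toy contig lengths, and the choice to score a trial with no contigs as N50 = 0 are all assumptions. N50 is computed in the usual way, as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled length.

```python
from statistics import mean
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Standard N50 of a set of contig lengths (0 if there are none)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # reached only when there are no contigs (assumed convention)


def summarize_trials(trials: List[List[int]], more_than: int = 1) -> Dict:
    """Summarize one subsampling level.

    `trials` holds, per trial, the lengths of the identifiable influenza
    contigs. A trial is counted as positive when it has more than
    `more_than` contigs, following the ">1" wording of the caption.
    """
    positives = sum(1 for contigs in trials if len(contigs) > more_than)
    return {
        "total_lengths": [sum(contigs) for contigs in trials],
        "average_n50": mean(n50(contigs) for contigs in trials),
        "percent_positive": 100.0 * positives / len(trials),
    }


# Hypothetical contig lengths for three trials at one subsampling level.
toy_trials = [[2300, 1700, 900], [1200], []]
toy_summary = summarize_trials(toy_trials)
```

Making the positivity threshold a parameter keeps the sketch usable if a different cutoff (for example, at least one contig) is preferred.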
