Maximal viral information recovery from sequence data using VirMAP

Nadim J Ajami¹,², Matthew C Wong³,⁴, Matthew C Ross³,⁴, Richard E Lloyd⁴, Joseph F Petrosino³,⁴

Affiliations

¹ Alkek Center for Metagenomics and Microbiome Research, Baylor College of Medicine, Houston, TX, 77030, USA. nadimajami@gmail.com.
² Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, 77030, USA. nadimajami@gmail.com.
³ Alkek Center for Metagenomics and Microbiome Research, Baylor College of Medicine, Houston, TX, 77030, USA.
⁴ Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, 77030, USA.

PMID: 30097567
PMCID: PMC6086868
DOI: 10.1038/s41467-018-05658-8

Nat Commun. 2018 Aug 10;9(1):3205.

Abstract

Accurate classification of the human virome is critical to a full understanding of the role viruses play in health and disease. This implies the need for sensitive, specific, and practical pipelines that return precise outputs while still enabling case-specific post hoc analysis. Viral taxonomic characterization from metagenomic data suffers from high background noise and signal crosstalk that confound current methods. Here we develop VirMAP, which overcomes these limitations using techniques that merge nucleotide and protein information to taxonomically classify viral reconstructions independent of genome coverage or read overlap. We validate VirMAP using published data sets and viral mock communities containing RNA and DNA viruses and bacteriophages. VirMAP offers opportunities to enhance metagenomic studies seeking to define virome-host interactions, improve biosurveillance capabilities, and strengthen molecular epidemiology reporting.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1**
A schematic overview of VirMAP. Data processing with VirMAP is achieved through four main stages (shaded colors) divided into nine major steps (top left corner). A putative list of viral genomes and protein pseudo-scaffolds are constructed from *clustered* nucleotide and translated alignments to the GenBank viral and phage divisions (gbvrl and gbphage). Nucleotide and amino acid pseudo-scaffolds are “built” and merged into a single super-scaffold per genome. A merged de novo assembly is constructed and incorporated, resulting in contigs that are then *refined* using an iterative rebuild process. The improved dual assembly is filtered against a comprehensive GenBank database and taxonomically *classified* using a novel per-base contig scoring system.

**Fig. 2**
Viral Mock Community (VMC) calculated genome coverage depth and span from remapping source reads to VirMAP reconstructed genomes. The VMC consists of purified preparations of seven different viruses, (a) human poliovirus type 1 [strain Mahoney], (b) echovirus E13 [strain Del Carmen], (c) coxsackievirus B4 [strain Tuscany], (d) human adenovirus B, (e) human adenovirus C, (f) murine gammaherpesvirus 4, and (g) rotavirus, combined at different concentrations in phosphate-buffered saline. Coverage depth and span are shown per nucleotide position for each of the viruses in the VMC. For coverage span, a value of 1 represents a nucleotide position covered with respect to the source genome. The VMC is available at BioProject ID PRJNA431646.
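
The depth and span quantities described for Fig. 2 can be illustrated with a minimal sketch. It assumes the remapped reads are already available as 0-based, half-open `(start, end)` intervals on a single reference genome; the function names and the toy intervals below are hypothetical and are not taken from VirMAP or from the VMC data.

```python
from typing import List, Tuple


def coverage_depth(genome_length: int,
                   alignments: List[Tuple[int, int]]) -> List[int]:
    """Return per-position read depth for half-open [start, end) intervals."""
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, genome_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # A running sum over the difference array gives the depth at each position.
    depth = []
    running = 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)
    return depth


def coverage_span(depth: List[int]) -> List[int]:
    """Return 1 for each position covered by at least one read, else 0."""
    return [1 if d > 0 else 0 for d in depth]


def fraction_covered(span: List[int]) -> float:
    """Return the fraction of genome positions with span equal to 1."""
    return sum(span) / len(span) if span else 0.0


# Toy example (hypothetical): a 10 nt genome and three aligned reads.
depth = coverage_depth(10, [(0, 4), (2, 6), (8, 10)])
span = coverage_span(depth)
print(depth)                   # [1, 1, 2, 2, 1, 1, 0, 0, 1, 1]
print(span)                    # [1, 1, 1, 1, 1, 1, 0, 0, 1, 1]
print(fraction_covered(span))  # 0.8
```

In practice the intervals would come from an alignment file rather than a hand-written list; the difference-array approach is used here only because it keeps the computation linear in genome length plus read count.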

**Fig. 3**
VirMAP analysis of an external mock community. A recently reported mock virome control sample (SRR3458562) was processed with VirMAP. A total of 5,969,272 reads (32.96%) were classified as being of viral origin across 10 distinct viral lineages, which included the nine viral constituents of the mock community. Additionally, one putative contaminant virus was identified: southern tomato virus.

**Fig. 4**
A comparison of VirMAP and the Standard Approach using the influenza virus dataset (BioProject ID PRJEB7888). The total length of reconstructed influenza virus segments was calculated at different levels of subsampling by adding the total number of base pairs found across segments. An average N50 was calculated at each subsampling level by averaging the N50 values for all trials (20 at 100%, 200 at 10, 1, and 0.1%). The percentage of positive trials corresponds to the ratio of trials with >1 identifiable influenza contig over the total number of trials (20 at 100%, 200 at 10, 1, and 0.1%). Tukey plots: bar, statistical median; edges, 25% (low) and 75% (high) quartiles; whiskers, 1.5 × interquartile range; dots, outliers.
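
The three summary statistics named in the Fig. 4 caption (total reconstructed length, average N50, and percentage of positive trials) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each trial is represented as a plain list of influenza contig lengths, a trial with no contigs is assigned an N50 of 0, and the positivity rule follows the caption as written (more than `contig_threshold` contigs, default 1). All names and the toy lengths are hypothetical.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Return the N50: the length of the contig at which the running sum of
    contigs, sorted longest first, reaches half of the total assembled length."""
    if not lengths:
        return 0  # assumption: an empty trial contributes an N50 of 0
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def summarize_trials(trials: List[List[int]],
                     contig_threshold: int = 1) -> Dict[str, object]:
    """Summarize one subsampling level given contig lengths for each trial."""
    totals = [sum(t) for t in trials]
    n50_values = [n50(t) for t in trials]
    # Positive trial: more than `contig_threshold` contigs, as in the caption.
    positives = sum(1 for t in trials if len(t) > contig_threshold)
    return {
        "total_lengths": totals,
        "average_n50": sum(n50_values) / len(trials) if trials else 0.0,
        "percent_positive": 100.0 * positives / len(trials) if trials else 0.0,
    }


# Toy example (hypothetical lengths): three trials at one subsampling level.
summary = summarize_trials([[2300, 1800, 900], [1500], []])
print(summary["total_lengths"])  # [5000, 1500, 0]
print(summary["average_n50"])    # 1100.0
```

Whether empty trials should be included in the N50 average is a design choice that the caption does not settle; excluding them would raise the average at low subsampling levels.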
