PeerJ. 2021 Mar 30;9:e11088.

doi: 10.7717/peerj.11088. eCollection 2021.

VirION2: a short- and long-read sequencing and informatics workflow to study the genomic diversity of viruses in nature

Olivier Zablocki¹,², Michelle Michelsen³, Marie Burris¹, Natalie Solonenko¹, Joanna Warwick-Dugdale³,⁴, Romik Ghosh¹, Jennifer Pett-Ridge⁵, Matthew B Sullivan¹,²,⁶, Ben Temperton³

Affiliations

¹ Department of Microbiology, The Ohio State University, Columbus, OH, United States of America.
² Center of Microbiome Science, The Ohio State University, Columbus, OH, United States of America.
³ School of Biosciences, University of Exeter, Exeter, Devon, United Kingdom.
⁴ Plymouth Marine Laboratory, Plymouth, Devon, United Kingdom.
⁵ Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, CA, United States of America.
⁶ Department of Civil, Environmental and Geodetic Engineering, The Ohio State University, Columbus, OH, United States of America.

PMID: 33850654
PMCID: PMC8018248
DOI: 10.7717/peerj.11088

Olivier Zablocki et al. PeerJ. 2021.

Abstract

Microbes play fundamental roles in shaping natural ecosystem properties and functions, but do so under constraints imposed by their viral predators. However, studying viruses in nature can be challenging due to low biomass and the lack of universal gene markers. Though metagenomic short-read sequencing has greatly improved our virus ecology toolkit (and revealed many critical ecosystem roles for viruses), microdiverse populations and fine-scale genomic traits are missed. Some of these microdiverse populations are abundant, and the missed regions may be of interest for identifying selection pressures that underpin evolutionary constraints associated with hosts and environments. Though long-read sequencing promises complete virus genomes on single reads, it currently suffers from high DNA requirements and sequencing errors that limit accurate gene prediction. Here we introduce VirION2, an integrated short- and long-read metagenomic wet-lab and informatics pipeline that updates our previous method (VirION) to further enhance the utility of long-read viral metagenomics. Using a viral mock community, we first optimized laboratory protocols (polymerase choice, DNA shearing size, PCR cycling) to enable 76% longer reads (now median length of 6,965 bp) from 100-fold less input DNA (now 1 nanogram). Using a virome from a natural seawater sample, we compared viromes generated with VirION2 against other library preparation options (unamplified, original VirION, and short-read), and optimized downstream informatics for improved long-read error correction and assembly. VirION2 assemblies combined with short-read based data ('enhanced' viromes) provided significant improvements over VirION libraries in the recovery of longer and more complete viral genomes, and our optimized error-correction strategy using long- and short-read data achieved 99.97% accuracy.
In the seawater virome, VirION2 assemblies captured 5,161 viral populations (including all of the virus populations observed in the other assemblies), 30% of which were uniquely assembled through inclusion of long reads, and 22% of the top 10% most abundant virus populations were derived from assembly of long reads. Viral populations unique to VirION2 assemblies had significantly higher microdiversity means, which may explain why short-read virome approaches failed to capture them. These findings suggest the VirION2 sample prep and workflow can help researchers better investigate the virosphere, even from challenging low-biomass samples. Our new protocols are available to the research community on protocols.io as a 'living document' to facilitate dissemination of updates to keep pace with the rapid evolution of long-read sequencing technology.
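To put the reported 99.97% accuracy in concrete terms, the short Python sketch below converts a per-base accuracy into an expected number of erroneous bases for a sequence of a given length. The function name `expected_errors` and the example lengths are illustrative assumptions rather than code or values from the study, and the calculation assumes errors are spread uniformly along the sequence, which real assemblies need not satisfy.

```python
def expected_errors(accuracy_percent: float, length_bp: int) -> float:
    """Expected number of erroneous bases in a sequence.

    Assumption (not from the study): errors are uniformly distributed,
    so the expectation is simply error rate times sequence length.
    """
    if not 0.0 <= accuracy_percent <= 100.0:
        raise ValueError("accuracy_percent must be between 0 and 100")
    if length_bp < 0:
        raise ValueError("length_bp must be non-negative")
    error_rate = 1.0 - accuracy_percent / 100.0
    return error_rate * length_bp


if __name__ == "__main__":
    # Hypothetical genome lengths chosen only for illustration.
    for length_bp in (10_000, 50_000):
        errors = expected_errors(99.97, length_bp)
        print(f"{length_bp} bp at 99.97% accuracy: about {errors:.1f} expected errors")
```

Under this simplification, 99.97% accuracy corresponds to roughly 3 erroneous bases per 10 kbp of assembled sequence.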

Keywords: Long-reads; Metagenome; Nanopore sequencing; Phage; Viral metagenomics; Virome; Virus.

Conflict of interest statement

The authors declare there are no competing interests.

Figures

**Figure 1. Overview of wet lab optimization experiments and informatic benchmarking.**
(A) Laboratory optimization (‘Experiment 1’) in which a mock community of three phages was used to conduct three experiments aimed at producing longer reads from less input DNA. (B) Informatics benchmarking (‘Experiment 2’) in which a seawater virome was sequenced with short-reads (Illumina) and long-read sequencing (Oxford Nanopore). Three distinct long-read libraries were generated, error-corrected and assembled, and were compared to short-read assemblies to assess accuracy and assembly performance.

**Figure 2. Laboratory optimization yields longer reads from less DNA.**
(A) Boxplots showing the median and quartiles of the read length distribution between four DNA polymerases. (B) Boxplots showing the median and quartiles of the read length distribution between DNA shearing size treatments and one low-input DNA (1 ng) variant of the 15 kbp shearing treatment. (C) Boxplots showing the median and quartiles of the read length distribution of three long-read library types, either unamplified or amplified (VirION and VirION2). (D) Boxplots showing the median and quartiles of the read length distribution between four thermocycling treatments (here, number of cycles). Asterisks represent a significant difference (p < 0.0001) between pairs of replicate treatments where applicable.

**Figure 3. Error-correction profiles between library methods and assembly strategies using the WEC sample.**
(A) On the x-axis, mismatches and insertion/deletion (‘indels’) events according to assembly strategy (full OLC, full Flye, and Hybrid) are grouped separately, and divided into three facets, one for each long-read library method (unamplified, VirION, VirION2). The number of errors (y-axis) is scaled to the binary logarithm (log2) for scale fitting purposes. (B) Boxplot depicting the protein size distribution (in amino acids, denoted as ‘a.a’; y-axis) derived from each library method (x-axis), each of which is sub-grouped per assembly strategy. In both A and B panels, there were no results from Flye assemblies for the raw datasets, as these could not be produced.

**Figure 4. Comparison of virus genome properties between short-read and ‘long-read-enhanced’ viromes.**
(A) Workflow to produce ‘enhanced viromes’, in which SPAdes, hybrid and long-read (OLC) viruses are combined to maximize the recovery of virus signals. (B) Cumulative Distribution Function (CDF) plot depicting the frequency (y-axis) of virus genomes according to genome length (measured in kilobase pairs (kbp), x-axis) between three assembly strategies. (C) CDF plot depicting the frequency (y-axis) of virus genomes according to genome ‘completeness’ (measured in %, x-axis) between three assembly strategies. (D) CDF plot depicting the frequency (y-axis) of virus genomes according to microdiversity per genome (measured as π, x-axis) between three assembly strategies.

**Figure 5. Additional insights are gained through a VirION2-enhanced assembly strategy.**
(A) Rank abundance curve depicting the seawater virus community, colored according to whether a virus population (individual bars) was detected uniquely (turquoise) or in multiple (pastel red) assembly types. The top 10% most abundant viral populations are highlighted between dashed lines, where they are divided per assembly origin in the pie chart. (B) Boxplots depicting the level of microdiversity between shared and unique viral populations within each constituent assembly present in the ‘enhanced’ dataset.

Cited by

Long-read powered viral metagenomics in the oligotrophic Sargasso Sea.
Warwick-Dugdale J, Tian F, Michelsen ML, Cronin DR, Moore K, Farbos A, Chittick L, Bell A, Zayed AA, Buchholz HH, Bolanos LM, Parsons RJ, Allen MJ, Sullivan MB, Temperton B. Nat Commun. 2024 May 14;15(1):4089. doi: 10.1038/s41467-024-48300-6. PMID: 38744831. Free PMC article.
Evolutionary Divergence of Marinobacter Strains in Cryopeg Brines as Revealed by Pangenomics.
Cooper ZS, Rapp JZ, Shoemaker AMD, Anderson RE, Zhong ZP, Deming JW. Front Microbiol. 2022 Jun 6;13:879116. doi: 10.3389/fmicb.2022.879116. eCollection 2022. PMID: 35733954. Free PMC article.
Databases, Knowledgebases, and Software Tools for Virus Informatics.
Lin Y, Qian Y, Qi X, Shen B. Adv Exp Med Biol. 2022;1368:1-19. doi: 10.1007/978-981-16-8969-7_1. PMID: 35594018.
Lower viral evolutionary pressure under stable versus fluctuating conditions in subzero Arctic brines.
Zhong ZP, Vik D, Rapp JZ, Zablocki O, Maughan H, Temperton B, Deming JW, Sullivan MB. Microbiome. 2023 Aug 7;11(1):174. doi: 10.1186/s40168-023-01619-6. PMID: 37550784. Free PMC article.
Simple, reference-independent assessment to empirically guide correction and polishing of hybrid microbial community metagenomic assembly.
Smith GJ, van Alen TA, van Kessel MAHJ, Lücker S. PeerJ. 2024 Nov 8;12:e18132. doi: 10.7717/peerj.18132. eCollection 2024. PMID: 39529629. Free PMC article.

Grants and funding

WT_/Wellcome Trust/United Kingdom
