Viruses. 2021 Jan 20;13(2):150.
doi: 10.3390/v13020150.

ViralRecall-A Flexible Command-Line Tool for the Detection of Giant Virus Signatures in 'Omic Data

Frank O Aylward et al. Viruses. 2021.

Abstract

Giant viruses are widespread in the biosphere and play important roles in biogeochemical cycling and host genome evolution. Also known as nucleo-cytoplasmic large DNA viruses (NCLDVs), these eukaryotic viruses harbor the largest and most complex viral genomes known. Studies have shown that NCLDVs are frequently abundant in metagenomic datasets, and that sequences derived from these viruses can also be found endogenized in diverse eukaryotic genomes. The accurate detection of sequences derived from NCLDVs is therefore of great importance, but this task is challenging owing to both the high level of sequence divergence between NCLDV families and the extraordinarily high diversity of genes encoded in their genomes, including some that encode metabolic or translation-related functions typically found only in cellular lineages. Here, we present ViralRecall, a bioinformatic tool for the identification of NCLDV signatures in 'omic data. This tool leverages a library of giant virus orthologous groups (GVOGs) to identify sequences that bear signatures of NCLDVs. We demonstrate that this tool can effectively identify NCLDV sequences with high sensitivity and specificity. Moreover, we show that it can be useful both for removing contaminating sequences from metagenome-assembled viral genomes and for identifying eukaryotic genomic loci that are derived from NCLDVs. ViralRecall is written in Python 3.5 and is freely available on GitHub: https://github.com/faylward/viralrecall.

Keywords: endogenous viral elements; giant viruses; metagenomics; nucleo-cytoplasmic large DNA viruses; viral diversity.

Conflict of interest statement

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Figures

Figure 1
Diagram of the ViralRecall workflow. Abbreviations: GVOGs, giant virus orthologous groups.
Figure 2
(A) ViralRecall scores and lengths for 879 dsDNA viruses that are not nucleo-cytoplasmic large DNA viruses (NCLDVs). (B) ViralRecall scores and lengths for 38,886 giant virus contigs from 1548 reference and metagenome-assembled giant virus genomes. Contigs with scores <0 are colored red, while those with scores ≥0 are colored blue.
Figure 3
ViralRecall plots of diverse dsDNA viruses. NCLDV genomes are shown on the left, while the right panels show other dsDNA viruses, or highly divergent NCLDV in the case of Yaravirus. The jumbo bacteriophages LP_PHAGE_COMPLETE_CIR-CU-CL_32_18, FFC_PHAGE_43_1208, and M01_PHAGE_56_67 were chosen because they have the longest length, highest score, and lowest score, respectively, among the 336 jumbo phages tested.
Figure 4
(A) ViralRecall plots for the giant virus MAGs ERX556094.26 and GVMAG-S-1064190.84, demonstrating that both contain non-NCLDV contamination. For ERX556094.26, nine contaminant contigs were detected, while two were found in GVMAG-S-1064190.84. (B) Dot plot of the mean ViralRecall scores for all contigs in ERX556094.26 and GVMAG-S-1064190.84. Contigs with ViralRecall scores <0 are colored red, and dot sizes are proportional to contig size.
Figure 5
ViralRecall plots for endogenized viral regions identified in Hydra vulgaris, Bigelowiella natans, and Asterochloris glomerata.
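
The captions for Figures 2 and 4 describe a simple decision rule: contigs whose mean ViralRecall score falls below 0 are treated as likely non-NCLDV contamination. The following is a minimal sketch of that rule only, not ViralRecall's own code or output format. The function name `flag_contigs`, the input layout (a mapping from contig name to a list of per-window scores), and the example values are all hypothetical and chosen purely for illustration.

```python
from statistics import mean

def flag_contigs(window_scores, threshold=0.0):
    """Summarize hypothetical per-window scores for each contig.

    window_scores: dict mapping contig name -> list of numeric scores
                   (an assumed layout, not ViralRecall's actual output).
    Returns a dict mapping contig name -> (mean score, flagged), where
    flagged is True when the mean score is below the threshold.
    """
    summary = {}
    for contig, scores in window_scores.items():
        if not scores:
            # A contig with no scored windows cannot be assessed; skip it.
            continue
        avg = mean(scores)
        summary[contig] = (avg, avg < threshold)
    return summary

# Made-up scores for two illustrative contigs.
example = {
    "contig_1": [4.0, 2.5, 3.5],
    "contig_2": [-1.0, -3.0, 0.5],
}

result = flag_contigs(example)
for name, (avg, flagged) in result.items():
    label = "putative contaminant" if flagged else "NCLDV-like"
    print(name, round(avg, 2), label)
```

In this sketch the threshold of 0 mirrors the red/blue coloring in the figure captions; a real analysis would read scores from the tool's actual output and might also weigh contig length, as the dot sizes in Figure 4B suggest.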
