eLife. 2020 Feb 4;9:e51971.

doi: 10.7554/eLife.51971.

Discovery of several thousand highly diverse circular DNA viruses

Michael J Tisza¹, Diana V Pastrana¹, Nicole L Welch¹, Brittany Stewart¹, Alberto Peretti¹, Gabriel J Starrett¹, Yuk-Ying S Pang¹, Siddharth R Krishnamurthy², Patricia A Pesavento³, David H McDermott⁴, Philip M Murphy⁴, Jessica L Whited⁵ ⁶ ⁷, Bess Miller⁵ ⁶, Jason Brenchley⁸, Stephan P Rosshart⁹, Barbara Rehermann⁹, John Doorbar¹⁰, Blake A Ta'ala¹¹, Olga Pletnikova¹², Juan C Troncoso¹², Susan M Resnick¹³, Ben Bolduc¹⁴, Matthew B Sullivan¹⁴ ¹⁵, Arvind Varsani¹⁶ ¹⁷, Anca M Segall¹⁸, Christopher B Buck¹

Affiliations

¹ Lab of Cellular Oncology, National Cancer Institute, National Institutes of Health, Bethesda, United States.
² Metaorganism Immunity Section, Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, United States.
³ Department of Pathology, Microbiology, and Immunology, University of California, Davis, Davis, United States.
⁴ Molecular Signaling Section, Laboratory of Molecular Immunology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, United States.
⁵ Department of Orthopedic Surgery, Harvard Medical School, The Harvard Stem Cell Institute, Brigham and Women's Hospital, Boston, United States.
⁶ Broad Institute of MIT and Harvard, Cambridge, United States.
⁷ Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, United States.
⁸ Barrier Immunity Section, Lab of Viral Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Cambridge, United States.
⁹ Immunology Section, Liver Diseases Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, United States.
¹⁰ Department of Pathology, University of Cambridge, Cambridge, United Kingdom.
¹¹ Mililani Mauka Elementary, Mililani, United States.
¹² Department of Pathology (Neuropathology), Johns Hopkins University School of Medicine, Baltimore, United States.
¹³ Laboratory of Behavioral Neuroscience, National Institute on Aging, National Institutes of Health, Baltimore, United States.
¹⁴ Department of Microbiology, Ohio State University, Columbus, United States.
¹⁵ Civil Environmental and Geodetic Engineering, Ohio State University, Columbus, United States.
¹⁶ The Biodesign Center of Fundamental and Applied Microbiomics, School of Life Sciences, Center for Evolution and Medicine, Arizona State University, Tempe, United States.
¹⁷ Structural Biology Research Unit, Department of Clinical Laboratory Sciences, University of Cape Town, Rondebosch, South Africa.
¹⁸ Viral Information Institute and Department of Biology, San Diego State University, San Diego, United States.

PMID: 32014111
PMCID: PMC7000223
DOI: 10.7554/eLife.51971

Michael J Tisza et al. Elife. 2020.

Abstract

Although millions of distinct virus species likely exist, only approximately 9000 are catalogued in GenBank's RefSeq database. We selectively enriched for the genomes of circular DNA viruses in over 70 animal samples, ranging from nematodes to human tissue specimens. A bioinformatics pipeline, Cenote-Taker, was developed to automatically annotate over 2500 complete genomes in a GenBank-compliant format. The new genomes belong to dozens of established and emerging viral families. Some appear to be the result of previously undescribed recombination events between ssDNA and ssRNA viruses. In addition, hundreds of circular DNA elements that do not encode any discernable similarities to previously characterized sequences were identified. To characterize these 'dark matter' sequences, we used an artificial neural network to identify candidate viral capsid proteins, several of which formed virus-like particles when expressed in culture. These data further the understanding of viral sequence diversity and allow for high throughput documentation of the virosphere.

Keywords: evolutionary biology; infectious disease; metagenomics; microbiology; microbiome; viral evolution; virus.

Conflict of interest statement

MT, DP, NW, BS, AP, GS, YP, SK, PP, DM, PM, JW, BM, JB, SR, BR, JD, BT, OP, JT, SR, BB, MS, AV, AS, CB: No competing interests declared.

Figures

**Figure 1. Novel viruses associated with animal samples.**
Gross characterization of viruses discovered in this project compared to NCBI RefSeq virus database entries. (A) Pie chart representing the number of viral genomes in broad categories. (B) Bar graph showing the number of new representatives of known viral families or unclassified groups. (C) Heatmap reporting number of genomes found associated with each animal species. Number of samples per species in brackets. Note that genomes in this study were assigned taxonomy based on at least one region with a BLASTX hit with an E value <1 × 10⁻⁵, suggesting commonality with a known viral family. Some genomes may ultimately be characterized as being basal to the assigned family.

**Figure 1—figure supplement 2. Size distribution of circular DNA sequences from this study.**
Length, in nucleotides, of circular DNA sequences representing putative viral genomes from this study.

**Figure 1—figure supplement 3. Mapping reads to complete viral genome references.**
Quality-trimmed reads were aligned with Bowtie2 to reference genomes from RefSeq and this study. Genomes were masked for low-complexity regions.

**Figure 2. Sequence similarity network analysis of CRESS virus capsid proteins.**
EFI-EST was used to conduct pairwise alignments of amino acid sequences from this study and GenBank with predicted structural similarity to CRESS virus capsid proteins. The E value cutoff for the analysis was 10⁻⁵. (A) Cluster consisting of proteins with predicted structural similarity to geminivirus-like capsids and/or STNV-like capsids. The phylogenetic tree was made from all sequences in this cluster. (B) A cluster consisting of sequences with predicted structural similarity to Circovirus capsid proteins. The phylogenetic tree was made from all sequences in this cluster. (C) Assorted clusters and singletons from unclassified CRESS virus proteins that were modeled to be capsids. (D) Nanovirus capsids. (E) Gyrovirus capsids.

**Figure 2—figure supplement 1. Network analysis of additional viral hallmark genes.**
Depiction of additional viral hallmark genes from this study and GenBank as sequence similarity networks. E value cutoff = 10⁻⁵. See Figure 2 and Materials and methods.

**Figure 2—figure supplement 2. Phylogenetic trees of viral hallmark genes.**
Sequences were aligned with PROMALS3D using structure guidance when possible. Trees were drawn using IQ-Tree with automatic determination of substitution model. See Materials and methods. Branches are labeled with bootstrap percent support after 1000 ultrafast bootstrapping events. (A) *Microviridae* major capsid protein. (B) *Inoviridae* zonular occludens toxin. (C) CRESS virus Rep. (D) *Anelloviridae* ORF1. (E) *Microviridae*/*Inoviridae* Replication-associated protein I. (F) *Microviridae*/*Inoviridae* Replication-associated protein II. (G) *Microviridae*/*Inoviridae* Replication-associated protein III.

**Figure 3. Network analysis of CRESS virus Rep proteins.**
EFI-EST was used to conduct pairwise alignments of amino acid sequences from this study and GenBank that were structurally modeled to be a rolling-circle replicase (Rep). The analysis used an E value cutoff of 10⁻⁶⁰ to divide the data into family-level clusters.
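The effect of the E value cutoff on cluster granularity can be illustrated with a minimal sketch. This is not EFI-EST itself: it assumes pairwise hits are already available as `(sequence_a, sequence_b, e_value)` tuples, and the function name `cluster_by_evalue` and the sequence names are hypothetical. Two sequences are linked when their E value is at or below the cutoff, and clusters are the connected components of the resulting network.

```python
from collections import defaultdict

def cluster_by_evalue(hits, cutoff):
    """Link sequences whose pairwise E value is <= cutoff and
    return the connected components (clusters), largest first."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, evalue in hits:
        ra, rb = find(a), find(b)  # registers both nodes even if no edge is kept
        if evalue <= cutoff and ra != rb:
            parent[ra] = rb

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))

# Hypothetical pairwise hits (illustrative values, not data from the study).
hits = [
    ("repA", "repB", 1e-80),
    ("repB", "repC", 1e-20),
    ("repC", "repD", 1e-70),
]
loose = cluster_by_evalue(hits, 1e-5)    # permissive cutoff
strict = cluster_by_evalue(hits, 1e-60)  # stringent cutoff
print(loose)
print(strict)
```

With the permissive cutoff all four toy sequences fall into one cluster, while the stringent cutoff drops the weaker `repB`/`repC` link and splits them into two, which is the general behavior a stricter threshold is expected to produce.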

**Figure 4. RNA virus capsid-like proteins.**
Sequence similarity network generated with EFI-EST (E value cutoff of 10⁻⁵) showing capsid protein sequences of select ssRNA viruses (*Nodaviridae*, *Tombusviridae*, tombus-like viruses) and ssDNA viruses (*Bacilladnaviridae* and crucivirus) together with protein sequences from DNA virus genomes observed in the present study with predicted structural similarity to an RNA virus capsid protein domain (PDB: 2IZW). Predicted capsid proteins for CRESS virus ctca5 and CRESS virus ctgh4 have no detectable similarity to any known DNA virus sequences. On the left, a phylogenetic tree representing the large cluster is displayed. Collapsed branches consist of *Tombusviridae*, tombus-like viruses, and *Nodaviridae* capsid genes.

**Figure 4—figure supplement 1. Genome maps of large CRESS virus genomes.**
Predicted CRESS Rep-like genes are displayed in orange, virion structural genes shown in green, other identifiable viral genes shown in pink, other genes in gray. GenBank accession numbers are displayed above the virus name.

**Figure 4—figure supplement 2. Validation of proteins with predicted similarity to RNA virus capsid proteins.**
(A) First order neighbors for Crucian-associated CRESS virus ctgh4 capsid protein were extracted from the network shown in Figure 4 and aligned using Muscle. (B) The same approach was applied to CRESS virus ctbd466 capsid protein. (C) A visualization (Integrative Genomics Viewer) of a read alignment to CRESS virus isolate ctca5. The visualization shows no evidence of artifactual chimerization in the contig assembly process.

**Figure 5. Dark matter analysis.**
(A) Sequence similarity network analysis for genes from dark matter circular sequences (minimum cluster size = 4). Clusters are colored based on assigned dark matter genome group (DMGG). Structural predictions from HHpred are indicated (>85% probability). *Rep* = rolling circle replicases typical of CRESS viruses or ssDNA plasmids. *Capsid* = single jellyroll capsid protein. *Attachment* = cell attachment proteins typical of inoviruses. *DNA-Binding* = DNA binding domain. *PLA2* = phospholipase A2. *FtsL* = FtsL-like cell division protein. Clusters that contain a representative protein that was successfully expressed as a virus-like particle are outlined by a dashed rectangle (see Figure 6). (B) Maps of three examples of DMGG1 with DMPCs labeled (linearized for display). (C) DMGG1 iVireons 'structure' score summary by protein cluster. Scores range from −1 (unlikely to be a virion structural protein) to 1 (likely to be a virion structural protein). Additional iVireons score summaries can be found in Figure 5—figure supplement 2.

**Figure 5—figure supplement 1. Sample characterization by iterative BLAST searches.**
Contigs of over 1000 nts from each sample were subjected to iterative BLAST searches. First, BLASTN was performed against the RefSeq database. Contigs without hits were then queried by BLASTX against the entire GenBank 'nr' database. Contigs without hits were then queried by BLASTX against a database of proteins from genomes reported in this study. The proportion of total reads mapping to each contig was calculated and used for this plot. Individual inspection of contigs shows that most hits in the 'Translated AA alignment to GenBank nr' 'Bacteria' category were likely plasmid or prophage proteins. The proportions of hits in each category are sensitive to stringency settings and to which databases are chosen for the analysis. The key aims of the figure are to display the proportion of reads the current survey rendered classifiable and the fraction of remaining dark matter reads in various samples.
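The cascade described in this caption can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes each search has already been run and reduced to a set of contig names with hits, and it uses reads on eligible contigs as the denominator (an assumption; the study may have used a different total). The function `classify_contigs`, the step labels, and all values are hypothetical.

```python
def classify_contigs(contigs, search_steps, min_length=1000):
    """Assign each contig to the first search step that reports a hit.

    contigs: dict of name -> (length_nt, mapped_reads)
    search_steps: ordered list of (label, set_of_contig_names_with_hits)
    Returns dict of label -> proportion of reads on eligible contigs.
    """
    # Keep only contigs longer than the length threshold.
    eligible = {n: v for n, v in contigs.items() if v[0] > min_length}
    total_reads = sum(reads for _, reads in eligible.values())
    if total_reads == 0:
        return {}

    proportions = {}
    remaining = set(eligible)
    for label, hit_names in search_steps:
        # Only contigs left unclassified by earlier steps are considered.
        matched = remaining & hit_names
        proportions[label] = sum(eligible[n][1] for n in matched) / total_reads
        remaining -= matched
    # Whatever no step could classify is counted as dark matter.
    proportions["dark matter"] = (
        sum(eligible[n][1] for n in remaining) / total_reads
    )
    return proportions

# Hypothetical sample: name -> (length in nt, reads mapped).
contigs = {
    "c1": (5200, 600),
    "c2": (2100, 250),
    "c3": (1500, 100),
    "c4": (3000, 50),
    "c5": (800, 999),   # too short, excluded
}
steps = [
    ("BLASTN vs RefSeq", {"c1"}),
    ("BLASTX vs GenBank nr", {"c1", "c2"}),
    ("BLASTX vs this study", {"c3"}),
]
result = classify_contigs(contigs, steps)
print(result)
```

Because each step only sees contigs left over from the previous one, reordering the searches or changing the hit criteria shifts reads between categories, consistent with the caption's note that the proportions are sensitive to stringency and database choice.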

**Figure 5—figure supplement 2. iVireons scores of DMGGs with candidate viral structural gene(s).**
Box-and-whisker plots of iVireons 'structure' scores for individual DMPCs (numbers on x-axes) grouped by DMGG. Scores (y-axes) range from −1 (unlikely to be a virion structural protein) to 1 (likely to be a virion structural protein). DMGG2 and DMGG3 have been combined due to inferred chimerism.

**Figure 6. Expression of putative capsid proteins.**
Images taken by negative stain electron microscopy. Genome maps are linearized for display purposes. Expressed genes are colored green. iVireons scores are listed in parentheses. (A-C) Images represent virus-like particles from iVireons-predicted viral structural genes. (D) Merkel cell polyomavirus small T antigen (a viral non-structural protein) is shown as a negative control.

Grants and funding

P30 AG066507/AG/NIA NIH HHS/United States
