PLoS Pathog. 2024 Apr 22;20(4):e1012163.
doi: 10.1371/journal.ppat.1012163. eCollection 2024 Apr.

Deep mining of the Sequence Read Archive reveals major genetic innovations in coronaviruses and other nidoviruses of aquatic vertebrates

Chris Lauber et al. PLoS Pathog. 2024.

Abstract

Virus discovery by genomics and metagenomics empowered studies of viromes, facilitated characterization of pathogen epidemiology, and redefined our understanding of the natural genetic diversity of viruses with profound functional and structural implications. Here we employed a data-driven virus discovery approach that directly queries unprocessed sequencing data in a highly parallelized way and involves a targeted viral genome assembly strategy in a wide range of sequence similarity. By screening more than 269,000 datasets of numerous authors from the Sequence Read Archive and using two metrics that quantitatively assess assembly quality, we discovered 40 nidoviruses from six virus families whose members infect vertebrate hosts. They form 13 and 32 putative viral subfamilies and genera, respectively, and include 11 coronaviruses with bisegmented genomes from fishes and amphibians, a giant 36.1 kilobase coronavirus genome with a duplicated spike glycoprotein (S) gene, 11 tobaniviruses and 17 additional corona-, arteri-, cremega-, nanhypo- and nangoshaviruses. Genome segmentation emerged in a single evolutionary event in the monophyletic lineage encompassing the subfamily Pitovirinae. We recovered the bisegmented genome sequences of two coronaviruses from RNA samples of 69 infected fishes and validated the presence of poly(A) tails at both segments using 3'RACE PCR and subsequent Sanger sequencing. We report a genetic linkage between accessory and structural proteins whose phylogenetic relationships and evolutionary distances are incongruent with the phylogeny of replicase proteins. We rationalize these observations in a model of inter-family S recombination involving at least five ancestral corona- and tobaniviruses of aquatic hosts. In support of this model, we describe an individual fish co-infected with members from the families Coronaviridae and Tobaniviridae. 
Our results expand the scale of the known extraordinary evolutionary plasticity in nidoviral genome architecture and call for revisiting fundamentals of genome expression, virus particle biology, host range and ecology of vertebrate nidoviruses.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Assembly quality assessment.
(A) Toy example visualizing how meas (left) and mico (right) assembly quality metrics are calculated. Alignment scores used for meas were calculated using Bowtie2 and have a maximum value of zero corresponding to reads aligning full-length without mismatches. (B) Distribution of meas and mico values obtained for the nidoviral sequences discovered and assembled in this study (green) and for 2350 reference RNA virus sequences (gray) [37,57]. Both x-axes are in log10 scale.
Fig 2. Virus discovery in the SRA.
All numbers in the different panels correspond to counts of SRA runs. (Left) Results of a profile hidden Markov model (pHMM) based sequence homology search in the raw read data (Virushunter). Significant hits (at least one sequencing read with E-value < 1 × 10⁻⁴) against one of three nidovirus pHMMs (see Materials and Methods for details) are shown if the corresponding sequences did not give better hits against other RNA viruses or against host sequences. Hits are grouped by order of the putative vertebrate host according to the annotation of the sequencing projects. Note that a detected sequence may not necessarily be from a member of the order Nidovirales but might also be from a virus of a related taxon for which no reference sequence was available at the time of analysis. (Right) Remaining hits after targeted viral genome assembly (Virusgatherer). Only contigs of at least 1000 nt in length were considered, and those with significant hits (covering at least 500 nt with E-value < 1 × 10⁻⁴) against nidoviruses were kept. Bars are colored according to four major groups of the putative hosts (see common legend at the bottom-right).
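The contig-level filter described for the right panel can be sketched as follows. This is only an illustration of the three thresholds stated in the caption (contig length of at least 1000 nt, hit coverage of at least 500 nt, E-value below 1 × 10⁻⁴); the `ContigHit` record, its field names and the helper functions are hypothetical and are not taken from Virusgatherer.

```python
from dataclasses import dataclass

# Thresholds as stated in the Fig 2 caption.
E_VALUE_CUTOFF = 1e-4      # hits must have E-value < 1 x 10^-4
MIN_CONTIG_LENGTH = 1000   # contigs must be at least 1000 nt long
MIN_HIT_COVERAGE = 500     # the hit must cover at least 500 nt


@dataclass
class ContigHit:
    """Hypothetical record: one assembled contig and its best nidovirus hit."""
    contig_id: str
    contig_length: int   # contig length in nt
    hit_coverage: int    # number of contig nt covered by the hit
    e_value: float       # E-value of the hit


def keep_contig(hit: ContigHit) -> bool:
    """Return True if the contig passes all three caption thresholds."""
    return (
        hit.contig_length >= MIN_CONTIG_LENGTH
        and hit.hit_coverage >= MIN_HIT_COVERAGE
        and hit.e_value < E_VALUE_CUTOFF
    )


def filter_contigs(hits):
    """Keep only the contigs that satisfy the length, coverage and E-value criteria."""
    return [hit for hit in hits if keep_contig(hit)]
```

Note that the E-value comparison is strict, so a hit with an E-value of exactly 1 × 10⁻⁴ is rejected in this sketch.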
Fig 3. Maximum likelihood phylogenies of non-structural and structural proteins and tanglegram of vertebrate nidoviruses.
The trees are based on protein alignments from which poorly conserved positions were manually removed. The phylogenies of non-structural proteins involving Coronaviridae and Tobaniviridae members (top) and Nangoshaviridae, Nanhypoviridae, Gresnaviridae, Olifoviridae and Arteriviridae members (bottom) are based on a concatenated alignment of RdRp, ZBD and HEL1 (A). The S protein phylogenies involving Coronaviridae and Tobaniviridae members (B) are based on conserved regions of the S2 part of the spike protein in coronaviruses or the homologous part in tobaniviruses. Two separate trees for A and three separate trees for B were constructed (see Materials and Methods for details). The branch lengths are in units of aa substitutions per site; scale bars are shown. White and black circles at internal nodes indicate branching support. Tips corresponding to reference viruses are shown as gray circles and those constituting lineages rediscovered or newly discovered from SRA data as blue and red circles, respectively. Family-like, subfamily-like and genus-like OTUs derived from a genetics-based classification using DEmARC are shown using dark gray, light gray and white rectangles, respectively; known or predicted host types are indicated by colored diamonds next to the virus names; viruses with bisegmented genomes, inferred recombinant S2 and those expressing a putative glycosidase domain are highlighted by colored squares (see legend at the bottom-right). Possible additional recombinant S2 cases are discussed in the text.
Fig 4. Genomic layout of novel coronaviruses and five reference viruses.
Viruses whose names do not start with an accession number were discovered in this study. Predicted open reading frames (ORFs) of at least 300 nucleotides in length are shown as white rectangles; ORFs are defined to start and end at a stop codon. Protein domains predicted via profile HMM are indicated in color: transmembrane helix (TMh), macrodomain (Macro), 3C-like protease (3CLpro), RNA-dependent RNA polymerase (RdRp), RdRp-associated nucleotidyltransferase (NiRAN), zinc-binding domain (ZBD), superfamily 1 helicase (HEL1), O-methyltransferase (OMT), lamina-associated polypeptide 1C-like protein (LAP1C), family 18 glycosidase (GH18), spike/glycoprotein (S/gp), matrix protein (M), nucleocapsid protein (N), US22 protein (US22). Domain borders are drawn according to the corresponding profile search hit and the actual domains may extend beyond these borders. Black bars above a genome indicate missing sequence.
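The stop-to-stop ORF definition used in this caption can be illustrated with a short sketch. It makes several assumptions that the caption does not state: only the three forward-strand frames are scanned, the standard stop codons (TAA, TAG, TGA) apply, the 300-nucleotide minimum is measured on the region between two consecutive in-frame stops with both stops excluded, and sequence ends are not treated as ORF boundaries. The function name and return format are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
MIN_ORF_LENGTH = 300                 # nt, as stated in the Fig 4 caption


def stop_to_stop_orfs(seq, min_length=MIN_ORF_LENGTH):
    """Return (start, end, frame) tuples for stop-to-stop ORFs.

    Coordinates are 0-based and end-exclusive. They delimit the region
    between two consecutive in-frame stop codons on the forward strand,
    with both stop codons excluded.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    orfs = []
    for frame in range(3):
        previous_stop = None  # index of the most recent in-frame stop codon
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if previous_stop is not None:
                    start, end = previous_stop + 3, i
                    if end - start >= min_length:
                        orfs.append((start, end, frame))
                previous_stop = i
    return orfs
```

A stop-to-stop definition needs no start codon, which is convenient for genomes whose downstream ORFs may be expressed through frameshifting or from subgenomic RNAs.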
Fig 5. Molecular validation of the 3’-termini of both segments of two bisegmented fish coronaviruses.
For each segment, a multiple nucleotide sequence alignment of the 3’-ends of the SRA-based contig (Original contig), selected additional strains from different fish specimens and the product of the 3’RACE PCR (red label) is shown. The corresponding Sanger sequencing chromatogram for the 3’RACE PCR is shown below each sequence alignment.
Fig 6. Sequence-based evidence for subgenomic RNA (sgRNA) formation in Crotalus viridis tobanivirus (A-D) and Eospalax fontanierii baileyi arterivirus (E-I).
Read depth from SRR7401987 (A) and SRR3036364 (E) across the reconstructed virus genomes. (B,F) Inferred reconstruction of viral sgRNAs based on leader-body-junction reads, with the positions of putative transcription regulatory sequences (TRSs) indicated with triangles in the same color as the nearest downstream gene; in cases where multiple body TRSs are used, multiple RNA species are shown. (C,H) Inferred TRSs are shown in colors corresponding to the nearest downstream gene, including distance to the start codon. (D,I) Sequence and read count of sgRNAs showing leader-body fusion; leader sequences are shown in purple, sequences matching the leader TRS in maroon, and sequences from the sgRNA body region in black. (G) Homologs of Eospalax fontanierii baileyi arterivirus structural proteins, inferred from HHpred search against the PFAM-A_v35 database. The best statistical match for each protein and corresponding E-values (HHpred e) are shown.
