mBio. 2018 Nov 27;9(6):e02329-18.
doi: 10.1128/mBio.02329-18.

Origins and Evolution of the Global RNA Virome

Yuri I Wolf et al. mBio. 2018.

Abstract

Viruses with RNA genomes dominate the eukaryotic virome, reaching enormous diversity in animals and plants. The recent advances of metaviromics prompted us to perform a detailed phylogenomic reconstruction of the evolution of the dramatically expanded global RNA virome. The only universal gene among RNA viruses is the gene encoding the RNA-dependent RNA polymerase (RdRp). We developed an iterative computational procedure that alternates the RdRp phylogenetic tree construction with refinement of the underlying multiple-sequence alignments. The resulting tree encompasses 4,617 RNA virus RdRps and consists of 5 major branches; 2 of the branches include positive-sense RNA viruses, 1 is a mix of positive-sense (+) RNA and double-stranded RNA (dsRNA) viruses, and 2 consist of dsRNA and negative-sense (-) RNA viruses, respectively. This tree topology implies that dsRNA viruses evolved from +RNA viruses on at least two independent occasions, whereas -RNA viruses evolved from dsRNA viruses. Reconstruction of RNA virus evolution using the RdRp tree as the scaffold suggests that the last common ancestors of the major branches of +RNA viruses encoded only the RdRp and a single jelly-roll capsid protein. Subsequent evolution involved independent capture of additional genes, in particular, those encoding distinct RNA helicases, enabling replication of larger RNA genomes and facilitating virus genome expression and virus-host interactions. Phylogenomic analysis reveals extensive gene module exchange among diverse viruses and horizontal virus transfer between distantly related hosts. Although the network of evolutionary relationships within the RNA virome is bound to further expand, the present results call for a thorough reevaluation of the RNA virus taxonomy.

IMPORTANCE: The majority of the diverse viruses infecting eukaryotes have RNA genomes, including numerous human, animal, and plant pathogens. Recent advances of metagenomics have led to the discovery of many new groups of RNA viruses in a wide range of hosts. These findings enable a far more complete reconstruction of the evolution of RNA viruses than was attainable previously. This reconstruction reveals the relationships between different Baltimore classes of viruses and indicates extensive transfer of viruses between distantly related hosts, such as plants and animals. These results call for a major revision of the existing taxonomy of RNA viruses.

Keywords: RNA virus; RNA-dependent RNA polymerase; capsid protein; evolution; virome.

Figures

FIG 1
Phylogeny of RNA virus RNA-dependent RNA polymerases (RdRps) and reverse transcriptases (RTs): the main branches (branches 1 to 5). Each branch represents collapsed sequences of the corresponding set of RdRps. The 5 main branches discussed in the text are labeled accordingly. The bootstrap support values obtained by the indicated numerator/denominator calculations are shown for each internal branch. LTR, long-terminal repeat.
FIG 2
Branch 1 of the RNA virus RNA-dependent RNA polymerases (RdRps): leviviruses and their relatives. (A) Phylogenetic tree of the virus RdRps showing ICTV-accepted virus taxa and other major groups of viruses. Approximate numbers of distinct virus RdRps present in each branch are shown in parentheses. Symbols to the right of the parentheses summarize the presumed virus host spectrum of a lineage. Green dots represent well-supported (≥0.7) branches. (B) Genome maps of a representative set of branch 1 viruses (drawn to scale) showing color-coded major conserved domains. Where a conserved domain comprises only a part of the larger protein, the rest of this protein is shown in light gray. The locations of such domains are approximated (indicated by fuzzy boundaries). CP, capsid protein; MP, movement protein; S3H, superfamily 3 helicase; SJR1 and SJR2, single jelly-roll capsid proteins of type 1 and type 2 (see Fig. 7).
FIG 3
Branch 2 of the RNA virus RNA-dependent RNA polymerases (RdRps): “picornavirus supergroup” of the +RNA viruses expanded to include nidoviruses and two groups of dsRNA viruses, partitiviruses and picobirnaviruses. (A) Phylogenetic tree of the virus RdRps showing ICTV-accepted virus taxa and other major groups of viruses. Approximate numbers of distinct virus RdRps present in each branch are shown in parentheses. Symbols to the right of the parentheses summarize the presumed virus host spectrum of a lineage. Green dots represent well-supported (≥0.7) branches. Inv., viruses of invertebrates (many found in holobionts, making host assignment uncertain); myco., mycoviruses; Uncl., unclassified; vert., vertebrate. (B) Genome maps of a representative set of branch 2 viruses (drawn to scale) showing color-coded major conserved domains. Where a conserved domain comprises only a part of the larger protein, the rest of this protein is shown in light gray. The locations of such domains are approximated (indicated by fuzzy boundaries). 3Cpro, 3C chymotrypsin-like protease; CP, capsid protein; E, envelope protein; En, nidoviral uridylate-specific endoribonuclease (NendoU); Exo, 3′-to-5′ exoribonuclease domain; fCP, capsid protein forming filamentous virions; M, membrane protein; MD, macrodomain; MP, movement protein; MT, ribose-2-O-methyltransferase domain; N, nucleocapsid protein; N7, guanine-N7-methyltransferase; Ppro, papain-like protease; SJR1 and SJR2, single jelly-roll capsid proteins of type 1 and type 2; spike, spike protein; S1H, superfamily 1 helicase; S2H, superfamily 2 helicase; S3H, superfamily 3 helicase; VP2, virion protein 2; Z, Zn-finger domain; Spro, serine protease; P3, protein 3. Distinct hues of the same color (e.g., green for MPs) are used to indicate cases where proteins that share analogous function are not homologous.
FIG 4
Branch 3 of the RNA virus RNA-dependent RNA polymerases (RdRps): alphavirus superfamily and radiation of related tombusviruses, nodaviruses, and unclassified viruses and flavivirus supergroup. (A) Phylogenetic tree of the virus RdRps showing ICTV-accepted virus taxa and other major groups of viruses. Approximate numbers of distinct virus RdRps present in each branch are shown in parentheses. Symbols to the right of the parentheses summarize the presumed virus host spectrum of a lineage. Green dots represent well-supported (≥0.7) branches, whereas yellow dots correspond to weakly supported branches. Inv., viruses of invertebrates (many found in holobionts, making host assignment uncertain); myco., mycoviruses; uncl., unclassified. (B) Genome maps of a representative set of branch 3 viruses (drawn to scale) showing color-coded major conserved domains. Where a conserved domain comprises only a part of the larger protein, the rest of this protein is shown in light gray. The locations of such domains are approximated (indicated by fuzzy boundaries). C, nucleocapsid protein; CapE, capping enzyme; CP-Spro, capsid protein-serine protease; E, envelope protein; fCP, divergent copies of the capsid protein forming filamentous virions; Hsp70h, Hsp70 homolog; MP, movement protein; NS, nonstructural protein; nsP2 to nsP3, nonstructural proteins; Ppro, papain-like protease; prM, precursor of membrane protein; rCP, capsid protein forming rod-shaped virions; RiS, RNA interference suppressor; S1H, superfamily 1 helicase; S2H, superfamily 2 helicase; SJR2, single jelly-roll capsid proteins of type 2; Spro, serine protease; vOTU, virus OTU-like protease. Distinct hues of the same color (e.g., green for MPs) are used to indicate cases where proteins that share analogous function are not homologous.
FIG 5
Branch 4 of the RNA virus RNA-dependent RNA polymerases (RdRps): dsRNA viruses of eukaryotes and prokaryotes. (A) Phylogenetic tree of the virus RdRps showing ICTV-accepted virus taxa and other major groups of viruses. Approximate numbers of distinct virus RdRps present in each branch are shown in parentheses. Symbols to the right of the parentheses summarize the presumed virus host spectrum of a lineage. Inv., viruses of invertebrates (many found in holobionts, making host assignment uncertain); myco., mycoviruses; uncl., unclassified. Green dots represent well-supported (≥0.7) branches. (B) Genome maps of a representative set of branch 4 viruses (drawn to scale) showing color-coded major conserved domains. Where a conserved domain comprises only a part of the larger protein, the rest of this protein is shown in light gray. The locations of such domains are approximated (indicated by fuzzy boundaries). CapE, capping enzyme; CP, capsid protein; iCP, internal capsid protein; NS, nonstructural protein; NTPase, nucleotide triphosphatase; oCP, outer capsid protein; P, protein; phytoreoS7, homolog of S7 domain of phytoreoviruses; pHel, packaging helicase; vOTU, virus OTU-like protease; VP, viral protein. The CPs of totiviruses and chrysoviruses are homologous to iCPs of reoviruses and cystoviruses (black rectangles).
FIG 6
Branch 5 of the RNA virus RNA-dependent RNA polymerases (RdRps): −RNA viruses. (A) Phylogenetic tree of the virus RdRps showing ICTV-accepted virus taxa and other major groups of viruses. Approximate numbers of distinct virus RdRps present in each branch are shown in parentheses. Symbols to the right of the parentheses summarize the presumed virus host spectrum of a lineage. Green dots represent well-supported (≥0.7) branches, whereas yellow dots correspond to weakly supported branches. Inv., viruses of invertebrates (many found in holobionts, making host assignment uncertain); uncl., unclassified. (B) Genome maps of a representative set of branch 5 viruses (drawn to scale) showing color-coded major conserved domains. Where a conserved domain comprises only a part of the larger protein, the rest of this protein is shown in light gray. The locations of such domains are approximated (indicated by fuzzy boundaries). CapE, capping enzyme; CP, capsid protein; EN, “cap-snatching” endonuclease; GP, glycoprotein; GPC, glycoprotein precursor; HA, hemagglutinin; M, matrix protein; MP, movement protein; NA, neuraminidase; NP, nucleoprotein; NS, nonstructural protein; NSM, medium nonstructural protein; NSs, small nonstructural protein; PA, polymerase acidic protein; PB, polymerase basic protein; vOTU, virus OTU-like protease; VP, viral protein; Z, zinc finger protein.
FIG 7
Sequence similarity networks of SJR-CPs. Protein sequences were clustered by the pairwise similarity of their HMM profiles using CLANS. Different groups of SJR-CPs are shown as clouds of differentially colored circles, with the corresponding subgroups labeled as indicated in the figure. Edges connect sequences with CLANS P values of ≤1e−03.
FIG 8
Bipartite network of gene sharing in RNA viruses. (A) Groups of related viruses were identified as the modules of the bipartite genome-gene network (not shown), whereas connector genes were defined as those genes present in two or more modules with prevalence greater than 65%. The network presented in panel A shows viral modules as colored circles (blue, +RNA viruses; green, dsRNA viruses; red, −RNA viruses) linked to the connector genes (black dots) that are present in each module. The size of the circles is proportional to the number of genomes in each module. Shaded ovals indicate statistically significant, first-order supermodules that join modules from taxonomically related groups. (B) Taxonomic analysis of the network modules confirms that most modules contain viruses from a single family and that families do not tend to split among modules. (C) High-order supermodules of the RNA virus network, obtained by iteratively applying a community detection algorithm on the bipartite network of (super)modules and connector genes. GP, glycoprotein; GT, guanylyltransferase; MT, methyltransferase; NCP, nucleocapsid; RdRp, RNA-dependent RNA polymerase; SF, superfamily; SJR-CP, single jelly-roll capsid protein.
FIG 9
Quantitative analysis of the host range diversity of RNA viruses. The entropy of host ranges is plotted against the ultrameterized tree depth for the 5 main branches of the RdRp phylogeny (see Fig. 1).
FIG 10
A general scenario of RNA virus evolution. The figure is a rough scheme of the key steps of RNA virus evolution inferred in this work. The main branches from the phylogenetic tree of the RdRps are denoted with the numbers 1 to 5 as described in the Fig. 1 legend. Only the genes corresponding to RdRp, CP, and helicases (S1H, S2H, and S3H for the helicases of superfamilies 1, 2, and 3, respectively) are shown systematically. The helicases appear to have been captured independently and in parallel in 3 branches of +RNA viruses, facilitating the evolution of larger, more complex genomes. Additional genes, namely, the Endo and Exo genes (for endonuclease and exonuclease, respectively) and the Hsp70h gene (heat shock protein 70 homolog), are shown selectively, to emphasize the increased genome complexity, respectively, in Nidovirales and in Closteroviridae. The virion architecture is shown schematically for each included group of viruses. Icosahedral capsids composed of unrelated CPs are indicated by different colors (see the text for details). The question mark at the hypothetical ancestral eukaryotic RNA virus indicates the uncertainty with regard to the nature of the host (prokaryotic or eukaryotic) of this ancestral form. The block arrow at the bottom indicates the time flow and the complexification trend in RNA virus evolution.
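
Two quantities named in the legends above can be made concrete with a small sketch: the connector-gene rule from FIG 8 (a gene present in two or more modules with prevalence greater than 65%) and the host-range entropy plotted in FIG 9. The legends do not spell out the exact formulas, so the code below rests on assumptions: prevalence is taken as the fraction of genomes in a module that carry the gene, and entropy is taken as Shannon entropy in bits over host labels. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def host_range_entropy(hosts):
    """Shannon entropy (in bits) of a list of host labels.

    Assumption: the legend's "entropy of host ranges" is read here as plain
    Shannon entropy over host categories; the paper may use a different form.
    """
    counts = Counter(hosts)
    total = sum(counts.values())
    return sum((n / total) * log2(total / n) for n in counts.values())


def connector_genes(modules, min_prevalence=0.65, min_modules=2):
    """Genes whose prevalence exceeds min_prevalence in at least min_modules modules.

    modules: dict mapping a module name to a list of genomes, where each
    genome is a set of gene names.
    Assumption: prevalence = fraction of genomes in the module carrying the gene.
    """
    hits = Counter()
    for genomes in modules.values():
        gene_counts = Counter(gene for genome in genomes for gene in genome)
        for gene, n in gene_counts.items():
            if n / len(genomes) > min_prevalence:
                hits[gene] += 1
    return {gene for gene, k in hits.items() if k >= min_modules}


# Hypothetical toy data for illustration only (not from the paper).
toy_modules = {
    "module_A": [{"RdRp", "SJR-CP"}, {"RdRp", "SJR-CP"}, {"RdRp"}],
    "module_B": [{"RdRp", "S3H"}, {"RdRp", "SJR-CP"},
                 {"RdRp", "SJR-CP"}, {"RdRp", "SJR-CP"}],
    "module_C": [{"RdRp", "GP"}, {"RdRp", "GP"}],
}
toy_connectors = connector_genes(toy_modules)
toy_entropy = host_range_entropy(["plant", "plant", "animal", "animal"])
```

In the toy data, GP is highly prevalent but confined to one module, so it does not count as a connector under this reading, whereas a gene shared across modules does. A lineage whose viruses all infect one host type has zero entropy under this definition.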
