Review
Cells. 2021 Jun 20;10(6):1557.
doi: 10.3390/cells10061557.

From RNA World to SARS-CoV-2: The Edited Story of RNA Viral Evolution

Zachary W Kockler et al. Cells.

Abstract

The current SARS-CoV-2 pandemic underscores the importance of understanding the evolution of RNA genomes. While RNA is subject to the formation of lesions similar to those in DNA, the evolutionary and physiological impacts of RNA lesions on viral genomes are yet to be characterized. Lesions that may drive the evolution of RNA genomes can induce breaks that are repaired by recombination or can cause base substitution mutagenesis, also known as base editing. Over the past decade or so, base editing mutagenesis of DNA genomes has been the subject of many studies, revealing that exposed ssDNA is subject to hypermutation that is involved in the etiology of cancer. However, base editing of RNA genomes has not been studied to the same extent. Recently, hypermutation of single-stranded RNA viral genomes has also been documented, though its role in evolution and population dynamics remains unclear. Here, we summarize the current knowledge of key mechanisms and causes of RNA genome instability, covering areas from the RNA world theory to the SARS-CoV-2 pandemic of today. We also highlight the key questions that remain regarding RNA genome instability and mutation accumulation, as well as experimental strategies for addressing these questions.

Keywords: ADAR; APOBEC; RNA editing; RNA world theory; genome stability; hypermutation; messenger RNA; viral RNA; viral evolution.

Conflict of interest statement

Both listed Authors have declared no competing interests.

Figures

Figure 1
RNA virus genome type and mode of replication define mutation strand bias in the progeny. Presented are the modes of RNA viral genome replication and how mutagenesis with the ssRNA-specific cytidine deaminase APOBEC and the dsRNA-specific adenosine deaminase ADAR affects the genomes of the cell. Positive (+) strands are shown in blue. Negative (−) strands are shown in orange. Nucleotides that will be mutated in subsequent steps are colored the same as their strand. APOBEC mutagenesis and resulting mutant nucleotides are shown in green. ADAR mutagenesis in dsRNA and resulting mutant nucleotides are shown in purple. Nucleotides that remain unmutated in the progeny are shown in black. Predominant classes of mutations in progeny ssRNA genomes or in the coding (+) strand RNA of dsRNA genomes are shown in boxes. (A) Viruses with a positive (+) ssRNA genome. The infecting (+) strand RNA genome is used as a template by RNA-dependent RNA polymerase (RdRp) to synthesize a dsRNA with both (+) and (−) strands. A single dsRNA molecule is subsequently used to generate multiple copies of (+) strand RNA transcripts and/or genomes. A single APOBEC-induced C to U change in the infecting genomic (+) strand ssRNA would be amplified in all viral progeny (C to U mutations). An ADAR-induced A to I (inosine) change in the (+) strand of dsRNA would not be reproduced in genomes of viral progeny. In contrast, an ADAR-induced A to I change in the (−) strand of dsRNA would be copied into multiple (+) strand RNA transcripts and thus be amplified in the viral progeny as U to C mutations in genomic (+) strand ssRNA. (B) Viruses with double-stranded (ds) RNA genomes. Multiple (+) ssRNA transcripts and/or genome precursors are generated by RdRp. Each (+) ssRNA precursor is then used to generate a dsRNA genome. Only ADAR-induced A to I mutations in the (−) strand are amplified into multiple dsRNA genomes via copies of (+) strands.
Since there are multiple (+) strand intermediates, there is a chance of a detectable level of C to U APOBEC-induced deamination in a fraction of (+) strands. (C) Viruses with negative (−) ssRNA genomes. Several (+) ssRNA transcripts and/or precursors of (−) ssRNA genomes are generated by RdRp and are then used to generate multiple (−) ssRNA genomes. The (−) ssRNA genomes of infecting particles as well as (+) ssRNA precursors can serve as substrates for APOBEC mutagenesis. The change (C to U or G to A) recovered by sequencing progeny genomes would be defined by the strand that is deaminated by APOBEC. Multiple C to U mutant molecules will arise from a single deamination in the infecting (−) ssRNA genome. A smaller number of G to A changes would result from each deamination event in a (+) strand precursor, but since there may be multiple precursor copies (shown in the multiple columns), the number of these changes may be comparable to that of C to U changes.
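The strand logic in the Figure 1 caption can be illustrated with a minimal Python sketch. It assumes only standard RNA base pairing and that inosine (I) is read as G when copied; the names `COMPLEMENT`, `read_base`, and `change_on_complement` are hypothetical and chosen for illustration, not taken from the article.

```python
# Minimal sketch: how a deamination on one strand is scored on the
# complementary strand after copying. All names are illustrative.

COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def read_base(base: str) -> str:
    """Return the base as it is read during copying (assumption: I reads as G)."""
    return "G" if base == "I" else base


def change_on_complement(ref: str, alt: str) -> tuple[str, str]:
    """Map a change (ref -> alt) on one strand to the change seen on its complement."""
    return COMPLEMENT[read_base(ref)], COMPLEMENT[read_base(alt)]


# ADAR A-to-I on the (-) strand is scored as U-to-C on the (+) strand.
adar_on_minus = change_on_complement("A", "I")
# APOBEC C-to-U on one strand is scored as G-to-A on the other strand.
apobec_other_strand = change_on_complement("C", "U")
```

Under these assumptions the sketch reproduces the two complementary-strand classes named in the caption: U to C for ADAR editing of the (−) strand, and G to A for APOBEC deamination scored on the opposite strand.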
Figure 2
Viral RNA recombination. (A) Replicative recombination begins when incomplete RNA replication leads to dissociation from the template, followed by rebinding to another RNA molecule to complete replication. (i) Rebinding at a homologous location in an identical RNA template results in error-free recombination. (ii) Rebinding to an ectopic RNA molecule creates a chimeric molecule. (B) Non-replicative RNA recombination occurs through an as yet unknown mechanism, which can involve breakage and joining of two different RNA molecules to create a chimeric RNA molecule.
Figure 3
Enzymatic deamination of RNA nucleosides. (A) APOBEC cytidine deaminase. Deamination of cytidine in ssRNA generates uridine, resulting in a C→U mutation in the RNA virus genome. (B) ADAR adenosine deaminase. Deamination of adenosine in dsRNA or in folded and paired ssRNA (forming dsRNA) generates inosine, which after rounds of copying by RdRp is fixed as an A→G mutation.
Figure 4
Simplified schematic of mutation accumulation in a virus population. Mutations are identified by comparing the sequence of a viral isolate with a reference sequence (RefSeq). Individual positions where bases are mutated in at least one isolate are shown as rectangles. Blue rectangles are positions identical to RefSeq. g0: a genome of the virus quasispecies starting a population, which may already have some differences from RefSeq. g1 to g9: rounds of replication generating additional mutations, which are numbered by the generation in which the mutation event occurred. Mutations occurring in later generations would be present in smaller fractions of the population (reflected by the decreasing yellow color density). The entire set of independent mutation events would be described by a list in which every mutation is represented only once, regardless of the number of genomes in which it is found. In this population, such a list is represented by the g9 genome.
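The idea of listing each mutation once, regardless of how many genomes carry it, can be sketched in Python as follows. This is an illustration only: the function `independent_events`, the isolate names, and the `(position, ref, alt)` encoding are hypothetical, and the sketch assumes, as the schematic does, that identical mutations seen in several genomes trace back to a single event.

```python
# Minimal sketch: collapse per-isolate mutation calls into a list of
# unique events. Names and the toy data are illustrative assumptions.

Mutation = tuple[int, str, str]  # (position, reference base, mutant base)


def independent_events(isolates: dict[str, list[Mutation]]) -> list[Mutation]:
    """Return every distinct mutation once, sorted by position."""
    events: set[Mutation] = set()
    for mutations in isolates.values():
        events.update(mutations)
    return sorted(events)


# Toy population: later isolates inherit earlier mutations and add new ones.
toy_isolates = {
    "g1": [(120, "C", "U")],
    "g2": [(120, "C", "U"), (455, "A", "G")],
    "g3": [(120, "C", "U"), (455, "A", "G"), (901, "C", "U")],
}
unique_events = independent_events(toy_isolates)
```

In this toy population the deduplicated list coincides with the mutations of the last isolate, mirroring the role of the g9 genome in the schematic.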
Figure 5
Trinucleotide motif-centered RNA mutational signature analysis. Shown is an example of an analysis for calculating the enrichment (E) and signature-associated mutation load (Sign Load) of uCa→uUa signature motifs in ssRNA, the two main outputs of trinucleotide motif-centered mutational signature analysis. Reverse complements are not included. Counted are all C→U mutations as well as all trinucleotide motif uCa→uUa mutations (5′ and 3′ flanking nucleotides are shown in lowercase letters; the mutated C is shown in uppercase). Also counted are all cytosines (c), represented in blue, and all motif-conforming trinucleotides (uca), represented in orange, in 41-nucleotide contexts centered on mutated cytosines. E values show the fold difference between the actual fraction of uCa→uUa mutations among C→U mutations in all trinucleotide motifs and the fraction of motif-conforming trinucleotides (uca) among all cytosines (c) in the immediate vicinity of mutated cytosines. Counts used for the enrichment calculation can also be used for calculating p-values in order to identify trinucleotide mutational motifs with statistically significant enrichment. Statistically significant enrichment values can be used for minimum estimates of the Sign Load.
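The enrichment described in the Figure 5 caption can be sketched in Python under stated assumptions. The function names, the 0-based position convention, and the toy sequence are hypothetical. The caption does not give a formula for the Sign Load; the expression used in `min_signature_load` (motif mutations × (E − 1) / E) is an assumption modeled on related DNA signature analyses, not a formula taken from this text. The p-value step is omitted.

```python
# Minimal sketch of motif enrichment for uCa -> uUa in ssRNA.
# All names are illustrative; the Sign Load formula is an assumption.


def count_context(ref: str, positions: list[int], motif: str = "uca",
                  flank: int = 20) -> tuple[int, int]:
    """Count cytosines and motif trinucleotides in windows of up to
    2 * flank + 1 nucleotides centered on each mutated position (0-based).
    Motifs straddling a window edge are not counted (simplification)."""
    ctx_c = ctx_motif = 0
    k = len(motif)
    for pos in positions:
        window = ref[max(0, pos - flank):min(len(ref), pos + flank + 1)].lower()
        ctx_c += window.count("c")
        ctx_motif += sum(1 for i in range(len(window) - k + 1)
                         if window[i:i + k] == motif)
    return ctx_c, ctx_motif


def enrichment(mut_motif: int, mut_c: int, ctx_motif: int, ctx_c: int) -> float:
    """Fold difference between the motif fraction among C->U mutations
    and the motif fraction among cytosines in the local context."""
    if mut_c <= 0 or ctx_motif <= 0 or ctx_c <= 0:
        raise ValueError("mut_c, ctx_motif and ctx_c must be positive")
    return (mut_motif / mut_c) / (ctx_motif / ctx_c)


def min_signature_load(mut_motif: int, e: float) -> float:
    """Assumed minimum estimate of motif mutations in excess of background."""
    return mut_motif * (e - 1) / e if e > 1 else 0.0


# Toy numbers: 30 of 100 C->U mutations fall in uCa, while 50 of 500
# context cytosines sit in a uca motif.
e_value = enrichment(mut_motif=30, mut_c=100, ctx_motif=50, ctx_c=500)
load = min_signature_load(30, e_value)
```

With these toy counts the motif fraction among mutations (0.3) is three times the background fraction (0.1). In a real analysis the load estimate would be reported only when the enrichment is statistically significant, as the caption notes.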

