Review
Int J Mol Sci. 2024 Mar 26;25(7):3696.
doi: 10.3390/ijms25073696.

Properties and Mechanisms of Deletions, Insertions, and Substitutions in the Evolutionary History of SARS-CoV-2

Igor B Rogozin et al. Int J Mol Sci. 2024.

Abstract

SARS-CoV-2 has accumulated many mutations since its emergence in late 2019. Nucleotide substitutions leading to amino acid replacements constitute the primary material for natural selection. Insertions, deletions, and substitutions appear to be critical for coronavirus macro- and microevolution. Understanding the molecular mechanisms of mutations in the mutational hotspots (positions, loci with recurrent mutations, and nucleotide context) is important for disentangling the roles of mutagenesis and selection. In the SARS-CoV-2 genome, deletions and insertions are frequently associated with repetitive sequences, whereas C>U substitutions are often surrounded by nucleotides resembling the APOBEC mutable motifs. We describe various approaches to mutation spectra analyses, including the context features of RNAs that are likely to be involved in the generation of recurrent mutations. We also discuss the interplay between mutations and natural selection as a complex evolutionary trend. The substantial variability and complexity of pipelines for the reconstruction of mutations and the huge number of genomic sequences are major problems for the analyses of mutations in the SARS-CoV-2 genome. As a solution, we advocate for the development of a centralized database of predicted mutations, which needs to be updated on a regular basis.

Keywords: ADAR; APOBEC; SARS-CoV-2; epistasis; low-complexity regions; mutation hotspots; oxidative stress; viral fitness.
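
The abstract's notion of "nucleotide context" can be illustrated with a minimal sketch that tallies substitutions by type and by the bases flanking the mutated position. Everything here is hypothetical and for illustration only: the function name `context_spectrum`, the toy reference sequence, and the toy mutation list are not from the review, and the 5'-U flank checked at the end is just one illustrative choice of context, not the motif definition used by the authors.

```python
from collections import Counter


def context_spectrum(reference, substitutions, flank=1):
    """Tally substitutions by (type, flanking context).

    reference: genome sequence as a string (RNA alphabet in this toy example)
    substitutions: iterable of (zero-based position, alternative base)
    flank: number of bases kept on each side of the mutated position
    """
    spectrum = Counter()
    for pos, alt in substitutions:
        # Skip sites too close to either end to have a full context.
        if pos - flank < 0 or pos + flank >= len(reference):
            continue
        ref = reference[pos]
        if ref == alt:
            continue  # not a substitution
        context = reference[pos - flank:pos + flank + 1]
        spectrum[(f"{ref}>{alt}", context)] += 1
    return spectrum


# Toy data (assumed, not real SARS-CoV-2 positions).
toy_reference = "AUCGUCAACGUCA"
toy_mutations = [(2, "U"), (5, "U"), (8, "U"), (11, "U"), (3, "A")]

spectrum = context_spectrum(toy_reference, toy_mutations)

# Count C>U substitutions whose 5' neighbour is U (illustrative context only).
c_to_u_total = sum(n for (mut, _), n in spectrum.items() if mut == "C>U")
c_to_u_after_u = sum(
    n for (mut, ctx), n in spectrum.items() if mut == "C>U" and ctx[0] == "U"
)
print(spectrum)
print(c_to_u_after_u, "of", c_to_u_total, "C>U substitutions follow a U")
```

A real analysis would compare such counts against the frequency of each context in the genome before calling any motif enriched.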

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Structure of the SARS-CoV-2 genome. The 5′-cap, UTR sequences, leader sequences (LSs), poly-A tail, and standard names of ORFs are shown. M, N, S, and E are structural proteins.
Figure 2
Typical Nextstrain tree with a detailed resolution for the January 2023 to January 2024 time period. In total, 3213 of the 3972 sequences sampled between January 2023 and January 2024 were used by Nextstrain to reconstruct the tree. Colors on the phylogenetic tree correspond to the names of SARS-CoV-2 strains shown in the upper left panel.
Figure 3
Substitution frequencies in SARS-CoV-2. The Y axis is the fraction of each predicted mutation type in 4-fold degenerate sites. Data are from [9].
Figure 4
Distribution of mutations across coding regions of the SARS-CoV-2 genome. The number of substitutions is shown for each of the 10 equal-length bins in the viral genome. Data are from [9].
Figure 5
Molecular mechanisms of deletions in the SARS-CoV-2 genome. (a) Template dislocation model for short deletions: one (or several) nucleotide deletions in short stretches of identical nucleotides or polynucleotides. (b) Template switch model for long deletions: deletion between direct repeats that includes removal of one repeat. Lowercase letters indicate deleted regions, direct repeats are shown by arrows. Data are from [30]. Circles correspond to nucleotides, empty and filled circles are used depending on the nature of repetitive sequences.
Figure 6
Molecular mechanisms of insertions in the SARS-CoV-2 genome. (a) Template dislocation model, with an example of short insertions: one (or several) nucleotide insertions in short stretches of identical nucleotides or polynucleotides. (b) Duplications. (c) Template switch model for long insertions. Lowercase letters indicate flanking regions. Data are from [28]. Circles correspond to nucleotides; empty and filled circles are used depending on the nature of repetitive sequences.
Figure 7
Location of three insertion sites in the SARS-CoV-2 S protein affecting spike–IgV (immunoglobulin variable domain) binding surfaces. The spike protein is shown in magenta (PDB ID: 7cn8), while the light (PDB ID: 7cl2) and heavy (PDB ID: 7cl2) chains of the 4A8 antibody are in beige and blue, respectively. Sequences of insertions at positions 245, 246, and 248 are shown. The data are taken from [28]. The monosaccharide N-acetylglucosamine (NAG) molecules are shown at the surface of the spike.
Figure 8
Sequences surrounding the CCTCGGCGGGCA insertion in the SARS-CoV-2 sequence. MN996532 is the closest bat homolog RaTG13; MG772934 is a more distantly related bat homolog. Asterisks indicate mismatches between SARS-CoV-2 and RaTG13. Letters above NC_045512 correspond to encoded amino acids.
Figure 9
Time-series plot of the Nextstrain entropy (a normalized Shannon entropy) for the PRRA/HRRA/LRRA inserted sequences.
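
The Figure 9 caption refers to a normalized Shannon entropy. As a rough sketch of what such a quantity measures, the snippet below computes the Shannon entropy of observed variants at one site and divides it by the logarithm of the number of possible states, so the result lies between 0 and 1. The normalization by log(k) is an assumption made for illustration; the caption does not state how Nextstrain normalizes its entropy, and the function name `normalized_entropy` and the sample counts are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(observations, n_states=None):
    """Shannon entropy of the observed variants, scaled to [0, 1].

    observations: iterable of variant labels seen at one site
    n_states: number of possible states used for normalization;
              defaults to the number of distinct labels observed
    """
    counts = Counter(observations)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    k = n_states if n_states is not None else len(counts)
    if k < 2:
        return 0.0  # a single possible state carries no uncertainty
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(k)


# Hypothetical samples of the inserted sequence in one time window.
fixed = ["PRRA"] * 10
mixed = ["PRRA"] * 9 + ["HRRA"]
even = ["PRRA", "HRRA", "LRRA"]

print(normalized_entropy(fixed, n_states=3))
print(normalized_entropy(mixed, n_states=3))
print(normalized_entropy(even, n_states=3))
```

Computing this value per sampling window and plotting it against time would give a time series of the kind the caption describes: values near 0 mean one variant dominates, and values near 1 mean the variants are evenly mixed.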

References

    1. Drake J.W., Baltz R.H. The biochemistry of mutagenesis. Annu. Rev. Biochem. 1976;45:11–37. doi: 10.1146/annurev.bi.45.070176.000303.
    2. Maki H. Origins of spontaneous mutations: Specificity and directionality of base-substitution, frameshift, and sequence-substitution mutageneses. Annu. Rev. Genet. 2002;36:279–303. doi: 10.1146/annurev.genet.36.042602.094806.
    3. Rogozin I.B., Pavlov Y.I. Theoretical analysis of mutation hotspots and their DNA sequence context specificity. Mutat. Res. 2003;544:65–85. doi: 10.1016/S1383-5742(03)00032-2.
    4. Drake J.W., Charlesworth B., Charlesworth D., Crow J.F. Rates of spontaneous mutation. Genetics. 1998;148:1667–1686. doi: 10.1093/genetics/148.4.1667.
    5. Rogozin I.B., Pavlov Y.I., Goncearenco A., De S., Lada A.G., Poliakov E., Panchenko A.R., Cooper D.N. Mutational signatures and mutable motifs in cancer genomes. Brief. Bioinform. 2018;19:1085–1101. doi: 10.1093/bib/bbx049.