Genome Biol Evol. 2021 May 7;13(5):evab087.
doi: 10.1093/gbe/evab087.

Mutation Rates and Selection on Synonymous Mutations in SARS-CoV-2

Nicola De Maio et al. Genome Biol Evol. 2021.

Abstract

The COVID-19 pandemic has seen an unprecedented response from the sequencing community. Leveraging the sequence data from more than 140,000 SARS-CoV-2 genomes, we study mutation rates and selective pressures affecting the virus. Understanding the processes and effects of mutation and selection has profound implications for the study of viral evolution, for vaccine design, and for the tracking of viral spread. We highlight and address some common genome sequence analysis pitfalls that can lead to inaccurate inference of mutation rates and selection, such as ignoring skews in the genetic code, not accounting for recurrent mutations, and assuming evolutionary equilibrium. We find that two particular mutation rates, G→U and C→U, are similarly elevated and considerably higher than all other mutation rates, causing the majority of mutations in the SARS-CoV-2 genome, and are possibly the result of APOBEC and ROS activity. These mutations also tend to occur many times at the same genome positions along the global SARS-CoV-2 phylogeny (i.e., they are very homoplasic). We observe an effect of genomic context on mutation rates, but the effect of the context is overall limited. Although previous studies have suggested selection acting to decrease U content at synonymous sites, we bring forward evidence suggesting the opposite.

Keywords: COVID-19; SARS-CoV-2; mutation; selection; sequencing; viral genomics.

Figures

Fig. 1.
Numbers of possible mutations, observed mutations, and sites with alternative alleles. On the X axes are the 12 distinct types of mutation events, A→C, A→G, etc. In green, we always show the number of genome positions at which the considered mutation type is possible. In (A) and (C), we consider all possible mutations, whereas in (B) and (D), we consider only synonymous mutations. In (A) and (B) we show, on the Y axis, the numbers of sites with alternative alleles in the alignment (blue color hues). Note that Y axis scales differ among plots. In dark blue, we show the number of all sites with alternative variants of the given type; in blue, we only show the number of such sites at which the alternative variant is present in at least two sequences; in light blue, only sites at which the considered alternative allele is present in at least five sequences. By definition, in plots (A) and (B) green bars are necessarily taller than all blue ones. In (C) and (D) we show, in red, orange and yellow, the numbers of mutation events inferred with parsimony on our phylogeny. In red we show the number of mutation events of the considered type with exactly one descendant; in orange the number of these mutations with at least two but fewer than five descendants; in yellow, those with at least five descendants. Mutation possibilities (green) can be fewer than inferred mutation events (red, orange and yellow in plots C and D) for certain types of mutations, since the same mutation event can be inferred multiple times at the same site in different parts of the phylogenetic tree.
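The green bars described above count the positions at which each mutation type is possible, either over all mutations or over synonymous mutations only. A minimal sketch of that counting step for a single in-frame coding sequence is given below. It assumes the standard genetic code and RNA bases (U, C, A, G); the function name `possible_mutations` and the toy sequence are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def possible_mutations(coding_seq):
    """Count possible single-base changes per mutation type (ref, alt).

    Returns two Counters: all possible mutations, and the subset that
    leaves the encoded amino acid unchanged (synonymous).
    """
    all_counts, syn_counts = Counter(), Counter()
    usable_length = len(coding_seq) - len(coding_seq) % 3
    for start in range(0, usable_length, 3):
        codon = coding_seq[start:start + 3]
        if codon not in CODON_TABLE:
            continue  # skip codons with ambiguous characters such as N
        for pos, ref in enumerate(codon):
            for alt in BASES:
                if alt == ref:
                    continue
                mutant = codon[:pos] + alt + codon[pos + 1:]
                all_counts[(ref, alt)] += 1
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    syn_counts[(ref, alt)] += 1
    return all_counts, syn_counts


# Toy example: AUG (Met) followed by GCU (Ala).
all_counts, syn_counts = possible_mutations("AUGGCU")
```

In this toy sequence only the third position of GCU admits synonymous changes, which illustrates how the genetic code skews the number of synonymous opportunities per mutation type.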
Fig. 2.
Reoccurrence of mutation events at the same sites. Proportion of sites (Y axis) where a given mutation (color, see legends) appears a certain number of times (X axis) along the phylogeny. (A) synonymous sites; (B) nonsynonymous sites.
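The distribution in this figure can be sketched as a simple tally over inferred mutation events. The example below assumes that each event is recorded as a hypothetical `(site, ref, alt)` tuple, one per independent occurrence on the phylogeny, and that proportions are taken over sites with at least one event of that type. The caption does not state the denominator, so that choice is an assumption.

```python
from collections import Counter, defaultdict


def recurrence_distribution(events):
    """Map each mutation type to {times observed at a site: proportion of sites}."""
    events_per_site = Counter(events)  # (site, ref, alt) -> number of events
    histograms = defaultdict(Counter)
    for (site, ref, alt), n_events in events_per_site.items():
        histograms[(ref, alt)][n_events] += 1
    return {
        mutation: {
            n_events: n_sites / sum(hist.values())
            for n_events, n_sites in sorted(hist.items())
        }
        for mutation, hist in histograms.items()
    }


# Made-up events: C->U arises three times at site 100 and once at site 200;
# G->U arises twice at site 300.
toy_events = (
    [(100, "C", "U")] * 3
    + [(200, "C", "U")]
    + [(300, "G", "U")] * 2
)
distribution = recurrence_distribution(toy_events)
```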
Fig. 3.
Estimated synonymous mutation rates in SARS-CoV-2. To estimate synonymous mutation rates in SARS-CoV-2, we used the counts of inferred synonymous mutation events (see fig. 1D) normalized by the numbers of reference genome sites at which such mutations might have occurred. On the X axis are the 12 distinct types of mutation events, A→C, A→G, etc. In red, orange and yellow we show, respectively, rates obtained from counts of mutation events with one descendant, more than one but fewer than five descendants, and five or more descendants. (A) Mutation rates represented as average numbers of mutation events inferred per site at which such mutation type is possible. (B) Relative mutation rates (the sum of all bars of one specific color is 1.0).
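The normalization described in this caption amounts to dividing event counts by the number of sites at which each mutation type is possible, then rescaling so that the rates of one color sum to 1.0. A minimal sketch follows; the function name `mutation_rates` and all counts are made up for illustration and are not the paper's data.

```python
def mutation_rates(event_counts, possible_sites):
    """Return (per-site rates, relative rates) for each mutation type.

    event_counts:   {(ref, alt): inferred mutation events}
    possible_sites: {(ref, alt): sites at which that mutation is possible}
    """
    per_site = {
        mutation: event_counts.get(mutation, 0) / n_sites
        for mutation, n_sites in possible_sites.items()
        if n_sites > 0
    }
    total = sum(per_site.values())
    relative = {m: rate / total for m, rate in per_site.items()} if total else {}
    return per_site, relative


# Hypothetical counts for three of the twelve mutation types.
toy_events = {("C", "U"): 30, ("G", "U"): 20, ("A", "G"): 10}
toy_sites = {("C", "U"): 100, ("G", "U"): 50, ("A", "G"): 200}
per_site, relative = mutation_rates(toy_events, toy_sites)
```

In this toy input G→U has fewer events than C→U but a higher per-site rate, because it has fewer sites at which it can occur. This is the effect that normalizing by mutation opportunities is meant to capture.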
Fig. 4.
C→U and G→U synonymous mutation rates in different base contexts. Here mutation rates are calculated as in figure 3A. (A) C→U mutations. (B) G→U mutations. The X axis shows the context of the considered mutation (e.g., in A, A_G represents the trinucleotide ACG and its synonymous mutation rate into trinucleotide AUG). Colors are as in figure 3.
Fig. 5.
Evidence of selection affecting the population frequency of synonymous versus nonsynonymous mutations. Counts and rate ratios of SARS-CoV-2 synonymous and nonsynonymous mutations at different frequencies in the human population. (A) Counts of possible mutations (green), singleton mutations (red), mutations with >1 and ≤4 descendants (orange), and mutations with >4 descendants (yellow). (B) Ratios of higher versus lower frequency mutation rates. In the absence of selection, ratios should not be significantly different between the classes of synonymous and nonsynonymous mutations. Instead, we measure a significant deviation in each comparison, with nonsynonymous mutations being relatively depleted of high frequency mutations. We calculated P values using the chi2_contingency function of the scipy.stats package (Virtanen et al. 2020). On the X axis, “single” refers to mutations with one descendant, “low” to mutations with 2–4 descendants, and “high” to mutations with >4 descendants. For example, “high versus single” refers to the comparison of the rate of mutations with >4 descendants versus the rate of mutations with one descendant.
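The comparison in panel (B) can be illustrated as a chi-square test on a 2×2 contingency table, with one row per mutation class and one column per frequency class. The sketch below uses only the Python standard library and computes the Pearson statistic without a continuity correction, which should correspond to calling `scipy.stats.chi2_contingency` with `correction=False`. Whether the authors kept SciPy's default Yates correction for 2×2 tables is not stated in the caption, and the counts below are made up.

```python
import math


def chi2_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no correction).

    Returns (statistic, p_value). With one degree of freedom, the chi-square
    survival function equals erfc(sqrt(statistic / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows are synonymous and nonsynonymous mutations,
# columns are "high" (>4 descendants) and "single" (one descendant).
toy_table = [[10, 20], [30, 40]]
statistic, p_value = chi2_2x2(toy_table)
```

A small P value would indicate that the high-to-single ratio differs between the two classes, which is the kind of deviation the caption reads as evidence of selection.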
Fig. 6.
Test of selection affecting U content at synonymous sites. Values are the same as in figure 5, but this time we focus on synonymous mutations that decrease U content (“−U”), or leave it unaltered (“=U”). Only P values below 0.1 are shown.

References

    1. Alexandrov LB, et al.; Australian Pancreatic Cancer Genome Initiative. 2013. Signatures of mutational processes in human cancer. Nature 500(7463):415–421.
    2. Amanat F, Krammer F. 2020. SARS-CoV-2 vaccines: status report. Immunity 52(4):583–589.
    3. Cagliani R, Forni D, Clerici M, Sironi M. 2020. Computational inference of selection underlying the evolution of the novel coronavirus, severe acute respiratory syndrome coronavirus 2. J Virol. 94(12):
    4. Clemente F, Vogl C. 2012. Evidence for complex selection on four-fold degenerate sites in Drosophila melanogaster. J Evol Biol. 25(12):2582–2595.
    5. Cuevas JM, Domingo-Calap P, Sanjuán R. 2012. The fitness effects of synonymous mutations in DNA and RNA viruses. Mol Biol Evol. 29(1):17–20.