Virus Evol. 2023 Sep 18;9(2):vead055.
doi: 10.1093/ve/vead055. eCollection 2023.

Fitness effects of mutations to SARS-CoV-2 proteins

Jesse D Bloom et al. Virus Evol.

Erratum in

Abstract

Knowledge of the fitness effects of mutations to SARS-CoV-2 can inform assessment of new variants, design of therapeutics resistant to escape, and understanding of the functions of viral proteins. However, experimentally measuring effects of mutations is challenging: we lack tractable lab assays for many SARS-CoV-2 proteins, and comprehensive deep mutational scanning has been applied to only two SARS-CoV-2 proteins. Here, we develop an approach that leverages millions of publicly available SARS-CoV-2 sequences to estimate effects of mutations. We first calculate how many independent occurrences of each mutation are expected to be observed along the SARS-CoV-2 phylogeny in the absence of selection. We then compare these expected observations to the actual observations to estimate the effect of each mutation. These estimates correlate well with deep mutational scanning measurements. For most genes, synonymous mutations are nearly neutral, stop-codon mutations are deleterious, and amino acid mutations have a range of effects. However, some viral accessory proteins are under little to no selection. We provide interactive visualizations of effects of mutations to all SARS-CoV-2 proteins (https://jbloomlab.github.io/SARS2-mut-fitness/). The framework we describe is applicable to any virus for which the number of available sequences is sufficiently large that many independent occurrences of each neutral mutation are observed.
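The comparison of actual to expected counts can be illustrated with a minimal Python sketch. It assumes the effect is summarized as the natural log of the ratio of actual to expected independent occurrences, with a small pseudocount (0.5 here, a hypothetical choice) so that a mutation never observed still gets a finite estimate. This matches the sign convention used in Figure 3 (neutral mutations near zero, deleterious mutations negative), but the exact formula and pseudocount are assumptions rather than details stated on this page.

```python
import math

# Assumed pseudocount (hypothetical value) so that a mutation never observed
# along the phylogeny still gets a finite, strongly negative estimate.
PSEUDOCOUNT = 0.5


def fitness_effect(actual: float, expected: float, pseudocount: float = PSEUDOCOUNT) -> float:
    """Log ratio of actual to expected independent occurrences of a mutation.

    0 means the mutation is seen about as often as a neutral mutation would be;
    negative values mean it is seen less often (deleterious); positive values
    mean it is seen more often than expected.
    """
    if actual < 0 or expected < 0:
        raise ValueError("counts must be non-negative")
    return math.log((actual + pseudocount) / (expected + pseudocount))


# Made-up counts for illustration only.
for actual, expected in [(40, 40), (0, 40), (50, 40)]:
    effect = fitness_effect(actual, expected)
    print(f"actual={actual:>3} expected={expected:>3} effect={effect:+.2f}")
```

Working on a log scale makes over- and under-representation symmetric, and the pseudocount keeps the estimate for mutations with few expected counts from becoming extreme, which is one reason low-count mutations are filtered out in the figures.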

Keywords: COVID-19; UShER; dN/dS; deep mutational scanning; fitness; mutation rate.

Conflict of interest statement

J.D.B. consults Apriori Bio, Aerium Therapeutics, Invivyd, the Vaccine Company, GSK, and Pfizer on topics related to viral evolution. J.D.B. receives royalty payments as an inventor on Fred Hutch licensed patents related to deep mutational scanning of viral proteins.

Figures

Figure 1.
Expected versus actual counts of mutations. (A) The expected count of each type of nucleotide mutation is computed from four-fold degenerate sites and then compared to the actual count of each mutation. (B) Expected versus actual counts for each nucleotide mutation type aggregated across all viral clades and averaged across all sites where the mutation is four-fold degenerate, synonymous (including four-fold degenerate), nonsynonymous, or introduces a stop codon. See https://jbloomlab.github.io/SARS2-mut-fitness/avg_counts.html for an interactive version of panel B that enables mouseovers to read off specific values.
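Panel A's idea can be sketched in Python as follows: for one clade, the expected count of a nucleotide mutation type (such as C>T) is taken as the average actual count of that type across sites where it is four-fold degenerate, since every change at such a site is synonymous and assumed close to neutral. The record layout and the use of a plain mean are illustrative assumptions, and the counts are made up.

```python
from collections import defaultdict
from statistics import mean


def expected_counts_by_type(records):
    """Expected count for each nucleotide mutation type (e.g. "C>T").

    `records` is an iterable of (site, mutation_type, actual_count,
    is_fourfold_degenerate) tuples for one clade; this layout is a
    hypothetical illustration. Only four-fold degenerate sites contribute.
    """
    fourfold_counts = defaultdict(list)
    for _site, mutation_type, actual_count, is_fourfold in records:
        if is_fourfold:
            fourfold_counts[mutation_type].append(actual_count)
    return {mt: mean(counts) for mt, counts in fourfold_counts.items()}


# Made-up counts for one clade.
records = [
    (101, "C>T", 120, True),
    (205, "C>T", 80, True),
    (310, "C>T", 5, False),  # not four-fold degenerate: ignored
    (412, "G>T", 30, True),
    (518, "G>T", 50, True),
]
expected = expected_counts_by_type(records)
print(expected)
```

Under this sketch, every C>T mutation in the clade is assigned the same expected count, whether it is synonymous, nonsynonymous, or creates a stop codon; comparing that value with each site's actual count is what separates constrained mutations from neutral ones.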
Figure 2.
Correlations of mutation fitness effect estimates made using subsets of natural sequences. Correlations between estimates made (A) using only sequences from the Delta or Omicron BA.5 clades or (B) using only sequences from the USA or England. Each point is an amino acid mutation, the orange line is a least-squares regression, and orange text at upper left shows the number of mutations and Pearson’s correlation coefficient. Only mutations with at least 10 expected counts are shown, which is why panels show different numbers of mutations (sequence subsets vary in size). Differences in subset size are also why the regression line in (A) deviates from the identity line x = y. (C) Correlations between clade or geography subsets increase as the minimum expected-count threshold is raised. Spike mutations have a lower correlation when subsetting by viral clade (plot shows average correlation over all pairwise combinations of Delta, BA.1, BA.2, and BA.5), but not when subsetting by geography (USA or England). (D) Correlations in estimated mutation effects decline for clades with higher protein divergence, with the effect most noticeable for spike since spike is more diverged among SARS-CoV-2 clades than other viral proteins. See https://jbloomlab.github.io/SARS2-mut-fitness/clade_corr_chart.html and https://jbloomlab.github.io/SARS2-mut-fitness/subset_corr_chart.html for versions of A and B that include all viral clades with at least 500,000 total expected counts (summed across all mutations) and have other interactive options.
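The subset comparison in panels A to C can be sketched in plain Python: keep mutations that reach a minimum expected count, then compute Pearson's correlation and an ordinary least-squares line between the two subsets' estimates. Applying the cutoff in both subsets, the data layout, and the mutation labels are assumptions for illustration; the values are made up.

```python
import math


def compare_subsets(subset_a, subset_b, min_expected=10):
    """Correlate effect estimates from two sequence subsets (e.g. two clades).

    Each subset maps a mutation label to (effect, expected_count). A mutation
    is kept only if it reaches `min_expected` expected counts in both subsets
    (an assumed reading of the cutoff). Returns the number of mutations,
    Pearson's r, and the least-squares slope and intercept of subset_b's
    effects regressed on subset_a's.
    """
    shared = [
        m for m in subset_a
        if m in subset_b
        and subset_a[m][1] >= min_expected
        and subset_b[m][1] >= min_expected
    ]
    if len(shared) < 2:
        raise ValueError("need at least two shared mutations")
    xs = [subset_a[m][0] for m in shared]
    ys = [subset_b[m][0] for m in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return len(shared), r, slope, intercept


# Hypothetical mutations and values, for illustration only.
clade_a = {"mut1": (0.5, 50), "mut2": (-1.0, 30), "mut3": (-2.0, 12), "mut4": (0.1, 5)}
clade_b = {"mut1": (0.4, 40), "mut2": (-0.8, 25), "mut3": (-2.2, 15), "mut4": (1.0, 8)}
print(compare_subsets(clade_a, clade_b))
print(compare_subsets(clade_a, clade_b, min_expected=20)[0])
```

Raising `min_expected` drops mutations whose estimates rest on few expected occurrences, trading the number of mutations compared for less noisy estimates, which is the trade-off panel C explores.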
Figure 3.
Distribution of effects of different classes of mutations. (A) Histograms of effects of synonymous, nonsynonymous, and stop-codon mutations across all viral genes. Neutral mutations have effects of zero (dashed gray vertical lines), and deleterious mutations have negative effects. (B) Effects of each class of mutation for each viral gene. Dark squares indicate the median effect, and the lighter rectangles span the interquartile range. Mutation types are color-coded as in panel (A). The apparent constraint on synonymous mutations in ORF9b is probably because this gene is encoded in an overlapping reading frame with N (Jungreis et al. 2021). See https://jbloomlab.github.io/SARS2-mut-fitness/effects_histogram.html and https://jbloomlab.github.io/SARS2-mut-fitness/effects_dist.html for plots that allow adjustment of the expected-count cutoff and other interactive options (such as separate histograms for each gene). See Supplementary Fig. S3 for a version of panel B with genes ordered by genomic position rather than constraint on nonsynonymous mutations.
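The summary in panel B (median plus interquartile range per gene and mutation class) can be sketched with Python's `statistics` module. Grouping by a (gene, class) key and the default quartile method of `statistics.quantiles` are assumptions here; the gene name and values are made up.

```python
from statistics import median, quantiles


def summarize_effects(effects_by_group):
    """Median and interquartile range of effects for each (gene, class) group.

    `effects_by_group` maps (gene, mutation_class) to a list of effect
    estimates; each list needs at least two values for quantiles().
    Returns (gene, class) -> (median, first quartile, third quartile).
    """
    summary = {}
    for group, effects in effects_by_group.items():
        q1, _q2, q3 = quantiles(effects, n=4)
        summary[group] = (median(effects), q1, q3)
    return summary


# Made-up effect estimates for a hypothetical gene.
effects = {
    ("gene1", "synonymous"): [-0.2, -0.1, 0.0, 0.1, 0.2],
    ("gene1", "stop"): [-4.0, -3.5, -3.0, -2.5, -2.0],
}
for group, (med, q1, q3) in summarize_effects(effects).items():
    print(group, f"median={med:+.2f}", f"IQR=[{q1:+.2f}, {q3:+.2f}]")
```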
Figure 4.
Correlation of mutation-effect estimates with experimental deep mutational scanning measurements for (A) the full spike (Dadonaite et al. 2023) or its RBD (Starr et al. 2022b), and (B) Mpro (Flynn et al. 2023; Iketani et al. 2022a). Each point is an amino acid mutation, the orange line is a least-squares regression, and orange text in the upper left shows the number of mutations and Pearson’s correlation coefficient. Each subpanel shows a different set of mutations (depending on which mutations were measured in that experiment). See https://jbloomlab.github.io/SARS2-mut-fitness/dms_S_corr.html and https://jbloomlab.github.io/SARS2-mut-fitness/dms_nsp5_corr.html for plots that also show the Mpro dataset from (Flynn et al. 2022) and have various interactive options. The plots in this figure show the average of the multiple phenotypes measured in the deep mutational scanning of Starr et al. (2022b); see https://jbloomlab.github.io/SARS2-mut-fitness/dms_S_all_corr.html for each phenotype separately. This figure only shows mutations with at least 20 expected counts, which is higher than the threshold of 10 used in most of the rest of this paper (this threshold can be adjusted in the interactive plots).
Figure 5.
Effects of amino acid mutations to E protein. The area plot at top shows the average effects of mutations at each site, and the heatmap shows the effects of specific amino acids, with x denoting the amino acid identity in the Wuhan-Hu-1 strain. See https://jbloomlab.github.io/SARS2-mut-fitness/E.html for an interactive version of this plot that enables zooming, mouseovers, adjustment of the minimum expected count threshold, and layering of stop codon effects on the site plot. See https://jbloomlab.github.io/SARS2-mut-fitness for comparable interactive plots for all SARS-CoV-2 proteins.
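The site-level area plot can be approximated by averaging the per-mutation effects at each site. In this Python sketch, stop-codon mutations are excluded unless requested, an assumed default suggested by the option to layer stop-codon effects onto the site plot; the input layout and values are hypothetical.

```python
from collections import defaultdict


def site_average_effects(mutation_effects, include_stops=False):
    """Average effect of mutations at each site.

    `mutation_effects` is a list of (site, mutant_amino_acid, effect) tuples;
    "*" marks a stop codon. Stop-codon mutations are left out unless
    `include_stops` is True (an assumed default). Returns {site: mean effect}
    ordered by site.
    """
    by_site = defaultdict(list)
    for site, mutant, effect in mutation_effects:
        if mutant == "*" and not include_stops:
            continue
        by_site[site].append(effect)
    return {site: sum(vals) / len(vals) for site, vals in sorted(by_site.items())}


# Made-up values for illustration only.
effects = [(2, "V", -1.0), (2, "T", -3.0), (2, "*", -5.0), (5, "S", 0.5)]
print(site_average_effects(effects))
print(site_average_effects(effects, include_stops=True))
```

Including stop codons pulls every site's average down, so keeping them separate makes the site plot reflect tolerance to amino acid substitutions rather than to truncation.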

Update of

References

1. Abdool Karim S. S. and T. de Oliveira (2021) ‘New SARS-CoV-2 variants—clinical, public health, and vaccine implications’, New England Journal of Medicine, 384: 1866–1868.
2. Acevedo A., L. Brodsky and R. Andino (2014) ‘Mutational and fitness landscapes of an RNA virus revealed through population sequencing’, Nature, 505: 686–690.
3. Aksamentov I. et al. (2021) ‘Nextclade: clade assignment, mutation calling and quality control for viral genomes’, Journal of Open Source Software, 6: 3773.
4. Beale R. C. et al. (2004) ‘Comparison of the differential context-dependence of DNA deamination by APOBEC enzymes: correlation with mutation spectra in vivo’, Journal of Molecular Biology, 337: 585–596.
5. Bhatt P. R. et al. (2021) ‘Structural basis of ribosomal frameshifting during translation of the SARS-CoV-2 RNA genome’, Science, 372: 1306–1313.