[Preprint]. 2022 Feb 16:2021.09.07.21263228.
doi: 10.1101/2021.09.07.21263228.

Analysis of 6.4 million SARS-CoV-2 genomes identifies mutations associated with fitness

Fritz Obermeyer et al. medRxiv.

Abstract

Repeated emergence of SARS-CoV-2 variants with increased fitness necessitates rapid detection and characterization of new lineages. To address this need, we developed PyR0, a hierarchical Bayesian multinomial logistic regression model that infers relative prevalence of all viral lineages across geographic regions, detects lineages increasing in prevalence, and identifies mutations relevant to fitness. Applying PyR0 to all publicly available SARS-CoV-2 genomes, we identify numerous substitutions that increase fitness, including previously identified spike mutations and many non-spike mutations within the nucleocapsid and nonstructural proteins. PyR0 forecasts growth of new lineages from their mutational profile, identifies viral lineages of concern as they emerge, and prioritizes mutations of biological and public health concern for functional characterization.

One sentence summary: A Bayesian hierarchical model of all SARS-CoV-2 viral genomes predicts lineage fitness and identifies associated mutations.
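The core idea in the abstract, that a lineage's growth rate is built from its mutations and that lineage prevalence follows a multinomial logistic (softmax) curve over time, can be sketched as follows. This is only a minimal, deterministic illustration: the function names, the toy feature matrix, and all numeric values are hypothetical, and the sketch omits the hierarchical priors, the per-region parameters, and the Bayesian inference that the actual model uses.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lineage_growth_rates(features, mutation_effects):
    """Growth rate of each lineage as the sum of the effects of its mutations.

    features[c][f] is 1 if lineage c carries mutation f, else 0.
    """
    return [sum(x * b for x, b in zip(row, mutation_effects)) for row in features]


def predicted_prevalence(intercepts, growth_rates, t):
    """Relative prevalence of each lineage at time t (multinomial logistic form)."""
    logits = [a + r * t for a, r in zip(intercepts, growth_rates)]
    return softmax(logits)


# Hypothetical toy data: 3 lineages, 2 mutations.
features = [[0, 0], [1, 0], [1, 1]]  # which lineage carries which mutation
mutation_effects = [0.05, 0.03]      # assumed per-day effect of each mutation
intercepts = [0.0, -3.0, -6.0]       # assumed initial log-odds of each lineage

rates = lineage_growth_rates(features, mutation_effects)
early = predicted_prevalence(intercepts, rates, t=0)
late = predicted_prevalence(intercepts, rates, t=200)
```

In this toy setup the lineage with no mutations dominates at `t=0`, while the lineage carrying both mutations dominates by `t=200`. Because growth rates are tied to mutations rather than fitted per lineage, a new lineage's growth can be forecast from its mutational profile alone.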

Conflict of interest statement

Authors have no competing interests.

Figures

Figure 1.
A. Overview of the PyR0 analysis pipeline. After clustering UShER’s mutation annotated tree, sequence data are used to construct spatio-temporal lineage prevalence counts y_tpc and amino acid substitution covariates X_cf. Pyro is used to fit a Bayesian multivariate logistic multinomial regression model to y_tpc and X_cf. B. Relative fitness versus date of lineage emergence. Circle size is proportional to cumulative case count inferred from lineage proportion estimates and confirmed case counts. Inset table lists the 10 fittest lineages inferred by the model. R/RA is the fold increase in relative fitness over the Wuhan (A) lineage, assuming a fixed generation time of 5.5 days.
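As a rough illustration of how a fold increase such as R/RA depends on the generation time, the sketch below converts a per-day growth-rate difference into a fold change in reproduction number. It assumes simple exponential growth, so that the ratio equals exp(rate difference × generation time). The function name and the per-day parameterization are assumptions made for illustration, not the paper's exact definition; only the 5.5-day generation time comes from the caption.

```python
import math

GENERATION_TIME_DAYS = 5.5  # fixed generation time stated in the caption


def fold_increase_in_fitness(delta_growth_per_day,
                             generation_time_days=GENERATION_TIME_DAYS):
    """Fold change in reproduction number relative to a reference lineage.

    Assumes exponential growth: R / R_ref = exp(delta_rate * generation_time),
    where delta_rate is the per-day growth-rate difference (an assumed unit).
    """
    return math.exp(delta_growth_per_day * generation_time_days)


# A lineage with no growth advantage has a fold increase of exactly 1.
no_advantage = fold_increase_in_fitness(0.0)

# A hypothetical advantage of ln(2)/5.5 per day doubles R per generation.
doubling = fold_increase_in_fitness(math.log(2) / GENERATION_TIME_DAYS)
```

Under this assumption, a longer assumed generation time would turn the same per-day growth advantage into a larger fold increase, which is why the caption states the generation time explicitly.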
Figure 2.
A. Infectivity relative to WT of lentiviral vectors pseudotyped with the indicated Spike mutants. Target cells were HEK293T cells expressing ACE2 and TMPRSS2 transgenes. The genetic background of the Spike was Wuhan-Hu-1 bearing D614G. Red bars were significantly different from WT (adjusted p values shown). Black bars were not significantly different from WT. B. For the 1701 SARS-CoV-2 clusters with at least one amino acid substitution in the RBD, we compare: i) the PyR0 prediction for the contribution to Δ log R from RBD substitutions only; to ii) antibody binding computed using the antibody-escape calculator in (17). The escape calculator is based on an intuitive non-linear model parameterized using deep mutational scanning data for 33 neutralizing antibodies elicited by SARS-CoV-2. PyR0 predictions exhibit high (Spearman) correlation with predictions from Greaney et al. C-E. We dissect PyR0 Δ log R estimates into S-gene (C), RBD (D), and non-S-gene (E) contributions for 3000 SARS-CoV-2 clusters (blue dots). The horizontal axis corresponds to the date at which each cluster first emerged. Red squares denote the median Δ log R within each monthly bin. The increased importance of S-gene mutations (notably in the RBD) over non-S-gene mutations starting around November 2021 is apparent.
Figure 3.
Manhattan plot of amino acid changes assessed in this study. A. Changes across the entire genome. B. Changes in the first 850 amino acids of S. In each of A-C the y axis shows effect size Δ log R, the estimated change in log relative fitness due to each amino acid change. The bottom three axes show the background density of all observed amino acid changes, the density of those associated with growth (weighted by |Δ log R|), and the ratio of the two. The top 55 amino acid changes are labeled. See Figure S13 for detailed views of S, N, ORF1a, and ORF1b. C. Changes in the first 250 amino acids of N. D. Structure of the spike-ACE2 complex (PDB: 7KNB). Spike subunits colored light blue, light orange, and gray. Top-ranked mutations are shown as red spheres. ACE2 is shown in magenta. E. Close-up view of the RBD interface. F. Top-ranked mutations in the N-terminal RNA-binding domain of N. Residues 44–180 of N (PDB: 7ACT) are shown in light blue. Amino acid positions corresponding to top mutations in this region are shown as red spheres. A 10-nt bound RNA is shown in gray.

References

    1. Davies N. G., Abbott S., Barnard R. C., Jarvis C. I., Kucharski A. J., Munday J. D., Pearson C. A. B., Russell W., Tully D. C., Washburne A. D., Wenseleers T., Gimma A., Waites W., Wong K. L. M., van Zandvoort K., Silverman J. D., CMMID COVID-19 Working Group, COVID-19 Genomics UK (COG-UK) Consortium, Diaz-Ordaz K., Keogh R., Eggo R. M., Funk S., Jit M., Atkins K. E., Edmunds W. J., Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England. Science. 372 (2021), doi: 10.1126/science.abg3055.
    2. Volz E., Mishra S., Chand M., Barrett J. C., Johnson R., Geidelberg L., Hinsley W. R., Laydon D. J., Dabrera G., O’Toole Á., Others, Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England. Nature, 1–17 (2021).
    3. Stefanelli P., Trentini F., Guzzetta G., Marziano V., Mammone A., Poletti P., Grané C. M., Manica M., del Manso M., Andrianou X., Others, Co-circulation of SARS-CoV-2 variants B.1.1.7 and P.1. medRxiv (2021) (available at https://www.medrxiv.org/content/10.1101/2021.04.06.21254923v1.abstract).
    4. Vöhringer H. S., Sanderson T., Sinnott M., De Maio N., Nguyen T., Goater R., Schwach F., Harrison I., Hellewell J., Ariani C., Gonçalves S., Jackson D., Johnston I., Jung A. W., Saint C., Sillitoe J., Suciu M., Goldman N., Birney E., Funk S., Volz E., Kwiatkowski D., Chand M., Martincorena I., Barrett J. C., Gerstung M., The Wellcome Sanger Institute Covid-19 Surveillance Team, The COVID-19 Genomics UK (COG-UK) Consortium, Genomic reconstruction of the SARS-CoV-2 epidemic across England from September 2020 to May 2021. bioRxiv (2021), doi: 10.1101/2021.05.22.21257633.
    5. Korber B., Fischer W. M., Gnanakaran S., Yoon H., Theiler J., Abfalterer W., Hengartner N., Giorgi E. E., Bhattacharya T., Foley B., Hastie K. M., Parker, Partridge D. G., Evans C. M., Freeman T. M., de Silva T. I., McDanal C., Perez L. G., Tang H., Moon-Walker A., Whelan S. P., LaBranche C. C., Saphire E. O., Montefiori D. C., Angyal A., Brown R. L., Carrilero L., Green L. R., Groves D. C., Johnson K. J., Keeley A. J., Lindsey B. B., Parsons P. J., Raza M., Rowland-Jones S., Smith N., Tucker R. M., Wang D., Wyles M. D., Tracking changes in SARS-CoV-2 Spike: evidence that D614G increases infectivity of the COVID-19 virus. Cell (2020), doi: 10.1016/j.cell.2020.06.043.