Cell. 2021 Sep 30;184(20):5189-5200.e7.
doi: 10.1016/j.cell.2021.09.003. Epub 2021 Sep 7.

The emergence and ongoing convergent evolution of the SARS-CoV-2 N501Y lineages

Darren P Martin et al. Cell.

Abstract

The independent emergence late in 2020 of the B.1.1.7, B.1.351, and P.1 lineages of SARS-CoV-2 prompted renewed concerns about the evolutionary capacity of this virus to overcome public health interventions and rising population immunity. Here, by examining patterns of synonymous and non-synonymous mutations that have accumulated in SARS-CoV-2 genomes since the pandemic began, we find that the emergence of these three "501Y lineages" coincided with a major global shift in the selective forces acting on various SARS-CoV-2 genes. Following their emergence, the adaptive evolution of 501Y lineage viruses has involved repeated selectively favored convergent mutations at 35 genome sites, mutations we refer to as the 501Y meta-signature. The ongoing convergence of viruses in many other lineages on this meta-signature suggests that it includes multiple mutation combinations capable of promoting the persistence of diverse SARS-CoV-2 lineages in the face of mounting host immune recognition.

Keywords: COVID 19; convergent mutations; directional selection; diversifying selection; evolutionary adaptation; immune evasion; lineage-defining mutations; positive selection; recurrent mutations; transmission advantage.

Conflict of interest statement

Declaration of interests: J.O.W. has been funded by Gilead Sciences, LLC (completed) and the CDC (ongoing) via grants and contracts to his institution unrelated to this research.

Figures

Graphical abstract
Figure 1
SARS-CoV-2 genome map indicating the locations and encoded amino acid changes of what we considered here to be signature mutations of V1, V2, and V3 sequences. Genes represented with light-blue blocks encode non-structural proteins and genes in orange encode structural proteins: S encodes the Spike protein, E the envelope protein, M the matrix protein, and N the nucleocapsid protein. Within the S-gene, the receptor binding domain (RBD) is indicated by a darker shade, and the furin cleavage site is indicated by a dotted vertical line.
Figure 2
Signals of positive and negative selection at individual codon sites that were detectable with the FEL method at different times between March 2020 and April 2021 applied to sequences sampled over 90-day intervals. The plotted dates show the end of 90-day periods. (A and B) The gene-by-gene/per Kb/per unit tree length density of codons that are detectably evolving under positive/negative selection between March 2020 and February 2021. Whereas genes for which the maximum observed density of positively/negatively selected sites was reached in February 2021 are shown with thicker lines, genes with associated trees that have a total length shorter than 0.5 subs/site for a given time period are not shown. A version of (A) with all genes displayed separately is given in Figure S1. (C) Signals of positive selection detected at 37 V1, V2, and V3 signature mutation sites between March 2020 and April 2021. Also included for reference are sites previously detected to be evolving under positive selection such as S/614, the site of the D614G mutation that is present in all three of the 501Y lineages, S/5, and RdRp/P323L (ORF1b/314). Circles indicate the statistical significance of the FEL test with red indicating positive selection and blue indicating negative selection. The vertical line indicates December 1, 2020, the approximate date when the importance of the V1 and V2 lineages was first noticed. See also Figure S1.
Figure S1
Signals of positive selection at individual codon sites that were detectable with the FEL method at different times between March 2020 and February 2021 applied to sequences sampled over 90-day intervals, related to Figure 2. The plotted date shows the end of the 90-day period. Genes with associated trees that have a total length shorter than 0.5 subs/site for a given time period are not shown. Note that for the ORF3a and N genes the interpretation of positive selection signals is complicated by the fact that each of these genes encompasses multiple smaller genes that are expressed in different reading frames. This is because synonymous substitutions in the ORF3a and N reading frames will be non-synonymous substitutions in the reading frames of the smaller genes that they encompass (and vice versa): a situation that is expected to inflate positive selection signals.
Figure 3
Genome sites where signature and convergent mutations occur within the 501Y lineage sequences. Sites detectably evolving under positive selection along internal branches (MEME p ≤ 0.05) of the V1, V2, and V3 phylogenies are indicated with red icons. We restricted our analysis to data collected up to April 2021 to focus on interpreting predictive positive selection signals arising from mutations occurring before March 2021, which could then be corroborated by examining mutation frequency data from later months. Labels within the colored blocks indicate amino acid substitutions with block colors indicating model-based predictions of the probable evolutionary viability of the observed amino acid substitutions based on the numbers of times these substitutions have been observed in related coronaviruses that infect other host species. The absence of color indicates unprecedented substitutions, red indicates highly unusual substitutions, and green indicates common substitutions seen at homologous sites in non-SARS-CoV-2 coronaviruses. ORF8 signals have been excluded. See also Figure S2.
Figure S2
Global selection trends at sites in 501Y lineage viruses that are either signature mutations or that, on April 20, 2021, displayed evidence of both lineage-specific positive selection on internal tree branches and mutational convergence between viruses in different 501Y lineages, related to Figure 3. Red dots indicate positive selection and blue dots indicate negative selection in the global SARS-CoV-2 dataset. The sizes of the dots indicate the strength of the positive/negative selection signals. Selection signals indicate those detected when considering only the sequences sampled in the preceding three months.
Figure 4
Selection signals evident in the global data at the subset of sites identified by the positive selection, convergence, and mutation frequency change analyses as likely contributing to the fitness of the 501Y lineage viruses: a subset of sites and their associated amino acid states that we refer to as the 501Y lineage meta-signature. The strengths of detected selection signals (with the IFEL method) are indicated by the sizes of the red dots. Selection tests were performed on sequence data collected within the preceding three months (i.e., red spots plotted in April reflect the analysis of sequences sampled between January 1 and April 1). The vertical bar indicates December 1, 2020. The global frequencies of the represented mutations are indicated in gray. These frequencies are strongly biased by, and therefore track in many instances, the rapid rise of V1 viruses in the UK, Europe, and North America: the regions of the world responsible for >90% of all SARS-CoV-2 genome sequencing between January and June 2021.
Figure 5
Weekly changes in the counts of sequences displaying multiple convergence mutations at the 19 sites in Spike predicted by our analyses to provide 501Y lineage viruses with selective advantages. This 501Y lineage meta-signature includes the following mutations: 18F, 26R/L/S, 69-70Del, 98F, 138H/Y, 144Del, 215G/H/V/Y, 241-243Del, 417N/T, 484K, 501Y, 655Y, 681L/R/H, 701V, 716I, 1027I, 1118H, 1176F, 1264L. The matches plots (top row) indicate the numbers of sequenced V1, V2, and V3 genomes carrying a given number of matching mutations at sites on this list: archetypical V1, V2, and V3 sequences, respectively, have Spike sequences with six, six, and nine matches. The signature sequence plots (bottom row) indicate the counts of particular V1, V2, and V3 Spike sequence haplotypes with the highest numbers of matches and indicate the subsets of matching mutations in these haplotype sequences. The signature lists included together with these plots indicate the subset of mutations at the 19 convergence list sites that are present in the different Spike haplotype sequences represented in the plots. “.” symbols indicate the absence of a convergence list mutation, “-” symbols indicate the occurrence of convergence list deletion mutations, and letters indicate the presence of convergence list amino acid substitutions.
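
The match counting and signature notation described in the Figure 5 caption can be illustrated with a minimal Python sketch. This is not the authors' code. The dictionary layout, the function names, and the example input are assumptions made for illustration. The entry the caption prints as "138HY" is read here as H or Y, and the example mutation set is a commonly reported B.1.1.7-like Spike profile that does not appear on this page.

```python
# Spike meta-signature sites as listed in the Figure 5 caption.
# "-" stands for a deletion. Keys and layout are illustrative assumptions.
META_SIGNATURE = {
    "18": {"F"}, "26": {"R", "L", "S"}, "69-70": {"-"}, "98": {"F"},
    "138": {"H", "Y"},  # caption prints "138HY"; read as H or Y (assumption)
    "144": {"-"}, "215": {"G", "H", "V", "Y"}, "241-243": {"-"},
    "417": {"N", "T"}, "484": {"K"}, "501": {"Y"}, "655": {"Y"},
    "681": {"L", "R", "H"}, "701": {"V"}, "716": {"I"}, "1027": {"I"},
    "1118": {"H"}, "1176": {"F"}, "1264": {"L"},
}


def signature_string(observed):
    """Return one symbol per meta-signature site, in list order.

    "." = no listed mutation at the site, "-" = listed deletion,
    letter = listed amino acid substitution.
    """
    symbols = []
    for site, allowed in META_SIGNATURE.items():
        state = observed.get(site)
        symbols.append(state if state in allowed else ".")
    return "".join(symbols)


def count_matches(observed):
    """Count the meta-signature sites that carry a listed mutation."""
    return sum(symbol != "." for symbol in signature_string(observed))


# Hypothetical input: site -> observed state for a B.1.1.7-like Spike.
# Sites outside the meta-signature (570, 614, 982) are simply ignored.
v1_like = {
    "69-70": "-", "144": "-", "501": "Y", "570": "D", "614": "G",
    "681": "H", "716": "I", "982": "A", "1118": "H",
}

print(signature_string(v1_like))  # ..-..-....Y.H.I.H..
print(count_matches(v1_like))     # 6
```

Under these assumptions the example yields six matches, which agrees with the count the caption gives for archetypical V1 sequences.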
