The P323L substitution in the SARS-CoV-2 polymerase (NSP12) confers a selective advantage during infection

Hannah Goldswain et al. Genome Biol.

Abstract

Background: The mutational landscape of SARS-CoV-2 varies at the level of both the dominant viral genome sequence and the minor genomic variant population. During the COVID-19 pandemic, an early substitution in the genome was the D614G change in the spike protein, associated with an increase in transmissibility. Genomes with D614G are accompanied by a P323L substitution in the viral polymerase (NSP12). However, P323L is not thought to be under strong selective pressure.

Results: Investigation of P323L/D614G substitutions in the population shows rapid emergence during the containment phase and early surge phase of the first wave. These substitutions emerge from minor genomic variants, which become the dominant viral genome sequence. This is investigated in vivo and in vitro using SARS-CoV-2 with P323 and D614 in the dominant genome sequence and L323 and G614 in the minor variant population. During infection, there is rapid selection of L323, but not G614, into the dominant viral genome sequence. Reverse genetics is used to create two viruses (either P323 or L323) with the same genetic background. L323 shows greater abundance of viral RNA and proteins and a smaller plaque morphology than P323.

Conclusions: These data suggest that P323L is an important contributor to the emergence of variants with transmission advantages. Sequence analysis of viral populations suggests it may be possible to predict the emergence of a new variant based on tracking the frequency of minor variant genomes. The ability to predict an emerging variant of SARS-CoV-2 in the global landscape may aid in the evaluation of medical countermeasures and non-pharmaceutical interventions.

Keywords: COVID-19; Evolution; NSP12; P323L; Polymerase; SARS-CoV-2; Selection; Spike protein.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1.
Sequence analysis and amino acid substitution in NSP12 (P323L) and the spike protein (D614G) between the first and third days of sampling in a patient hospitalized with COVID-19. Three different sequencing approaches were used: (A) an ARTIC-Illumina approach and (B) an RSLA-Nanopore approach to show the amino acid variation frequencies of NSP12 (at codon position 323) and spike protein (codon position 614), and (C) Sanger sequence analysis of the amplicons used to investigate the change of dominant viral genome sequence around the sites within NSP12 (codon position 323) and spike protein (codon position 614). These samples were gathered under the auspices of ISARIC 4C
Fig. 2.
Analysis of the ratio of P323:L323 (light blue) and D614:G614 (blue) in 377 patient samples between January 2020 and May 2020 in (A) the UK and (B) worldwide. SARS-CoV-2 sequence was obtained from nasopharyngeal swabs from 377 hospitalized patients. The width of the violin plot indicates the number of samples/patients with the frequency on the y-axis. The data show the transition from P323 to L323 and from D614 to G614 over time in the minor variant genomes, such that by April 2020 in the UK, the L323 and G614 substitutions were part of the dominant viral genome sequence and by May 2020, there was no evidence of P323 and D614 at the dominant level. The y-axis (variation frequency) is in the direction of P323 to L323 and D614 to G614, such that a viral population with 100% L323 or G614 would be shown with a variation frequency of 1.00. Likewise, a variation frequency of 0.00 would mean a viral population with 100% P323 or D614
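The variation frequency described in this caption can be illustrated with a minimal Python sketch. It assumes per-sample read counts for the two residues at one codon are already available; the function names, the two-residue simplification, and the 0.5 threshold for calling the dominant residue are illustrative assumptions, not the authors' pipeline.

```python
def variation_frequency(ref_count: int, alt_count: int) -> float:
    """Fraction of reads carrying the substituted residue (e.g. L323 or G614).

    0.00 means the population is 100% reference residue (P323 or D614);
    1.00 means it is 100% substituted residue (L323 or G614).
    Assumption: only the two named residues are counted at this codon.
    """
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads cover this codon")
    return alt_count / total


def dominant_residue(ref_count: int, alt_count: int,
                     ref: str = "P", alt: str = "L") -> str:
    """Call the dominant residue; the 0.5 cut-off is an illustrative assumption."""
    return alt if variation_frequency(ref_count, alt_count) > 0.5 else ref
```

For example, a hypothetical sample with 30 reads supporting P323 and 10 supporting L323 has a variation frequency of 0.25, so P323 is still dominant and L323 is a minor variant.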
Fig. 3.
Analysis of minor variant genomes in cynomolgus (NW_Cyno; orange) and rhesus (NW_Rhesus; blue) macaques infected with the SARS-CoV-2 Victoria/01/2020 isolate using data from shotgun Illumina RNA sequencing of nasal washes (NW). Data is presented as a global average over the course of the infection from sequencing SARS-CoV-2 from longitudinal samples. Each SARS-CoV-2 open reading frame is indicated above the appropriate panel. The major difference was at position 323 in NSP12
Fig. 4.
Analysis of NSP12 position 323 and spike position 614 through ARTIC-Illumina sequencing (A/C) and Illumina total RNAseq (B/D) from nasopharyngeal swabs taken longitudinally from infected cynomolgus (CX-X, n=6) and rhesus (RX-X, n=6) macaques. Data in this figure is from the ARTIC-Illumina approach to specifically amplify SARS-CoV-2 RNA (coverage filtered at 20×) and the Illumina total RNAseq approach without prior amplification (coverage filtered at 5×). The day post infection is shown for the animals. At position 323/614, a P/D is shown as light blue, an L/G as dark blue respectively, and green indicates other substitutions. The left-hand y-axis indicates the proportion of variation at the indicated position. (NHPs C= cynomolgus, R= rhesus macaque, CX-X/RX-X is the identity of the animal, with the experimental group C/RX and the animal number as -X)
Fig. 5.
Predicted fits of the exponential growth model for the L323 substitution in 12 NHPs using the data shown in Fig. 4 for ARTIC-Illumina. The red line indicates the model fit estimated with a generalized linear mixed-effects model (GLMM), and black points correspond to frequency of L323 mutation over time. (NHPs C= cynomolgus, R= rhesus macaque, CX-X/RX-X is the identity of the animal)
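As a rough illustration of what an exponential growth fit to L323 frequency involves, the sketch below fits log(frequency) = log(f0) + r × day by ordinary least squares for a single animal. This is a deliberately simplified stand-in, not the generalized linear mixed-effects model named in the caption: it has no random effects and no binomial error model. The function name and any input values are hypothetical.

```python
import math


def fit_exponential_growth(days, freqs):
    """Least-squares fit of freq = f0 * exp(r * day) on the log scale.

    Returns (f0, r). Simplified per-animal fit; NOT the GLMM used in the paper.
    """
    if len(days) != len(freqs) or len(days) < 2:
        raise ValueError("need at least two (day, frequency) pairs")
    if any(f <= 0 for f in freqs):
        raise ValueError("frequencies must be positive to take logs")

    log_freqs = [math.log(f) for f in freqs]
    n = len(days)
    mean_day = sum(days) / n
    mean_log = sum(log_freqs) / n

    # Slope = covariance(day, log freq) / variance(day)
    sxx = sum((d - mean_day) ** 2 for d in days)
    if sxx == 0:
        raise ValueError("all time points are identical")
    sxy = sum((d - mean_day) * (y - mean_log) for d, y in zip(days, log_freqs))

    rate = sxy / sxx
    f0 = math.exp(mean_log - rate * mean_day)
    return f0, rate
```

A positive fitted rate `r` corresponds to L323 increasing in frequency over the days post infection. A mixed-effects model would additionally share information across the 12 animals, which this sketch does not attempt.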
Fig. 6.
Investigating growth of P323 and L323 in cell culture. A Representative images of plaques formed by two viruses created through reverse genetics that have the Wuhan-Hu-1 background (NC_045512) and an engineered D614G substitution in the spike protein, and either P323 or L323 in NSP12 (termed Wuhan/G614/P323 and Wuhan/G614/L323 respectively). B Relative RNA levels of genomic or N subgenomic RNA with Wuhan/G614/P323 or Wuhan/G614/L323 from RT-qPCR on ACE2-A549 cells infected with either virus at 24h. Error bars show standard deviation. Unpaired t-tests without Welch’s correction, p=0.0181 and p=0.0393 respectively, for n=3 biological replicates. C Western blot analysis of the abundance of nucleoprotein produced in either mock infected, or cells infected with Wuhan/G614/P323 or Wuhan/G614/L323. This is an exemplar western blot for an experiment that was done in triplicate; GAPDH is shown as a protein loading control. D Mean viral titers (pfu/ml, n=3 biological replicates ± standard deviation) at 24hpi in Vero E6, Vero/hSLAM, and ACE2-A549 cells infected with either Wuhan/G614/P323 or Wuhan/G614/L323. E,F Proportion of amino acid P323/L323 in NSP12 (E) or D614/G614 in the Spike protein (F) in the Victoria/01/2020 isolate serially passaged through cells over 13 sequential passages (coverage filtered at 20×)
Fig. 7.
Model for the transmission of variant genomes. This model suggests that genomes encoding amino acids under strong selection pressure (such as P323 in this case) have several potential routes for growth and transmission of viral populations: consensus viral genomes with P323 (cyan) and L323 (red) present in minor variant genomes, the two in equilibrium, or L323 dominant in the viral genome sequence with P323 present at a minor variant level. Given the potential strong selection pressure on position 323, the time post infection at which transmission occurs is crucial in determining which variant becomes the dominant viral genome sequence. This figure was created using Biorender.com
Fig. 8.
Amino acid mutations at site 323 in NSP12 in samples sequenced using the ARTIC-Nanopore approach (n=101) from July to September 2021 obtained from the Short Read Archive. The bioinformatics tool DiversiTools was used to generate proportions of the counts of amino acids at site 323 and showed that L is dominant in viral sequences from mid-late 2021, with P remaining a small proportion of the population alongside amino acids F, S, and I
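The idea of turning amino acid counts at one site into proportions can be sketched in a few lines of Python. This is a generic illustration of the calculation, not DiversiTools itself or its interface; the function name and the input format (one observed residue letter per read) are assumptions.

```python
from collections import Counter


def residue_proportions(residues):
    """Proportion of each amino acid observed at a single codon position.

    `residues` is any iterable of one-letter amino acid codes, one per read.
    Returns a dict ordered from most to least common residue.
    """
    counts = Counter(residues)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues observed at this site")
    return {aa: n / total for aa, n in counts.most_common()}
```

With a hypothetical input of seven L reads and one each of P, F, and S, the result would place L first with a proportion of 0.7, matching the pattern the caption describes of L dominating while other residues persist at low levels.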
