Proc Natl Acad Sci U S A. 2020 Sep 22;117(38):23652-23662.
doi: 10.1073/pnas.2008281117. Epub 2020 Aug 31.

A SARS-CoV-2 vaccine candidate would likely match all currently circulating variants

Bethany Dearlove et al. Proc Natl Acad Sci U S A. 2020.

Abstract

The magnitude of the COVID-19 pandemic underscores the urgency for a safe and effective vaccine. Many vaccine candidates focus on the Spike protein, as it is targeted by neutralizing antibodies and plays a key role in viral entry. Here we investigate the diversity seen in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences and compare it to the sequence on which most vaccine candidates are based. Using 18,514 sequences, we perform phylogenetic, population genetics, and structural bioinformatics analyses. We find limited diversity across SARS-CoV-2 genomes: Only 11 sites show polymorphisms in >5% of sequences; yet two mutations, including the D614G mutation in Spike, have already become consensus. Because SARS-CoV-2 is being transmitted more rapidly than it evolves, the viral population is becoming more homogeneous, with a median of seven nucleotide substitutions between genomes. There is evidence of purifying selection but little evidence of diversifying selection, with substitution rates comparable across structural versus nonstructural genes. Finally, the Wuhan-Hu-1 reference sequence for the Spike protein, which is the basis for different vaccine candidates, matches optimized vaccine inserts, being identical to an ancestral sequence and one mutation away from the consensus. While the rapid spread of the D614G mutation warrants further study, our results indicate that drift and bottleneck events can explain the minimal diversity found among SARS-CoV-2 sequences. These findings suggest that a single vaccine candidate should be efficacious against currently circulating lineages.
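The two summary statistics quoted in the abstract (sites polymorphic in >5% of sequences, and the median number of substitutions between genomes) can be illustrated with a minimal sketch. It assumes the sequences are already aligned to the reference and uses toy data; the function names are hypothetical, gaps and ambiguous bases are not handled, and this is not the authors' pipeline.

```python
from itertools import combinations
from statistics import median


def hamming(a: str, b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def polymorphic_sites(seqs, reference, threshold=0.05):
    """Return 0-based positions where more than `threshold` of the
    sequences differ from the reference."""
    n = len(seqs)
    sites = []
    for pos, ref_base in enumerate(reference):
        differing = sum(s[pos] != ref_base for s in seqs)
        if differing / n > threshold:
            sites.append(pos)
    return sites


def median_pairwise_distance(seqs):
    """Median Hamming distance over all unordered pairs of sequences."""
    return median(hamming(a, b) for a, b in combinations(seqs, 2))


# Toy alignment (made up for illustration, not real SARS-CoV-2 data).
reference = "ACGTACGTAC"
seqs = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGAACGTAT",
]

print(polymorphic_sites(seqs, reference))
print(median_pairwise_distance(seqs))
```

For an alignment of many thousands of genomes, the all-pairs loop grows quadratically, so a real analysis would likely subsample or use a vectorized implementation.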

Keywords: SARS-CoV-2; evolution; vaccine.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1.
SARS-CoV-2 diversity across 18,514 genomes. (A) Distribution representing the location and date of sample collection. (B) Location and frequency of sites with polymorphisms across the genome. Proportion of sequences that showed polymorphisms compared to the reference sequence, Wuhan-Hu-1 (GISAID: EPI_ISL_402125, GenBank: NC_045512). ORFs are shown in gray for nonstructural proteins and in color for structural proteins (S, purple; E, blue; M, green; N, red). (C) Global phylogeny of 18,514 independent genome sequences. The tree was rooted at the reference sequence, Wuhan-Hu-1, and tips are colored by collection location. The scale indicates the distance corresponding to one substitution. Lineages are labeled following PANGOLIN (22).
Fig. 2.
The S mutation D614G quickly became dominant. The mutation D614G was found in 69% of sequences sampled globally as of May 18, 2020; the second most frequent mutation in S was found in only ∼2% of sequences. (A) Number of sequences with D (gray) or G (purple) by continent and sampling date, shown cumulatively through the outbreak. (B) Phylogenetic tree reconstructed from all of the ORFs showing the linkage between D614G in S and P4715L in ORF1ab. Tips are colored by continent. The phylogeny suggests that these mutations were linked to a bottleneck event when SARS-CoV-2 viruses were introduced in Europe; D614G was first seen in Europe in a sequence sampled in Germany at the end of January. There is no evidence that the increasing predominance of this mutation was caused by convergent selection events that would have occurred in multiple individuals. (C) Overall number of sequences with D614 or G614 across continents; the predominance of D614G in Europe is suggestive of a founder event. (D) Distribution of Hamming distances between sequences with D614, G614, or discordant pairs. The median is marked with a dashed line.
Fig. 3.
Evolution across the SARS-CoV-2 genome. (A) Bar plot of the average percentage of branch length under diversifying selection (dN/dS > 1) for each site. (B) Bar plot of dN/dS per gene (dN = dS is shown as dashed line). Error bars indicate SD across subsampled alignments. (C) Box plot of nonsynonymous substitutions per lineage per site across structural and nonstructural genes. Values across subsampled alignments for each gene are plotted. (D) Average percentage (over subsampled alignments) of branch lengths evolving under neutral (or negative) selection per site for each structural gene. Median values are shown by dashed lines.
Fig. 4.
Limited evidence of adaptation of the viral population. (A–C) Bootstrapped global estimates of Nei’s GST and Jost’s D for population differentiation for each structural gene. (A) Estimates of Nei’s GST (closed circles) and Jost’s D (open circles) comparing sequences sampled from the Hubei province to sequences subsequently sampled globally. Estimates of (B) Nei’s GST and (C) Jost’s D comparing sequences sampled before or after a specific date. Lines connect the median estimates across datasets for each gene. (D) Ln-transformed phylogenetic η, indicative of the number of iterative events in the sampled subtree, for subtrees from each internal node (after the root) of a down-sampled SARS-CoV-2 whole-genome phylogeny (dark gray), of a phylogeny simulated under neutral parameters (gold), and of a phylogeny simulated under positive time-dependent rates (b(t) = 0.01e^(0.4t), green). (E) Box plot of ln-transformed phylogenetic η estimates across all down-sampled SARS-CoV-2 whole-genome phylogenies, phylogenies simulated under neutral parameters, and phylogenies simulated under different positive time dependencies, α. Asterisks indicate significant differences in mean values (Student’s t test, P < 0.05) between the SARS-CoV-2 and positive time-dependent phylogenies at each α.
Fig. 5.
Mutations across SARS-CoV-2 S sequences. (A) Structure of SARS-CoV (Protein Data Bank ID code 5X58) (shown instead of SARS-CoV-2 for completeness of the Receptor Binding Motif [RBM]). (B–D) The three protomers in the closed SARS-CoV-2 S glycoprotein (Protein Data Bank ID code 6VXX) are colored in yellow, cyan, and white. Sites with mutations are shown as spheres. (B) Near-identity of potential vaccine candidates. The MRCA and Wuhan-Hu-1 reference sequences were identical, while the consensus derived from all circulating sequences showed a mutation (D614G). Site 614 is located at the interface between two subunits. (C) Sequence segments that differed between human and pangolin or bat hosts. Amino acid segments 439 to 455 and 482 to 501 impact receptor binding, while the 574 to 690 segment corresponds to the S2 cleavage site. (D) Sites with shared mutations across SARS-CoV-2 circulating sequences. The colors of the spheres correspond to the proportion of SARS-CoV-2 sequences that differed from the Wuhan-Hu-1 sequence (GISAID: EPI_ISL_402125, GenBank: NC_045512). Mutations that were found only in one or two sequences were not represented.
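
The Fig. 4 caption refers to Nei’s GST and Jost’s D as measures of population differentiation. As a rough illustration, the sketch below computes the basic single-site forms of both statistics from allele frequencies in several subpopulations. It assumes equal weighting of subpopulations and no sample-size bias correction; the function names and the example frequencies are hypothetical, and the estimators actually used in the paper (which were bootstrapped) may differ.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def gst_and_jost_d(subpop_freqs):
    """Return (Nei's GST, Jost's D) for one site.

    `subpop_freqs` is a list with one allele-frequency list per
    subpopulation; all lists must cover the same alleles in the same order.
    """
    n = len(subpop_freqs)
    if n < 2:
        raise ValueError("need at least two subpopulations")
    n_alleles = len(subpop_freqs[0])

    # H_S: mean within-subpopulation heterozygosity.
    h_s = sum(expected_heterozygosity(f) for f in subpop_freqs) / n

    # H_T: heterozygosity of the pooled (mean) allele frequencies.
    mean_freqs = [sum(f[i] for f in subpop_freqs) / n for i in range(n_alleles)]
    h_t = expected_heterozygosity(mean_freqs)

    if h_t == 0:
        # Site is monomorphic everywhere: no differentiation to measure.
        return 0.0, 0.0

    gst = (h_t - h_s) / h_t
    jost_d = (h_t - h_s) / (1.0 - h_s) * n / (n - 1)
    return gst, jost_d


# Illustrative frequencies of two alleles at one site in an early and a
# later sample (made-up numbers, not taken from the paper).
early = [0.9, 0.1]
later = [0.3, 0.7]
print(gst_and_jost_d([early, later]))
```

Both statistics are 0 when the subpopulations share identical allele frequencies and 1 when two subpopulations are fixed for different alleles; between those extremes GST is bounded by within-population diversity, which is one reason Jost’s D is often reported alongside it.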

Comment in

  • Low genetic diversity may be an Achilles heel of SARS-CoV-2.
    Rausch JW, Capoferri AA, Katusiime MG, Patro SC, Kearney MF. Proc Natl Acad Sci U S A. 2020 Oct 6;117(40):24614-24616. doi: 10.1073/pnas.2017726117. Epub 2020 Sep 21. PMID: 32958678. Free PMC article. No abstract available.

References

    1. Ou X. et al., Characterization of spike glycoprotein of SARS-CoV-2 on virus entry and its immune cross-reactivity with SARS-CoV. Nat. Commun. 11, 1620 (2020).
    2. Wu F. et al., Neutralizing antibody responses to SARS-CoV-2 in a COVID-19 recovered patient cohort and their implications. medRxiv:10.1101/2020.03.30.20047365 (20 April 2020).
    3. Premkumar L. et al., The receptor binding domain of the viral spike protein is an immunodominant and highly specific target of antibodies in SARS-CoV-2 patients. Sci. Immunol. 5, eabc8413 (2020).
    4. Lv H. et al., Cross-reactive antibody response between SARS-CoV-2 and SARS-CoV infection. Cell Rep., 10.1016/j.celrep.2020.107725 (2020).
    5. Wu F. et al., A new coronavirus associated with human respiratory disease in China. Nature 579, 265–269 (2020).

Publication types