Proteins. 2022 May;90(5):1054-1080.
doi: 10.1002/prot.26250. Epub 2021 Oct 9.

Evolution of the SARS-CoV-2 proteome in three dimensions (3D) during the first 6 months of the COVID-19 pandemic

Joseph H Lubin, Christine Zardecki, Elliott M Dolan, Changpeng Lu, Zhuofan Shen, Shuchismita Dutta, John D Westbrook, Brian P Hudson, David S Goodsell, Jonathan K Williams, Maria Voigt, Vidur Sarma, Lingjun Xie, Thejasvi Venkatachalam, Steven Arnold, Luz Helena Alfaro Alvarado, Kevin Catalfano, Aaliyah Khan, Erika McCarthy, Sophia Staggers, Brea Tinsley, Alan Trudeau, Jitendra Singh, Lindsey Whitmore, Helen Zheng, Matthew Benedek, Jenna Currier, Mark Dresel, Ashish Duvvuru, Britney Dyszel, Emily Fingar, Elizabeth M Hennen, Michael Kirsch, Ali A Khan, Charlotte Labrie-Cleary, Stephanie Laporte, Evan Lenkeit, Kailey Martin, Marilyn Orellana, Melanie Ortiz-Alvarez de la Campa, Isaac Paredes, Baleigh Wheeler, Allison Rupert, Andrew Sam, Katherine See, Santiago Soto Zapata, Paul A Craig, Bonnie L Hall, Jennifer Jiang, Julia R Koeppe, Stephen A Mills, Michael J Pikaart, Rebecca Roberts, Yana Bromberg, J Steen Hoyer, Siobain Duffy, Jay Tischfield, Francesc X Ruiz, Eddy Arnold, Jean Baum, Jesse Sandberg, Grace Brannigan, Sagar D Khare, Stephen K Burley

Joseph H Lubin et al. Proteins. 2022 May.

Abstract

Understanding the molecular evolution of the SARS-CoV-2 virus as it continues to spread in communities around the globe is important for mitigation and future pandemic preparedness. Three-dimensional structures of SARS-CoV-2 proteins and those of other coronaviruses archived in the Protein Data Bank were used to analyze viral proteome evolution during the first 6 months of the COVID-19 pandemic. Analyses of spatial locations, chemical properties, and structural and energetic impacts of the observed amino acid changes in >48 000 viral isolates revealed how each of the 29 viral proteins has undergone amino acid changes. Catalytic residues in active sites and binding residues in protein-protein interfaces showed modest, but significant, numbers of substitutions, highlighting the mutational robustness of the viral proteome. Energetics calculations showed that the impact of substitutions on the thermodynamic stability of the proteome follows a universal bi-Gaussian distribution. Detailed results are presented for potential drug discovery targets and the four structural proteins that comprise the virion, highlighting substitutions with the potential to impact protein structure, enzyme activity, and protein-protein and protein-nucleic acid interfaces. Characterizing the evolution of the virus in three dimensions provides testable insights into viral protein function and should aid in structure-based drug discovery efforts as well as the prospective identification of amino acid substitutions with potential for drug resistance.

Keywords: COVID-19; SARS-CoV-2; coronavirus; databases; evolution; molecular; pandemics; protein; viral proteins.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

FIGURE 1
(A) Ribbon representation of the experimental structure of SARS‐CoV‐2 nsp5 (PDB ID 6LU7 [8]), color coded magenta (α‐helices), cyan (β‐sheets), and gold (loops), overlaid with SARS‐CoV‐1 nsp5 (PDB ID 1Q2W [6]), colored green. The substrate analog inhibitor present in PDB ID 6LU7 is shown as an atomic stick figure with atom color coding white (carbon), red (oxygen), and blue (nitrogen). (B) The active sites of both proteases, showing the catalytic dyad (H41 and C145 in 6LU7), with 6LU7 in gold and 1Q2W in gray.
FIGURE 2
Architecture of the SARS‐CoV‐2 genome and proteome, including nsps derived from polyproteins pp1a and pp1ab (shades of blue), virion structural proteins (pink/purple), and open reading frame proteins (Orfs, shades of green). Polyprotein cleavage sites are indicated by inverted triangles for the papain‐like proteinase (PLPro, black) and the main protease (nsp5, blue). The double‐stranded RNA substrate‐product complex of the RNA‐dependent RNA polymerase (shown as the nsp7‐nsp8₂‐nsp12 heterotetramer and separately with nsp12 alone) is color coded (yellow: product strand; red: template strand). Transmembrane portions of the Spike (S) protein are shown in cartoon form (pink). The source of the structural model used in the analyses of each study protein is indicated (experimentally‐determined structure, computational homology model, or de novo predicted model).
FIGURE 3
Observed counts of unique sequence variant (USV) substitutions from each Reference Sequence Residue (i.e., the amino acid in the original protein reference sequence) to each Substituted Residue for all 19 study proteins with experimentally‐determined structures. (Because computationally‐predicted structural models carry greater uncertainty in layer identification, only models based on experimentally‐determined structures are included.) Red boxes enclose conservative substitutions among hydrophobic, uncharged polar, positively charged, and negatively charged amino acids, respectively, in order from upper left to lower right. Cysteine, glycine, and proline are excluded from these groupings. Substitutions that occurred in 500 or more USVs are also labeled with their count.
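The conservative‐substitution boxes in Figure 3 can be illustrated with a small classifier. This is a minimal sketch, not the authors' code: the caption names the four residue classes but not their exact membership, so the residue sets below (for example, placing Y with the hydrophobic residues and H with the positively charged ones) are assumptions, and the input format for the hypothetical `tally` helper (one (reference, substituted) pair per USV) is likewise assumed.

```python
from collections import Counter

# ASSUMED residue classes: the caption names the four groups but not their members.
CONSERVATIVE_GROUPS = {
    "hydrophobic": set("AVILMFWY"),
    "uncharged_polar": set("STNQ"),
    "positively_charged": set("KRH"),
    "negatively_charged": set("DE"),
}
# Per the caption, cysteine, glycine, and proline belong to no group.
EXCLUDED = set("CGP")


def group_of(residue):
    """Return the assumed class of a one-letter residue code, or None."""
    for name, members in CONSERVATIVE_GROUPS.items():
        if residue in members:
            return name
    return None


def is_conservative(ref, sub):
    """True if ref -> sub stays within one assumed class; C, G, P never count."""
    if ref == sub:
        raise ValueError("reference and substituted residue are identical")
    if ref in EXCLUDED or sub in EXCLUDED:
        return False
    group = group_of(ref)
    return group is not None and group == group_of(sub)


def tally(pairs, label_threshold=500):
    """Count (ref, sub) pairs, one per USV (hypothetical input format), and
    return all counts plus the subset that reaches the labeling threshold."""
    counts = Counter(pairs)
    labeled = {pair: n for pair, n in counts.items() if n >= label_threshold}
    return counts, labeled
```

Here `tally` mirrors only the caption's rule of labeling cells with 500 or more USVs; the counts themselves would come from the aligned isolate sequences, which this sketch does not model.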
FIGURE 4
Normalized frequency histogram of ΔΔG_App calculated for all USVs aggregated across all 19 study proteins with experimentally‐determined structures. (Because computationally‐predicted structural models carry significant uncertainty in atom‐level energetics, only models based on experimentally‐determined structures were included.) Left: overlay with the fitted bi‐Gaussian curve (solid red line) and its individual Gaussian components (dashed red lines). The means of the individual Gaussian distributions were +1.8 Rosetta Energy Units (REU) (standard deviation, SD: 8.5) and +8.4 REU (SD: 44.2) (R² = 0.92). Right: overlay of the same normalized frequency histogram with single Gaussian curves fitted to the subsets of USVs with Surface (green; mean: +1.9 REU, SD: 19.2; R² = 0.75), Boundary (yellow; mean: +4.5 REU, SD: 35.0; R² = 0.74), or Core (blue; mean: +5.4 REU, SD: 59.9; R² = 0.42) substitutions. USVs with multiple substitutions were included in the single Gaussian fits when all of their substitutions mapped to the same region of the study protein.
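The bi‐Gaussian model in Figure 4 is the sum of two Gaussian components, and R² measures how well the fitted curve matches the histogram. The sketch below, assuming unnormalized components with free amplitudes, evaluates that model using the component means and SDs reported in the caption. The amplitudes were not reported, so `A1` and `A2` are placeholder values, and the functions are illustrative rather than the authors' fitting code. In practice the six parameters would be fitted to the histogram bins, for example with SciPy's `scipy.optimize.curve_fit`.

```python
import math


def gaussian(x, amp, mu, sigma):
    """Unnormalized Gaussian component: peak height `amp` at x = mu, width sigma."""
    return amp * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def bi_gaussian(x, a1, mu1, s1, a2, mu2, s2):
    """Bi-Gaussian model: the sum of two independent Gaussian components."""
    return gaussian(x, a1, mu1, s1) + gaussian(x, a2, mu2, s2)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Component means and SDs (REU) as reported in the Figure 4 caption.
MU1, SD1 = 1.8, 8.5
MU2, SD2 = 8.4, 44.2
# Placeholder amplitudes: NOT reported in the caption, chosen only for illustration.
A1, A2 = 1.0, 0.1

# Evaluate the model on a coarse ΔΔG_App grid (REU).
grid = [-100.0, -25.0, 0.0, MU1, 25.0, 100.0, 150.0]
curve = [bi_gaussian(x, A1, MU1, SD1, A2, MU2, SD2) for x in grid]
```

Far from the central peak the narrow component (SD 8.5) decays to essentially zero while the wide component (SD 44.2) still contributes, which is how a bi‐Gaussian can describe a sharp peak together with long tails that a single Gaussian fits poorly.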
FIGURE 5
(A) Space‐filling representation of the experimental structure of the nsp7/nsp8₂/nsp12 heterotetramer bound to double‐stranded RNA (PDB ID 6YYT [21]), viewed into the enzyme active site on the anterior surface of nsp12. (B) Identical view of PDB ID 6YYT with nsp7 and nsp8 removed to reveal interactions of nsp12 with RNA. Protein color coding: nsp12‐light blue; nsp8‐dark blue; nsp7‐blue/gray. RNA color coding: template strand‐shades of red; product strand‐shades of yellow. (C) Ribbon/atomic stick figure representation of the active site of nsp12 (PDB ID 7BV2; mostly gray) occupied by the RNA template:product duplex (backbone shown as tubes, bases shown as sticks, colored in shades of orange) with remdesivir (shown as an atomic stick figure following enzymatic incorporation into the RNA product strand; atom color coding: C‐green, N‐blue, O‐red, S‐yellow). The active site Motif A is color coded magenta (atom color coding for invariant residues: C‐magenta, N‐blue, O‐dark red) and purple (atom color coding for substituted residues: C‐purple, N‐blue, O‐dark red, S‐yellow). Residues making direct or water‐mediated contacts with remdesivir are colored light red (atom color coding: C‐light red, N‐blue, O‐dark red, S‐yellow).
FIGURE 6
(A) Space‐filling representation of the experimental structure of the PLPro monomer (blue) bound to a covalent inhibitor (Vir250; red/pink) (PDB ID 6WUU [28]). (B) Ribbon/atomic stick figure representation of the PLPro‐ISG15 interface (PDB ID 6YVA [29]). Oxygen atoms are shown in red, nitrogens in blue, and sulfurs in yellow. Cartoons and carbons are gray for ISG15, purple for substituted PLPro interfacial residues, and cyan for all other PLPro residues. (C) Ribbon/atomic stick figure representation of the PLPro active site (color coding as in (B)) occupied by a non‐covalent inhibitor (GRL0617) shown as an atomic stick figure (atom color coding: C‐green, N‐blue, O‐red; H‐bonds: dotted yellow lines; PDB ID 7JN2 [30]).
FIGURE 7
(A) Space‐filling representation of the experimental structure of the nsp5 homodimer covalently bound to a substrate analogue inhibitor (PDB ID 6LU7 [8]). Color coding: nsp5 monomers‐light and dark blue; substrate analogue PRD_002214 (https://www.rcsb.org/ligand/PRD_002214)‐red. (B) Ribbon/atomic stick figure representation of the active site of nsp5 (gray) occupied by PRD_002214 covalently bound to C145 (atom color coding: C‐green, N‐blue, O‐red). Catalytic residues H41 and C145 are denoted with red ribbon and atomic stick figure sidechains (atom color coding: C‐light red, N‐blue, S‐yellow). Substituted active site residues are denoted with purple ribbon and atomic stick figures (atom color coding: C‐purple, N‐blue, O‐red, S‐yellow).
FIGURE 8
(A) Ribbon representation of the computed structural model of nsp13 (green; based on PDB ID 6JYT [36]). The RNA helicase active site is located in the upper half of the protein. (B) Ribbon representation of the experimental structure of the nsp13₂‐nsp7/nsp8₂/nsp12 heterohexamer (PDB ID 6XEZ [35]), viewed to show the RNA double helix, and (C) viewed looking down the RNA helix axis, showing the two helicase active sites presented to the RNA. (Color coding for B and C: nsp13‐green; otherwise the same color coding as Figure 5.)
FIGURE 9
(A) Ribbon representation of the computed structural model of the nsp10/nsp14 heterodimer bound to GpppA and S‐adenosylhomocysteine (based on PDB ID 5C8S [40]). (B) The same view rotated 90° about the vertical axis. Color coding: nsp14‐light blue (α‐helices) and purple (β‐sheets and loops); nsp10‐dark blue (α‐helices) and red (β‐sheets and loops); GpppA‐yellow/orange; exoribonuclease active site Mg²⁺ cation‐green.
FIGURE 10
Ribbon and stick figure representation of the experimental structure of the nsp10 (dark blue)/nsp16 (light blue) heterodimer bound to N7‐methyl‐GpppA and SAM (PDB ID 6WVN [42]). Color coding: β‐sheets‐purple; loops‐green; nsp16 α‐helices‐light blue; nsp10 α‐helices‐dark blue; N7‐methyl‐GpppA‐yellow; SAM‐red. Left: full complex. Right: active site, showing the D75Y substitution, with the wild‐type (WT) residue and both ligands in gray and the substituted residue in cyan.
FIGURE 11
(A) Space‐filling representation of the experimental structure of the S‐protein homotrimer with one RBD protruding upwards (PDB ID 6VSB [48]; color coding: RBD up monomer‐dark pink, RBD down monomers‐purple, N‐linked carbohydrates‐light pink). Membrane‐spanning portions are depicted in cartoon form. (B) Ribbon/atomic stick figure representation of the RBD interacting with ACE2 (PDB ID 6LZG [53]). RBD ribbon color: cyan, or purple for substituted residues (atom color coding: C‐cyan or purple, N‐blue, O‐red). ACE2 ribbon color: gray; atom color coding: C‐gray, N‐blue, O‐red. (C) Ribbon/atomic stick figure representation of the D614 reference sequence structure (PDB ID 6VSB; D614 ribbon color: cyan; atom color coding: C‐cyan, N‐blue, O‐red) overlaid on the D614G substitution structure (PDB ID 6XS6; D614G ribbon color: gray; atom color coding: C‐gray, N‐blue, O‐red). H‐bonds are denoted with dotted yellow lines.
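Substitutions in these captions follow the convention reference residue, sequence position, substituted residue (D614G: aspartate at position 614 replaced by glycine). A minimal parser for that notation might look like the sketch below, assuming one‐letter codes for the 20 standard amino acids and 1‐based positions; the function names `parse_substitution` and `apply_substitution` are hypothetical.

```python
import re

# One-letter codes of the 20 standard amino acids (assumed alphabet).
_AA = "ACDEFGHIKLMNPQRSTVWY"
# Reference residue, 1-based position, substituted residue, e.g. "D614G".
_SUBSTITUTION = re.compile(rf"^([{_AA}])(\d+)([{_AA}])$")


def parse_substitution(label):
    """Split a label such as 'D614G' into (reference, position, substituted)."""
    match = _SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label!r}")
    ref, pos, sub = match.group(1), int(match.group(2)), match.group(3)
    if ref == sub:
        raise ValueError(f"{label!r} does not change the residue")
    return ref, pos, sub


def apply_substitution(sequence, label):
    """Return a copy of `sequence` with the substitution applied (1-based),
    after checking that the stated reference residue matches the sequence."""
    ref, pos, sub = parse_substitution(label)
    if not 1 <= pos <= len(sequence):
        raise IndexError(f"position {pos} is outside a sequence of length {len(sequence)}")
    if sequence[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {sequence[pos - 1]}")
    return sequence[:pos - 1] + sub + sequence[pos:]
```

For example, `parse_substitution("L37R")` returns `("L", 37, "R")`. Checking the reference residue before applying a label can catch mismatches between residue numbering in a structure file and numbering in the reference sequence.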
FIGURE 12
Ribbon representation of the experimental structures of N‐protein domains (PDB IDs 6VYO and 6YUN [67]). [N.B. The relative orientations of the N‐terminal (upper: residues 49–173) and C‐terminal (lower: residues 248–364) domains were chosen arbitrarily. No structural information is currently available for residues 1–48, 174–247, and 365–422.]
FIGURE 13
(A) Space‐filling representation of the computed structural model of the E‐protein, with individual protomers shown in shades of pink and purple. (B) Ribbon representation with each protomer shown in a different color, viewed parallel to the membrane (left; membrane shown, N‐ and C‐termini labeled) and down the five‐fold axis from the virion surface (right). (C) Pore‐lining substitutions L37R and L37H compared with L37 in the reference sequence (residue 37 is shown in a color‐coded space‐filling representation; C‐gray, O‐red, N‐blue).
FIGURE 14
(A) Space‐filling representation of the computed structural model of the M‐protein protomer. The glycosylated N‐terminus is located at the apex of the structure. (B) Ribbon/atomic stick figure representation (color coding: ectodomain‐blue, transmembrane α‐helices‐red, endodomain‐green). N‐ and C‐termini are labeled, together with residues N5, L124, T175, and R186 (shown in ball and stick representation; atom color coding: C‐green, O‐red, N‐blue)

Update of

  • Evolution of the SARS-CoV-2 proteome in three dimensions (3D) during the first six months of the COVID-19 pandemic.
    Lubin JH, Zardecki C, Dolan EM, Lu C, Shen Z, Dutta S, Westbrook JD, Hudson BP, Goodsell DS, Williams JK, Voigt M, Sarma V, Xie L, Venkatachalam T, Arnold S, Alvarado LHA, Catalfano K, Khan A, McCarthy E, Staggers S, Tinsley B, Trudeau A, Singh J, Whitmore L, Zheng H, Benedek M, Currier J, Dresel M, Duvvuru A, Dyszel B, Fingar E, Hennen EM, Kirsch M, Khan AA, Labrie-Cleary C, Laporte S, Lenkeit E, Martin K, Orellana M, de la Campa MO, Paredes I, Wheeler B, Rupert A, Sam A, See K, Zapata SS, Craig PA, Hall BL, Jiang J, Koeppe JR, Mills SA, Pikaart MJ, Roberts R, Bromberg Y, Hoyer JS, Duffy S, Tischfield J, Ruiz FX, Arnold E, Baum J, Sandberg J, Brannigan G, Khare SD, Burley SK. bioRxiv [Preprint]. 2020 Dec 7:2020.12.01.406637. doi: 10.1101/2020.12.01.406637. Update in: Proteins. 2022 May;90(5):1054-1080. doi: 10.1002/prot.26250. PMID: 33299989.

References

    1. Chen Y, Liu Q, Guo D. Emerging coronaviruses: genome structure, replication, and pathogenesis. J Med Virol. 2020;92(4):418‐423.
    2. Denison MR, Graham RL, Donaldson EF, Eckerle LD, Baric RS. Coronaviruses. RNA Biol. 2011;8(2):270‐279.
    3. Wang R, Hozumi Y, Yin C, Wei GW. Decoding SARS‐CoV‐2 transmission and evolution and ramifications for COVID‐19 diagnosis, vaccine, and medicine. J Chem Inf Model. 2020;60:5853‐5865.
    4. Hadfield J, Megill C, Bell SM, et al. Nextstrain: real‐time tracking of pathogen evolution. Bioinformatics. 2018;34(23):4121‐4123.
    5. Korber B, Fischer WM, Gnanakaran S, et al. Tracking changes in SARS‐CoV‐2 spike: evidence that D614G increases infectivity of the COVID‐19 virus. Cell. 2020;182(4):812‐827.e19.
