Evol Bioinform Online. 2020 Oct 23;16:1176934320965149.
doi: 10.1177/1176934320965149. eCollection 2020.

New Pathways of Mutational Change in SARS-CoV-2 Proteomes Involve Regions of Intrinsic Disorder Important for Virus Replication and Release

Tre Tomaszewski et al. Evol Bioinform Online.

Abstract

The massive worldwide spread of the SARS-CoV-2 virus is fueling the COVID-19 pandemic. Since the first whole-genome sequence was published in January 2020, a growing database of tens of thousands of viral genomes has been assembled. This offers opportunities to study pathways of molecular change in the expanding viral population that can help identify molecular culprits of virulence and virus spread. Here we investigate the accumulation of mutations in the genome at various time points of the early pandemic to identify changes in highly mutable genomic regions worldwide. We used the Wuhan NC_045512.2 sequence as a reference and sampled 15 342 indexed sequences from GISAID, translated them into proteins, and grouped them by month of deposition. The per-position amino acid frequencies and Shannon entropies of the coding sequences were calculated for each month, and a map of intrinsically disordered regions and binding sites was generated. The analysis revealed dominant variants, most of which were located in loop regions and on the surface of the proteins. Mutation entropy decreased between March and April of 2020 after steady increases at several sites, including the D614G mutation site of the spike (S) protein, which was previously found to be associated with higher case fatality rates, and sites of the NSP12 polymerase and the NSP13 helicase proteins. Notable mutations that expanded between March and April include R203K and G204R in the inter-domain linker region of the nucleocapsid (N) protein and G251V of the viroporin encoded by ORF3a. The regions spanning these mutations exhibited significant intrinsic disorder, which was enhanced by the N-protein mutations and decreased by the viroporin 3a mutation. These results predict an ongoing mutational shift from the spike and replication complex to other regions, especially to encoded molecules known to be major β-interferon antagonists. The study provides valuable information for therapeutics and vaccine design, as well as insight into mutation tendencies that could facilitate preventive control.
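The per-month, per-position Shannon entropy described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the protein sequences have already been translated and aligned to the NC_045512.2 reference so that all sequences share one length, it treats gaps and ambiguous residues as ordinary symbols, and the function names (`column_entropy`, `monthly_entropy`) are hypothetical. Entropy is in bits, H = -Σ p log2 p over the residue frequencies p observed at one position.

```python
import math
from collections import Counter, defaultdict

# Hypothetical helper names; a sketch, not the study's code.


def column_entropy(residues):
    """Shannon entropy, in bits, of the residues observed at one alignment position."""
    counts = Counter(residues)
    total = sum(counts.values())
    # Sum of p * log2(1/p); written this way so a conserved site gives 0.0, not -0.0.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def monthly_entropy(records):
    """Group (month, aligned_protein) pairs by month and return
    {month: [entropy at position 1, entropy at position 2, ...]}."""
    by_month = defaultdict(list)
    for month, seq in records:
        by_month[month].append(seq)
    result = {}
    for month in sorted(by_month):
        seqs = by_month[month]
        length = len(seqs[0])  # assumes every sequence is aligned to the same length
        result[month] = [column_entropy([s[i] for s in seqs]) for i in range(length)]
    return result


# Toy data (hypothetical): two-residue proteins deposited in March and April 2020.
records = [("2020-03", "DA"), ("2020-03", "GA"), ("2020-04", "GA"), ("2020-04", "GA")]
print(monthly_entropy(records))  # {'2020-03': [1.0, 0.0], '2020-04': [0.0, 0.0]}
```

A site where half of the sequences in a month carry D and half carry G scores 1 bit; a fully conserved site scores 0.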

Keywords: Nucleocapsid protein; SARS-CoV-2; entropy; mutation; spike protein.

Conflict of interest statement

Declaration of conflicting interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1.
General data workflow of the analysis of SARS-CoV-2 genomes, including a breakdown of details that occur during each step. Main steps are indicated in blue, while details per step are indicated in gray.
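One step in this workflow, translating the sampled nucleotide sequences into proteins, can be sketched with the standard genetic code. The sketch assumes each coding sequence has already been extracted in frame from the genome using the reference annotation; it does not handle the -1 ribosomal frameshift in ORF1ab, and `translate` is a hypothetical helper rather than the tool used in the study.

```python
from itertools import product

# Hypothetical helper; a sketch, not the study's code.
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds, to_stop=True):
    """Translate an in-frame coding sequence; codons with ambiguous bases (e.g. N) become 'X'."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*" and to_stop:
            break  # stop translating at the first stop codon
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGATNNNGGCTAA"))  # MDXG
```

Mapping ambiguous codons to 'X' keeps positions aligned with the reference, so low-quality calls do not shift downstream residue numbering.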
Figure 2.
Analysis of mutational entropy at nucleotide (a) and amino acid (b) levels defines an evolving SARS-CoV-2 proteome of 13 proteins with significant mutational change (c). Amino acid positions are labeled only for sites with mutational entropy above 0.1 bits in March. Molecules that exhibit significant entropy have their atomic 3-dimensional models unshaded in panel (c).
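Comparing the nucleotide and amino acid levels matters because synonymous substitutions raise nucleotide entropy without changing the protein. The minimal sketch below computes both for a single codon site and applies the 0.1-bit labeling cutoff. The four-codon lookup table is a hypothetical subset of the standard genetic code included only for the example, and the function names are likewise hypothetical.

```python
import math
from collections import Counter

# Hypothetical helpers and codon subset; a sketch, not the study's code.
CODON_TO_AA = {"GAT": "D", "GAC": "D", "GGT": "G", "GGC": "G"}
THRESHOLD = 0.1  # bits; the labeling cutoff quoted in the Figure 2 caption


def entropy_bits(symbols):
    """Shannon entropy, in bits, of a list of symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def codon_site_entropies(codons):
    """Return ([entropy at codon positions 1, 2, 3], amino acid entropy) for one codon site."""
    nucleotide = [entropy_bits([c[i] for c in codons]) for i in range(3)]
    amino_acid = entropy_bits([CODON_TO_AA[c] for c in codons])
    return nucleotide, amino_acid


def labeled_sites(entropies, threshold=THRESHOLD):
    """1-based positions whose entropy exceeds the labeling cutoff."""
    return [pos for pos, h in enumerate(entropies, start=1) if h > threshold]


# A synonymous change (GAT -> GAC) shows up at the nucleotide level only.
print(codon_site_entropies(["GAT", "GAC", "GAT", "GAC"]))  # ([0.0, 0.0, 1.0], 0.0)
# A non-synonymous change (GAT -> GGT, D -> G) shows up at both levels.
print(codon_site_entropies(["GAT", "GGT", "GAT", "GAT"]))
```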
Figure 3.
Pathways of mutational change involve mutational entropy reversals and expansions. Entropy reversals occur when entropy increases and then decreases along the timeline of the pandemic. Entropy expansions occur when entropy only increases, which signals continued diversification of amino acid sequences.
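The two trajectory types can be made concrete with a small classifier over one site's monthly entropy series. This is a sketch under stated assumptions: months are ordered oldest first, changes smaller than a hypothetical tolerance `tol` count as flat, series that neither expand nor reverse are labeled "other", and `classify_trajectory` is a hypothetical name.

```python
# Hypothetical helper; a sketch of the caption's definitions, not the study's code.


def classify_trajectory(entropies, tol=1e-9):
    """Classify one site's monthly entropy series (oldest month first).

    'reversal'  : entropy rises at some point and later falls
    'expansion' : entropy rises at least once and never falls
    'other'     : flat series, or a fall that is not preceded by a rise
    """
    rose = fell = False
    for prev, cur in zip(entropies, entropies[1:]):
        if cur > prev + tol:
            rose = True
        elif cur < prev - tol:
            if rose:
                return "reversal"
            fell = True
    return "expansion" if rose and not fell else "other"


print(classify_trajectory([0.0, 0.3, 0.6, 0.2]))  # reversal
print(classify_trajectory([0.0, 0.1, 0.4]))  # expansion
```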
Figure 4.
Distribution of the 27 entropically significant mutations across regions of the world along the initial timeline of the SARS-CoV-2 pandemic. Regions included South America (SA), Oceania (O), North America (NA), Europe (E), Asia (A), and Africa (AF). The proportion of amino acid variants was plotted for each month. Blue, orange, and green bars depict initial variants for non-structural, structural, and accessory proteins, respectively.
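Per-region, per-month proportions of this kind can be approximated by calling substitutions against the reference protein and counting carriers. The sketch assumes aligned protein sequences of equal length, skips ambiguous residues ('X'), and uses hypothetical helpers (`call_mutations`, `variant_proportions`) with toy data. Mutation labels follow the D614G notation used in the text: reference residue, 1-based position, variant residue.

```python
from collections import Counter

# Hypothetical helpers and toy data; a sketch, not the study's code.


def call_mutations(reference, sample):
    """List substitutions such as 'D614G' (1-based positions); 'X' residues are skipped."""
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt and alt != "X"
    ]


def variant_proportions(records, reference, mutation):
    """records: iterable of (region, month, aligned_protein).
    Returns {(region, month): fraction of sequences carrying `mutation`}."""
    totals = Counter()
    carriers = Counter()
    for region, month, seq in records:
        totals[(region, month)] += 1
        if mutation in call_mutations(reference, seq):
            carriers[(region, month)] += 1
    return {key: carriers[key] / n for key, n in sorted(totals.items())}


records = [("E", "2020-03", "MGA"), ("E", "2020-03", "MDA"), ("NA", "2020-03", "MGA")]
print(call_mutations("MDA", "MGA"))  # ['D2G']
print(variant_proportions(records, "MDA", "D2G"))  # {('E', '2020-03'): 0.5, ('NA', '2020-03'): 1.0}
```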
Figure 5.
Principal component analysis of SARS-CoV-2 genomes with original and variant sites in the S- and N-proteins.
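One way to set up such an analysis is to encode each genome as a 0/1 vector over the variant sites and project the genomes onto the leading principal component. The encoding is an assumption (the caption does not specify it), `pc1_scores` is a hypothetical name, and power iteration is used only to keep the example dependency-free; a library SVD routine would normally replace it.

```python
import random

# Hypothetical helper; a sketch, not the study's PCA.


def pc1_scores(matrix, iterations=200, seed=0):
    """Scores of each genome (row) on the first principal component.

    matrix[i][j] is 1 if genome i carries the variant residue at site j, else 0.
    Requires at least two genomes. Power iteration on the sample covariance
    matrix stands in for a library PCA/SVD routine.
    """
    n, m = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in matrix]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(m)] for a in range(m)]
    rng = random.Random(seed)
    vec = [rng.random() + 0.1 for _ in range(m)]  # positive random start vector
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(m)) for a in range(m)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0:  # no variance at any site; all scores will be zero
            break
        vec = [x / norm for x in nxt]
    return [sum(r[j] * vec[j] for j in range(m)) for r in centered]


# Columns: hypothetical S-protein site 614 and N-protein site 203 (1 = variant residue).
genomes = [[0, 0], [0, 0], [1, 1], [1, 1]]
print(pc1_scores(genomes))  # the two groups fall on opposite sides of zero
```

The sign of a principal component is arbitrary; only the separation between groups is meaningful.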
Figure 6.
Major SARS-CoV-2 protein molecules experiencing mutational entropy reversals. (A) The coronavirus spike is a trimer of S-protein protomers, each harboring an N-terminal S1 subunit with an N-terminal domain (NTD) and a receptor-binding domain (RBD), and a C-terminal S2 subunit holding a ‘fusion’ region with the fusion peptide (FD) and internal fusion peptide sequences, 2 heptad repeat (HR) sequences, and a transmembrane (TM) domain. The subunits are processed by host proteases upon viral entry. The SARS-CoV-2 atomic model of a dimer (PDB entry 6VXX) shows that the D614G mutation of the S1 domain eliminates a hydrogen bonding interaction with site 859 of the S2 domain of another protomer (colored orange in the inset). An RMSD versus Z-score plot describes the DALI structural neighborhood of 6VXX, which contains 2310 structures. Alignments of multiple random samples of 10 structures along a transect from 6VXX to the main cloud with low Z-scores of structural similarity (colored red in the plots) consistently show that residue 614 is part of a loop in molecular regions that are poorly conserved at both sequence (Seq) and structural (Str) levels. In the protomer cartoon models of the alignment example, blue hues indicate higher conservation than green-to-red hues. (B) NSP12 is the main RNA-dependent RNA polymerase of the virus. It is encoded by ORF1b and is responsible for the synthesis of viral RNA. Examination of the SARS-CoV-2 structure in complex with the NSP7 and NSP8 cofactors (PDB entry 6M71) revealed that the P323L mutation sits in an ‘interface’ region (spanning residues 250-365) between the N-terminal nidovirus-unique NiRAN domain, which has nucleotidyltransferase activity, and the C-terminal polymerase domain that harbors the fingers, palm and thumb subdomains. The mutation is in a helical region at the surface of a pocket formed by the NiRAN, interface and fingers structures (inset). DALI structural neighborhood analysis (1157 structures) confirmed that the site is in a region that is poorly conserved at sequence and structure levels but borders the highly conserved regions that harbor polymerase activity. (C) NSP13 is the helicase of the viral replication complex. NSP13 has an N-terminal Zn-binding domain (ZBD) followed by a stalk domain and 3 RecA-like domain structures (1A, 2A and 2B), which form the triangular base of a pyramid. L504P and C541Y are in loop regions located on the surface of the middle of the 2B domain. DALI structural neighborhood analysis of PDB entry 6JYT (1356 structures) showed that C541Y is in a structurally conserved region of the molecule, while the L504P region was variable at both sequence and structure levels.
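The repeated random sampling of DALI neighbors described in panel (A) can be sketched as below. Assumptions: hits are already parsed into `(structure_id, z_score, rmsd)` tuples (parsing DALI output is not shown), samples are drawn uniformly from the whole neighborhood rather than along a specific transect, and `sample_neighborhood` is a hypothetical helper. Each sample is ordered from high to low Z-score, so it runs from the query's closest neighbors toward the low-similarity main cloud.

```python
import random

# Hypothetical helper; a simplified sketch, not the study's sampling procedure.


def sample_neighborhood(hits, sample_size=10, n_samples=5, seed=0):
    """Draw repeated random samples of structural neighbors.

    hits: list of (structure_id, z_score, rmsd) tuples.
    Returns n_samples lists, each sorted by descending Z-score.
    """
    rng = random.Random(seed)  # seeded so the samples are reproducible
    k = min(sample_size, len(hits))
    samples = []
    for _ in range(n_samples):
        picked = rng.sample(hits, k)
        samples.append(sorted(picked, key=lambda h: h[1], reverse=True))
    return samples


# Toy neighborhood (hypothetical IDs and scores).
hits = [(f"s{i}", float(i), 1.0 + i / 10) for i in range(30)]
for sample in sample_neighborhood(hits, n_samples=2):
    print([h[0] for h in sample])
```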
Figure 7.
The mutational diversification of the SARS-CoV-2 viroporin encoded by ORF3a. (a) The structure of the protein 3a molecule (PDB entry 6XDC) has 2 domains, an N-terminal transmembrane domain (TD) and a C-terminal cytosolic domain (CD). Mutation Q57H is located in the first of the 3 transmembrane helices, at the major hydrophilic constriction of the pore that is important for channel activity. Mutation G197V forms part of a loop at the surface of the CD. Terminal amino acids 1-38 and 239-275 and a segment spanning residues 175-180 in the CD could not be modeled because they were weakly resolved; these regions hold mutations at positions 13 and 251. (b) View from the lumen side of the channel pore (P) in ribbon and atom stick representation. Note that the pore is only 1 Å wide. (c) A DALI structural neighborhood analysis (10 088 structural neighbors) returned significant hits to small fragments (Z ⩽ 9.2; RMSD ⩾ 1.3 Å) that formed a single cluster in the RMSD versus Z-score plot. Structural alignment of the 92 hits with Z ⩾ 7 (red dots) revealed that all hits matched the TD structures and were well conserved at the structure (Str) level but less so at the sequence (Seq) level. The best structural match to the TD was the Orai channel protein (PDB entry 6BBG), which is responsible for Ca²⁺ influx pathways in metazoan cells and is involved in immune responses and cancer. (d) The mapping of intrinsic disorder (IUPred2, red line) and gain-loss of binding energy (ANCHOR2, blue line) along the sequence confirmed significant intrinsic disorder (scores ⩾ 0.5) in the C-terminal linker. A comparison of the different mutants and the reference viral strain with a delta score revealed that mutations G196V and G251V decreased disorder.
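The caption does not define the delta score. A minimal sketch, assuming it is the per-residue disorder score of the mutant minus that of the reference, averaged over a window around the mutation, is shown below; under that assumption a negative value corresponds to the decrease reported for G196V and G251V. The function name, the window, and the toy scores are hypothetical.

```python
# Hypothetical helper and toy scores; an assumed definition of the delta score.


def delta_disorder(reference_scores, mutant_scores, start, end):
    """Mean per-residue change in disorder score (mutant minus reference)
    over 1-based positions start..end, inclusive.

    Both score lists are assumed to be per-residue predictions for sequences
    of equal length (a substitution does not change length).
    """
    if len(reference_scores) != len(mutant_scores):
        raise ValueError("score lists must have equal length")
    diffs = [mutant_scores[i] - reference_scores[i] for i in range(start - 1, end)]
    return sum(diffs) / len(diffs)


# Toy scores for a 6-residue stretch; the mutation lowers predicted disorder near it.
reference = [0.62, 0.70, 0.81, 0.77, 0.66, 0.58]
mutant = [0.62, 0.64, 0.71, 0.69, 0.63, 0.58]
print(delta_disorder(reference, mutant, 2, 5))  # negative: disorder decreased
```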
Figure 8.
Pathways of mutational diversification of SARS-CoV-2 involve intrinsically disordered regions of the nucleocapsid (N) protein. (a) The N-protein has 2 major RNA-binding domains, an N-terminal domain (NTD) and a C-terminal domain (CTD), connected by a central linker and flanked by terminal sequences; the linker and terminal sequences have been reported to be intrinsically disordered regions (IDRs). Mutations were traced onto a SARS-CoV-2 N-protein structure modeled with I-TASSER. They occurred at position 13 of the N-terminal IDR and at positions 193, 197, 203 and 204 of the linker IDR, all in loop regions of the molecule. Mutations 203 and 204 were the only sites buried in the molecule. (b) A DALI structural neighborhood analysis of the modeled structure (88 structural neighbors, including many from SARS-CoV-2) showed 2 clusters in the RMSD versus Z-score plot, one reflecting structural matches to the NTD and the other to the CTD. Structural alignment plots of the 88 structures supported the veracity of the modeled RNA-binding domains and revealed that the NTD is more conserved at sequence (Seq) and structure (Str) levels. (c) The mapping of intrinsic disorder (IUPred2, red line) and gain-loss of binding energy (ANCHOR2, blue line) along the sequence confirmed significant intrinsic disorder and binding (scores ⩾ 0.5) of the linker and terminal regions. A comparison of the R203K mutant and the reference viral strain with a delta score revealed that the mutation increased disorder. A similar outcome was obtained with the G204R mutant.
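The IDR calls in panel (c) rest on a 0.5 score cutoff. Turning per-residue scores into contiguous regions can be sketched as follows. The input is assumed to be one score per residue, as produced by predictors such as IUPred2; the `min_length` filter and the function name are hypothetical additions for illustration.

```python
# Hypothetical helper; a sketch of thresholding per-residue disorder scores.


def disordered_regions(scores, threshold=0.5, min_length=1):
    """Return 1-based inclusive (start, end) runs where score >= threshold
    and the run is at least min_length residues long."""
    regions = []
    start = None
    for i, s in enumerate(scores, start=1):
        if s >= threshold and start is None:
            start = i  # a disordered run begins
        elif s < threshold and start is not None:
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


print(disordered_regions([0.2, 0.6, 0.7, 0.3, 0.5, 0.9]))  # [(2, 3), (5, 6)]
```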
