Genes (Basel). 2022 Feb 25;13(3):423.
doi: 10.3390/genes13030423.

Conserved Molecular Signatures in the Spike, Nucleocapsid, and Polymerase Proteins Specific for the Genus Betacoronavirus and Its Different Subgenera

Radhey S Gupta et al. Genes (Basel).

Abstract

The genus Betacoronavirus, consisting of four main subgenera (Embecovirus, Merbecovirus, Nobecovirus, and Sarbecovirus), encompasses all clinically significant coronaviruses (CoVs), including SARS, MERS, and the SARS-CoV-2 virus responsible for the current COVID-19 pandemic. Very few molecular characteristics are known that are specific for the genus Betacoronavirus or its different subgenera. In this study, our analyses of the sequences of four essential proteins of CoVs, viz., spike, nucleocapsid, envelope, and RNA-dependent RNA polymerase (RdRp), identified ten novel molecular signatures consisting of conserved signature indels (CSIs) in these proteins which are specific for the genus Betacoronavirus or its subgenera. Of these CSIs, two 14-aa conserved deletions found within the heptad repeat motifs 1 and 2 of the spike protein are specific for all betacoronaviruses, except for their shared presence in the highly infectious avian coronavirus. Six additional CSIs present in the nucleocapsid protein and one CSI in the RdRp protein are distinctive characteristics of either the Merbecovirus, Nobecovirus, or Sarbecovirus subgenera. In addition, a 4-aa insert is present in the spike protein, which is uniquely shared by all viruses from the subgenera Merbecovirus, Nobecovirus, and Sarbecovirus, but absent in Embecovirus and all other genera of CoVs. This molecular signature provides evidence that viruses from the three subgenera sharing this CSI are more closely related to each other, and that they evolved after the divergence of embecoviruses and other CoVs. As all CSIs specific for different groups of CoVs are flanked by conserved regions, their sequences provide novel means for identifying the above groups of CoVs and for developing novel diagnostic tests. Furthermore, our analyses of the structures of the spike and nucleocapsid proteins show that all identified CSIs are localized in the surface-exposed loops of these proteins. It is postulated that these surface loops, through their interactions with other cellular proteins/ligands, play important roles in the biology/pathology of these viruses.
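The abstract defines a CSI as an insertion or deletion that is shared by one group of viruses, absent from the others, and flanked by conserved regions. The following Python sketch illustrates that definition on a toy alignment. It is not the authors' pipeline: the function name `find_candidate_csis`, the `flank` parameter, the toy sequences, and the strict rule that a flanking column counts as conserved only when every sequence carries the same residue are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

GAP = "-"


def find_candidate_csis(
    alignment: Dict[str, str],
    ingroup: List[str],
    flank: int = 3,
) -> List[Tuple[int, int, str]]:
    """Return (start, end, kind) for gap runs that split ingroup from outgroup.

    Positions are 0-based and inclusive. kind is "deletion" when every ingroup
    sequence is gapped across the run and no outgroup sequence is, and
    "insertion" for the reverse pattern. A run is reported only if `flank`
    fully conserved columns sit on each side (a simplifying assumption).
    """
    names = list(alignment)
    inside = [n for n in names if n in ingroup]
    outside = [n for n in names if n not in ingroup]
    if not inside or not outside:
        raise ValueError("need at least one ingroup and one outgroup sequence")
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")

    def column_kind(i: int):
        in_gaps = [alignment[n][i] == GAP for n in inside]
        out_gaps = [alignment[n][i] == GAP for n in outside]
        if all(in_gaps) and not any(out_gaps):
            return "deletion"
        if all(out_gaps) and not any(in_gaps):
            return "insertion"
        return None

    def conserved(i: int) -> bool:
        column = {alignment[n][i] for n in names}
        return len(column) == 1 and GAP not in column

    results = []
    i = 0
    while i < length:
        kind = column_kind(i)
        if kind is None:
            i += 1
            continue
        # Extend the run while neighbouring columns show the same pattern.
        j = i
        while j + 1 < length and column_kind(j + 1) == kind:
            j += 1
        has_room = i - flank >= 0 and j + flank < length
        if has_room and all(conserved(k) for k in range(i - flank, i)) \
                and all(conserved(k) for k in range(j + 1, j + 1 + flank)):
            results.append((i, j, kind))
        i = j + 1
    return results


# Hypothetical toy alignment; these are not sequences from the study.
toy = {
    "beta_1": "MKLV--QRST",
    "beta_2": "MKLV--QRST",
    "alpha_1": "MKLVGAQRST",
    "gamma_1": "MKLVGSQRST",
}
print(find_candidate_csis(toy, ingroup=["beta_1", "beta_2"]))
```

On real data the flanking test would normally tolerate conservative substitutions, and group membership would come from a phylogeny such as the RdRp tree in Figure 1 rather than from sequence names.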

Keywords: Betacoronavirus and its subgenera Sarbecovirus, Merbecovirus and Nobecovirus; conserved signature indels (CSIs); evolution of betacoronaviruses; heptad repeat motifs 1 and 2; molecular markers; nucleocapsid and RNA-dependent RNA polymerase (RdRp) proteins; spike.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
A maximum-likelihood distance tree based on sequence alignment of the RNA-dependent RNA polymerase (RdRp) protein from representative viruses/strains from different genera/subgenera of CoVs. The tree was bootstrapped 100 times and the % bootstraps for different branches are indicated on the nodes. The clades corresponding to different genera and subgenera within the genus Betacoronavirus are labeled.
Figure 2
Partial sequence alignments of two conserved regions from the spike protein showing two different CSIs that are specific for the genus Betacoronavirus. The CSIs present in these sequence alignments are highlighted in blue and they are labeled ❶ and ❷. Both CSIs are commonly shared by all members of the genus Betacoronavirus, but barring one exception, avian coronavirus, they are not found in any other CoV. Dashes (–) in these and all other sequence alignments denote identity with the amino acid shown in the top sequence. The numbers on the top indicate the locations of these sequence regions within the indicated proteins. The accession numbers of different proteins are given in the second column.
Figure 3
Partial sequence alignments of two conserved regions from the RdRp and nucleocapsid proteins showing a number of CSIs that are specific for the subgenera Merbecovirus and Nobecovirus. (A) Partial sequence alignment of the RdRp protein showing a CSI consisting of a 2-aa insertion (highlighted in blue and labeled ❸) which is specific for the subgenus Merbecovirus. (B) Partial sequence alignment of the nucleocapsid protein showing two different CSIs, one of which (labeled ❹) is specific for the subgenus Merbecovirus and the other (labeled ❺) is present only in viruses from the subgenus Nobecovirus.
Figure 4
Partial sequence alignments of two conserved regions of the nucleocapsid proteins showing a number of CSIs specific for different subgenera of Betacoronavirus. (A) This sequence region depicts two different CSIs. The CSI labeled ❻ is commonly shared by different viruses from the subgenera Merbecovirus and Sarbecovirus, whereas the CSI marked ❼ is specific for viruses from the subgenus Sarbecovirus. (B) This sequence region depicts two CSIs, marked ❽ and ❾, which are specific for the subgenus Nobecovirus. Dashes (–) in the alignments indicate identity with the amino acid shown in the top sequence.
Figure 5
Excerpts from the sequence alignment of the spike protein showing a 4-aa CSI that is present only in the Betacoronavirus subgenera Merbecovirus, Nobecovirus, and Sarbecovirus. This CSI (❿) provides evidence that the CoVs from these subgenera are more closely related to each other, and they evolved after the divergence of other CoVs. No change is observed in this region in the Omicron variant.
Figure 6
Mapping the surface locations of eight of the identified CSIs in the spike and nucleocapsid proteins. (A) Cryo-EM-based structure of the post-fusion form of the SARS-CoV spike protein (PDB ID: 6m3w) showing the structural locations of CSIs ❶ and ❷. The regions where these CSIs are found are circled. Of these two CSIs, CSI ❶ is present within the HR2 motif, whereas CSI ❷ is found within the HR1 motif in the S2 subunit. (B) The structural location of the 4-aa CSI (❿), which is commonly shared by the Merbecovirus, Nobecovirus, and Sarbecovirus subgenera, shown using a superimposed structure of the spike protein from SARS-CoV-2 (green) and the PEDV virus (cyan). (C) Crystal structure of the N-terminal domain of the N-protein (PDB ID: 6LNN) from MERS-CoV in which the structural locations of two CSIs (❹ and ❺) are highlighted. (D) Structure of the RNA-binding domain (RBD) of the N-protein (PDB ID: 7R98) from SARS-CoV-2 depicting the structural locations of three CSIs (❻, ❼, and ❽) in the RBD.
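The alignment figures use a dash to mark a residue identical to the one in the top sequence. A minimal Python sketch of expanding that notation back into full sequences is given here. The helper name `expand_identity_dashes` and the example rows are hypothetical, and the sketch assumes that true alignment gaps are encoded with some character other than the identity dash (a space, for example).

```python
def expand_identity_dashes(reference: str, row: str, identity: str = "-") -> str:
    """Replace each identity marker in `row` with the residue from `reference`.

    Assumes both strings are already aligned column by column and that real
    gaps are written with a character other than `identity`.
    """
    if len(reference) != len(row):
        raise ValueError("reference and row must have the same aligned length")
    return "".join(ref if ch == identity else ch for ref, ch in zip(reference, row))


# Hypothetical rows; not taken from the published alignments.
top_sequence = "QKLIANQF"
other_row = "--M--S--"
print(expand_identity_dashes(top_sequence, other_row))
```

If a figure is transcribed with the en dash (–) instead of an ASCII hyphen, pass that character as `identity`.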
