Brief Bioinform. 2021 Mar 22;22(2):1267-1278.
doi: 10.1093/bib/bbaa262.

Compositional diversity and evolutionary pattern of coronavirus accessory proteins

Jingzhe Shang et al. Brief Bioinform. 2021.

Abstract

Accessory proteins play important roles in the interaction between coronaviruses and their hosts. Accordingly, a comprehensive study of the compositional diversity and evolutionary patterns of accessory proteins is critical to understanding the host adaptation and epidemic variation of coronaviruses. Here, we developed a standardized genome annotation tool for coronaviruses (CoroAnnoter) by combining open reading frame prediction, transcription regulatory sequence recognition and homologous alignment. Using CoroAnnoter, we annotated 39 representative coronavirus strains to form a compositional profile for all of the accessory proteins. The number of accessory proteins varied widely among coronaviruses, from 1 to 10, with SARS-CoV-2 and SARS-CoV having the most (9 and 10, respectively). The variation between SARS-CoV and SARS-CoV-2 accessory proteins could be traced back to related coronaviruses in other hosts. The genomic distribution of accessory proteins showed significant intra-genus conservation and inter-genus diversity and could be grouped into 1, 4, 2 and 1 types for alpha-, beta-, gamma- and delta-coronaviruses, respectively. Evolutionary analysis suggested that accessory proteins located before the N-terminus of proteins E and M (E-M) are more conserved, while those located after these proteins are more diverse. Furthermore, comparison of the virus-host interaction networks of SARS-CoV-2 and SARS-CoV accessory proteins showed that they share multiple antiviral signaling pathways, including those involved in the apoptotic process, viral life cycle and response to oxidative stress. In summary, our study provides a tool for coronavirus genome annotation and builds a comprehensive profile for coronavirus accessory proteins covering their composition, classification, evolutionary pattern and host interaction.

Keywords: accessory proteins; compositional diversity; coronavirus; evolution.
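
The abstract names three annotation steps: open reading frame (ORF) prediction, transcription regulatory sequence (TRS) recognition and homologous alignment. The following is only a minimal sketch of how the first two steps might be combined, not CoroAnnoter's actual implementation. All function names, the minimum ORF length, the maximum TRS-to-ORF gap and the toy sequence are assumptions made for illustration, and the TRS motif is a placeholder argument to be replaced by the motif of the strain under study. The genome is assumed to be written in the DNA alphabet and read on the forward strand only. The homologous alignment step (for example a BLAST search) is not sketched here.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(genome, min_codons=30):
    """Return sorted (start, end) pairs of forward-strand ORFs.

    Coordinates are 0-based with an exclusive end. Each ORF runs from an
    ATG to the first in-frame stop codon. min_codons is a hypothetical
    cutoff counted without the stop codon; it is not taken from the paper.
    """
    genome = genome.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(genome) - 2, 3):
            codon = genome[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def find_trs(genome, motif):
    """Return 0-based start positions of all (overlapping) motif matches."""
    pattern = f"(?={re.escape(motif.upper())})"
    return [m.start() for m in re.finditer(pattern, genome.upper())]


def annotate(genome, trs_motif, max_gap=50, min_codons=30):
    """Pair each ORF with the nearest TRS ending at most max_gap bases upstream.

    Returns (orf_start, orf_end, trs_start_or_None) tuples. max_gap is an
    assumed illustrative value.
    """
    trs_sites = find_trs(genome, trs_motif)
    result = []
    for start, end in find_orfs(genome, min_codons):
        upstream = [p for p in trs_sites
                    if 0 <= start - (p + len(trs_motif)) <= max_gap]
        result.append((start, end, max(upstream) if upstream else None))
    return result


# Toy sequence (not real viral data): a placeholder motif, then a short ORF.
toy = "CC" + "ACGAAC" + "GG" + "ATG" + "GCT" * 5 + "TAA" + "CC"
hits = annotate(toy, "ACGAAC", max_gap=50, min_codons=5)
```

In this sketch an ORF with no nearby motif is still reported, with None in the third field, so that a later homology search could decide whether to keep it.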

Figures

Figure 1
Methodology of CoroAnnoter. (A) Unknown characteristics of the diversity, evolution, origin and function of coronavirus accessory protein analysis. Three modules are included. Data preparation included the selection of representative strains, the preparation of genomic sequences and the construction of a BLAST database. Finally, the composition and evolutionary patterns of coronavirus accessory proteins were systematically annotated and analyzed.
Figure 2
The comprehensive annotation of coronavirus accessory proteins. A phylogenetic tree of 39 representative reference coronaviruses was built based on the pp1b region. Four large branches, alpha-, beta-, gamma- and delta-coronaviruses, were clearly grouped in agreement with the ICTV classification. Seven types of human-infecting coronaviruses are highlighted with rectangular boxes, while the recently pandemic SARS-CoV-2 is indicated by a green star. For each reference strain, the 3'-terminal region of the genome, which encodes the structural proteins (S, E, M and N) and the accessory proteins, is shown as a linear structure. Structural proteins are indicated in gray. The accessory proteins are indicated in different colors according to their names. The TRSs that regulate the discontinuous transcription of sub-genomes for coronaviruses are indicated as black points in the genome structure. The TRS sequence and number of accessory proteins are both listed.
Figure 3
Distribution patterns of coronavirus accessory proteins on genome structure. (A) Eight compositional types of coronavirus accessory proteins are defined based on their genomic locations and compositions. There is no accessory protein between structural proteins E and M. The E-M proteins were used as the border: accessory proteins in front of E-M are displayed in red and those behind E-M in green. (B–E) The distributions of accessory proteins for each strain in the alpha- (B), beta- (C), gamma- (D) and delta- (E) genus are shown. Seven human-infecting coronaviruses are highlighted in red.
Figure 4
Sequence similarities of accessory proteins among four genera of coronaviruses. (A) Sequence similarities of accessory protein sequences between pairs of strains, visualized with the Circos software. The E-value threshold for the BLAST alignment is 1E-5. Ribbon colors correspond to BLAST scores. Strains of alpha-, beta-, delta- and gamma-coronaviruses are shown in green, pink, blue and purple, respectively. (B) Identification of conserved accessory proteins before and after E-M proteins. Pairwise alignment was used to measure the similarity between accessory protein sequences. Sequence identities greater than 40% are shown in red in the heatmap.
Figure 5
Conserved accessory proteins in beta-coronavirus. (A) Visualization of sequence similarity of accessory protein sequences in beta-coronavirus using the Circos software. The E-value threshold for the BLAST alignment is 1E-5. Ribbon colors correspond to BLAST scores. (B–D) Sequence identity of conserved accessory proteins in beta-coronavirus. Pairwise alignment was used to measure the similarity between accessory protein sequences. Sequence identities greater than 40% are shown in red in the heatmap.
Figure 6
Interaction networks between accessory proteins and host for SARS-CoV and SARS-CoV-2. (A) Comparison of virus-host interaction networks of SARS-CoV and SARS-CoV-2 accessory proteins. Four common genes (SMOC1, MARK3, DCTN2 and BAG6) are shown in purple. Functions annotated via the Gene Ontology and IPA software are listed at the bottom. (B) Sub-networks of SARS-CoV and SARS-CoV-2 involved in the apoptotic process. Human proteins were selected from published papers. Interaction relationships were extracted by the IPA software.
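
The captions of Figures 4 and 5 describe measuring similarity by pairwise alignment and marking sequence identities greater than 40%. As a rough illustration of that idea only, the sketch below computes percent identity from a simple Needleman-Wunsch global alignment and lists the pairs above the 40% mark. The alignment method, the scoring values (match 1, mismatch -1, gap -1), the definition of identity as matches divided by aligned columns, the function names and the toy sequences are all assumptions; the page does not state which alignment program or identity definition the authors used.

```python
from itertools import combinations


def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Percent identity of a Needleman-Wunsch global alignment of a and b.

    Identity is defined here as matching columns / all aligned columns
    (gap columns included); this definition is an assumption.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting columns and exact matches.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * matches / columns if columns else 0.0


def conserved_pairs(proteins, threshold=40.0):
    """Return (name1, name2, identity) for pairs with identity > threshold."""
    pairs = []
    for x, y in combinations(sorted(proteins), 2):
        identity = global_identity(proteins[x], proteins[y])
        if identity > threshold:
            pairs.append((x, y, identity))
    return pairs


# Toy peptides (hypothetical, not real accessory protein sequences).
toy_proteins = {"p1": "MKVL", "p2": "MKVI", "p3": "GGGG"}
flagged = conserved_pairs(toy_proteins)
```

The quadratic dynamic programming table is adequate for short accessory proteins; a dedicated alignment library would be the more usual choice for larger sets.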
