Comparative Study
Genomics Proteomics Bioinformatics. 2003 May;1(2):155-65.
doi: 10.1016/s1672-0229(03)01019-2.

The R protein of SARS-CoV: analyses of structure and function based on four complete genome sequences of isolates BJ01-BJ04

Zuyuan Xu et al. Genomics Proteomics Bioinformatics. 2003 May.

Abstract

The R (replicase) protein is the uniquely defined non-structural protein (NSP) responsible for RNA replication, mutation rate or fidelity, and regulation of transcription in coronaviruses and many other ssRNA viruses. Based on our complete genome sequences of four isolates (BJ01-BJ04) of SARS-CoV from Beijing, China, we analyzed the structure and predicted functions of the R protein in comparison with 13 other isolates of SARS-CoV and 6 other coronaviruses. The entire ORF (open reading frame) encodes two major enzyme activities, RNA-dependent RNA polymerase (RdRp) and proteinase. The R polyprotein undergoes complex proteolytic processing to produce 15 function-related peptides. A hydrophobic domain (HOD) and a hydrophilic domain (HID) are newly identified within NSP1. The substitution rate of the R protein is close to the average of the SARS-CoV genome. The functional domains in the NSPs of the R protein give different phylogenetic results, suggesting that they mutate at different rates under selective pressure. Eleven highly conserved regions in RdRp and twelve cleavage sites for 3CLP (chymotrypsin-like protein) have been identified as potential drug targets. These findings suggest that it is possible to obtain information about the phylogeny of SARS-CoV, as well as potential tools for drug design, genotyping and diagnostics of SARS.


Figures

Fig. 1
Diagrams of the GC content (A), hydrophobicity (B) and charge distribution (C) of the R protein. The X-axes stand respectively for GC-content (A), hydrophobicity score (B) and charge score (C), generated by corresponding algorithms (see materials and methods for details). The corresponding Y-axes stand for nt position (A) or amino acid (a.a.) position (B, C) of the R protein. The window sizes are 300 nt (A) and 100 a.a. (B, C).
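The windowed GC content in panel A can be illustrated with a minimal sketch. This is not the authors' code: the function name `gc_content_windows` and the `step` parameter are hypothetical, and only the 300 nt window size comes from the caption.

```python
def gc_content_windows(seq, window=300, step=1):
    """Return (start, gc_fraction) for every full window along a nucleotide sequence.

    `window` defaults to the 300 nt stated in the caption; `step` is an
    assumption introduced here for illustration.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    results = []
    # Slide the window along the sequence; incomplete trailing windows are skipped.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in "GC")
        results.append((start, gc / window))
    return results


if __name__ == "__main__":
    # Toy sequence with a tiny window so the result is easy to check by hand.
    for start, frac in gc_content_windows("GGCCAATT", window=4, step=2):
        print(start, frac)
```

The hydrophobicity and charge profiles in panels B and C follow the same sliding-window pattern, with a per-residue score averaged over 100 a.a. in place of the G/C count.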
Fig. 2
Diagram of the putative function-related regions in the R protein (ORF1ab and ORF1a). Based on sequence analysis, we defined 15 regions that potentially function in SARS-CoV. 3CLP and PLP function as proteinases within the R protein. The open triangles indicate the cleavage sites for PLP, and the solid triangles those for 3CLP. The narrow black rectangles indicate the functional regions. The bottom ruler gives the amino acid position in the R protein in units of kilo-amino acids (ka.a.). LP: leader protein. p65-LP: MHV p65-like protein. (-1) RF: (-1) ribosomal frameshift. BHID: hydrophilic domain identified by BGI. BHOD: hydrophobic domain identified by BGI.
Fig. 3
Similarity analysis of the RdRp region (NSP9) of the R protein. The X-axis shows the similarity score of the multiple alignment, and the Y-axis shows the amino acid position in the consensus sequence of the RdRps. The multiple alignment used RdRp sequences from 7 coronaviruses: SARS-CoV and 6 others, namely avian infectious bronchitis virus (AIBV), bovine coronavirus (BCoV), human coronavirus 229E (HCoV-229E), murine hepatitis virus (MHV), porcine epidemic diarrhea virus (PEDV), and transmissible gastroenteritis virus (TGEV). Based on the plot of the multiple alignment (generated by EMBOSS-plotcon, window size = 10; see materials and methods for details), we highlighted 11 highly conserved subregions of the R protein, which might contribute to important functions and could serve as targets for anti-SARS drug design.
Fig. 4
Similarity analysis and conserved subregions in 3CLP (NSP2). Based on the multiple alignment of 3CLP from seven coronaviruses, similar to the samples used in Fig. 3, diagram A (generated by EMBOSS-plotcon, window size = 10; see materials and methods for details) shows the most similar subregions, a and b, of 3CLP. Based on the global pair-wise alignments of the seven 3CLPs, polydot diagram B (generated by EMBOSS-polydot; see materials and methods for details) shows the conserved regions of every pair.
Fig. 5
Multiple alignment of the 3CLP region (NSP2) among seven coronaviruses. 3CLP is the main proteinase of coronaviruses, with the catalytic residues His41 and Cys147. The black triangles indicate the putative catalytic sites of 3CLP. The numbers above the sequences indicate amino acid positions in 3CLP. Amino acids are highlighted in different colors.
Fig. 6
Dotplot diagram (generated by EMBOSS-dotplot, window size = 10, threshold = 23; see materials and methods for details) of the similarity in the R protein between SARS-CoV and the other 6 coronaviruses. The X- and Y-axes show the amino acid positions in the corresponding R proteins. (A) SARS vs MHV; (B) SARS vs BCoV; (C) SARS vs AIBV; (D) SARS vs HCoV-229E; (E) SARS vs PEDV; (F) SARS vs TGEV. The plots suggest that, of the six, MHV and BCoV are the most similar to SARS-CoV.
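The idea behind a windowed, thresholded dotplot can be sketched as follows. This is an illustration only, not the EMBOSS implementation: the function `dotplot` is hypothetical, and it counts identical residues per window, whereas tools of this kind typically sum substitution-matrix scores. The threshold used here is therefore not comparable to the value of 23 in the caption.

```python
def dotplot(seq_a, seq_b, window=10, threshold=8):
    """Return (i, j) positions where windows of seq_a and seq_b are similar.

    A dot is placed at (i, j) when at least `threshold` of the `window`
    aligned residues starting at i in seq_a and j in seq_b are identical.
    Identity counting is a simplifying assumption made for this sketch.
    """
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= threshold:
                dots.append((i, j))
    return dots


if __name__ == "__main__":
    # A sequence compared with itself gives an unbroken main diagonal.
    print(dotplot("ACDEFG", "ACDEFG", window=3, threshold=3))
```

Runs of dots along a diagonal mark stretches where the two proteins are similar, which is how the panels are read to compare SARS-CoV with each of the other coronaviruses.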
Fig. 7
Pair-wise alignment of the amino acid sequences of the R protein among SARS-CoV and the other 6 coronaviruses. The alignment was performed with EMBOSS-stretcher (see materials and methods for details), which uses the Myers and Miller algorithm rather than the standard global alignment algorithm of Needleman and Wunsch, solely to save time and memory. The bold and normal numbers indicate the identity and similarity scores, respectively.
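For orientation, here is a minimal sketch of the standard Needleman and Wunsch global alignment named in the caption, followed by a percent-identity calculation. It is the quadratic-memory textbook form, not the linear-space Myers and Miller variant. The scoring values (`match`, `mismatch`, `gap`) are toy assumptions standing in for a substitution matrix, and both function names are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding the same residue in both rows."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    row_a, row_b, total = needleman_wunsch("ACGT", "AGT")
    print(row_a, row_b, total, percent_identity(row_a, row_b))
```

The full table costs memory proportional to the product of the two sequence lengths, which is the cost the caption says the Myers and Miller approach avoids for proteins as long as the R protein.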
Fig. 8
Proposed phylogenetic trees based on the amino acid sequences of the R protein (A), NSP1 (B), PLP (C), 3CLP (D), RdRp (E), and NSP10 (HEL) (F). All the bootstrap trees were generated by ClustalW (see materials and methods for details). The number near each branch node is its bootstrap value.

