Review
Nat Rev Microbiol. 2021 Nov;19(11):685-700.
doi: 10.1038/s41579-021-00630-8. Epub 2021 Sep 17.

Structural biology of SARS-CoV-2 and implications for therapeutic development

Haitao Yang et al. Nat Rev Microbiol. 2021 Nov.

Abstract

The COVID-19 pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is an unprecedented global health crisis. However, therapeutic options are still very limited. The development of drugs that target vital proteins in the viral life cycle is a feasible approach for treating COVID-19. SARS-CoV-2 belongs to the subfamily Orthocoronavirinae, which has the largest RNA genomes, and encodes a total of 29 proteins. These non-structural, structural and accessory proteins participate in entry into host cells, genome replication and transcription, and viral assembly and release. SARS-CoV-2 proteins can individually perform essential physiological roles, be components of the viral replication machinery or interact with numerous host cellular factors. In this Review, we delineate the structural features of SARS-CoV-2 from the whole viral particle to the individual viral proteins and discuss their functions as well as their potential as targets for therapeutic interventions.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. SARS-CoV-2 genome and life cycle.
a | Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome organization, with functional domains shown in rectangles and the prime drug targets emphasized in the outlined box. The first part of the SARS-CoV-2 genomic RNA, encoding non-structural proteins (NSPs), can be directly translated into two polyproteins (a polyprotein is a single chain of polypeptides that are linked together by covalent peptide bonds), pp1a and pp1ab, in which a −1 frameshift between open reading frame 1a (ORF1a) and ORF1b leads to differences in translation. The two polyproteins are cleaved by viral proteases, a papain-like protease (PLpro) and a 3C-like protease (3CLpro), to generate 16 NSPs and to form the replication and transcription machinery. The domains that have an important function within each NSP are shown in the genome structure. The first three peptide cleavages are performed by PLpro. The remaining sites are cleaved by 3CLpro (also known as the main protease). The second part of the RNA genome mainly encodes four structural proteins: the spike (S) protein, the membrane (M) protein, the envelope (E) protein and the nucleocapsid (N) protein. In addition to these structural proteins, several accessory proteins are also encoded. b | The life cycle of SARS-CoV-2, including viral entry, replication and transcription, assembly and release. In the native SARS-CoV-2 structure, S proteins can have prefusion and postfusion conformations (Electron Microscopy Data Bank entries EMD-30426, EMD-30427 and EMD-30428). SARS-CoV-2 enters host cells through an endocytosis pathway mediated by S protein–angiotensin-converting enzyme 2 (ACE2) interactions. Viral RNA enters the cytoplasm after the entry step, and then ORF1a or ORF1ab is translated by the host ribosome. The viral polyproteins are cleaved into NSPs and assemble themselves into the replication and transcription complexes. Subgenomic viral mRNAs (after capping) act as templates for viral protein translation. Progeny virions are assembled in the endoplasmic reticulum and Golgi body. Afterwards, the virions are exocytosed to complete the life cycle. ERGIC, endoplasmic reticulum–Golgi intermediate compartment; ExoN, exonuclease; HEL, helicase; Mac1, macrodomain 1; NendoU, uridine-specific endoribonuclease; NiRAN, nidovirus RNA-dependent RNA polymerase-associated nucleotidyltransferase; NMT, guanine-N7-methyltransferase; OMT, 2′-O-methyltransferase; PL2, papain-like protease 2; RBD, receptor-binding domain; RdRp, RNA-dependent RNA polymerase; SUD, SARS-unique domain; +ss, positive-sense single-stranded; TM, transmembrane; Ubl1, ubiquitin-like domain 1; UTR, untranslated region.
Fig. 2. Structures of the SARS-CoV-2 spike protein in the presence or absence of antibodies.
a | Cartoon representation of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) trimeric spike protein. The receptor-binding domain (RBD), amino-terminal domain (NTD), SD1/SD2 and S2 are blue, green, pink and red, respectively. b,c | Cartoon representations of the RBD in complex with the host cell receptor angiotensin-converting enzyme 2 (ACE2). ACE2 is orange. The side chains of amino acids participating in the interactions between the spike protein RBD and ACE2 are shown as stick models. d | Cartoon representation of the spike protein NTD in complex with a neutralizing antibody. The antibody is gold. e–j | Cartoon representations of the spike protein RBD in complex with class I, II, III and IV RBD neutralizing antibodies. In part f, representative interactions between the spike protein RBD and the nanobody are shown as stick models. Antibodies can bind with the RBD despite conformational changes. Antibodies are in gold. Protein Data Bank accession codes are indicated in parentheses.
Fig. 3. Structures of the SARS-CoV-2 nucleocapsid and envelope proteins.
a | Structures of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) nucleocapsid protein. The upper panel shows a schematic representation of the domains of the nucleocapsid protein. The lower-left panel shows a cartoon representation and the electrostatic surface of the amino-terminal (N-terminal) domain. It has the shape of a right-handed fist and contains a four-stranded antiparallel β-sheet as a core subdomain. The loops protruding out of the core are positively charged. These facilitate RNA binding. The lower-right panel shows a cartoon representation and the electrostatic surface of the dimeric carboxy-terminal (C-terminal) domain. Overall, it has a rectangular shape, but each protomer displays a crescent shape. Two β-hairpin structures from each protomer form four antiparallel β-strands by inserting themselves into each cavity. A positively charged groove is found in the helix face of the dimer, which also facilitates RNA binding. b | The structure of the transmembrane domain of the envelope protein. It comprises a five-helix bundle and folds like a pipe with a channel in the middle. Protein Data Bank accession codes are indicated in parentheses.
Fig. 4. Structures of the SARS-CoV-2 nsp1 and nsp3 subdomains and PLpro inhibitors.
a | Surface representation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) nsp1 in complex with the host ribosomal 40S subunit. A cartoon representation of the interaction pattern between nsp1 and the ribosomal 40S subunit is shown in the inset. nsp1, ribosomal proteins uS3 and uS5, and h18 ribosomal RNA (rRNA) are coloured violet, blue, green and yellow, respectively. b | Schematic representation of functional domains of nsp3. Cartoon representations of the three solved structures are shown below. c | Cartoon representation of papain-like protease (PLpro) in complex with different inhibitors. The left, middle and right panels show the inhibitors VIR250, GRL0617 and YM155, respectively. All inhibitors are presented as stick models. d | Cartoon representation of PLpro in complex with interferon-stimulated gene product 15 (ISG15). PLpro and ISG15 are coloured blue and yellow, respectively. Protein Data Bank accession codes are indicated in parentheses. Ac, acidic domain; HVR, hypervariable region; MR, marker domain; RBD, RNA-binding domain; SUD, severe acute respiratory syndrome-unique domain; TM, transmembrane.
Fig. 5. Structures of SARS-CoV-2 Mpro and its inhibitors.
a | Structure of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease (Mpro) in complex with inhibitor N3. Protomer A is shown as a cartoon, and protomer B is shown as a surface representation. The surface representation of the substrate-binding site and N3 is shown in the right panel. Subsites S1′, S1, S2 and S4 are labelled. b | Interactions between N3 and SARS-CoV-2 Mpro. P1–P5 are labelled in N3. The residues interacting with P1′ are shown as cyan sticks, and the residues forming the S1, S2 and S4 subsites are shown as green sticks, white sticks and orange sticks, respectively. The residues interacting with P5 are shown as blue sticks. Intermolecular hydrogen bonds are shown as dashed lines. c | Surface representation of the substrate-binding site with various inhibitors bound. GC376, boceprevir, carmofur and 11a are coloured yellow, blue, green and orange, respectively. All inhibitors are presented in stick form. Protein Data Bank accession codes are indicated in parentheses.
Fig. 6. Structures of SARS-CoV-2 replication and transcription complex and its inhibitors.
a–f | Cartoon representations of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) central replication and transcription complex (C-RTC), elongation RTC (E-RTC), cap(−1)′ RTC and backtracking RTC. The RNA-dependent RNA polymerase (RdRP) core complex contains nsp12 (nidovirus RdRP-associated nucleotidyltransferase (NiRAN), interface, finger, palm and thumb domains are yellow, orange, blue, red and olive, respectively), nsp7 (purple), nsp8-1 (individual; light grey), nsp8-2 (nsp7–nsp8 pair; aquamarine). C-RTC is composed of the RdRP core complex and RNA template–product chains (in wheat and deep sky blue). E-RTC is composed of the C-RTC along with two coupled nsp13 helicases, nsp13-1 (lime) and nsp13-2 (deep pink), and RNA strand fragments can be traced in nsp13-2. Cap(−1)′ RTC consists of E-RTC and nsp9 (claret; bound to nsp12 NiRAN domain). Backtracking RTC includes the C-RTC coupled with the proofreading-stimulating nsp13. g–i | Cartoon representations of SARS-CoV-2 C-RTC with bound inhibitors. Remdesivir can be inserted into the RNA product chain (−1 site), whereas favipiravir occupies the polymerase active centre. Two suramin molecules can bind with the RdRP core complex and suramin-1 occupies the space of positions −1 to −3 of the RNA template strand, whereas suramin-2 occupies the space of the primer strand. The inhibitors are all shown as pink stick models. Protein Data Bank accession codes are indicated in parentheses. FTP, favipiravir triphosphate; RMP, remdesivir monophosphate.
Fig. 7. Structures of SARS-CoV-2 accessory proteins.
a | Cryogenic electron microscopy structure of dimeric ORF3a. It forms a potassium-sensitive channel and may promote release of the virus. The two ORF3a proteins are shown as cartoon representations, and the domains are labelled. b | The structure of ORF9b in complex with human TOM70. ORF9b adopts a helical fold and is shown in cartoon representation. It interacts at the substrate binding site of TOM70, a subunit of the mitochondrial import receptor. TOM70 is shown in a surface representation. c | The structure of the ORF7a ectodomain. ORF7a is a type I transmembrane protein and is involved in virus–host interactions and protein trafficking within the endoplasmic reticulum and Golgi body. The ectodomain of ORF7a exhibits a seven-stranded β-sandwich fold and is shown in a cartoon representation. d | The structure of dimeric ORF8. ORF8 contains eight antiparallel β-strands and an immunoglobulin-like fold. The dimer is stabilized by surface hydrophobic interactions and a series of hydrogen bonds. The two ORF8 proteins are shown as cartoon representations. Protein Data Bank accession codes are indicated in parentheses.
