Mol Cell. 2021 Feb 4;81(3):584-598.e5.
doi: 10.1016/j.molcel.2020.12.041. Epub 2021 Jan 1.

Comprehensive in vivo secondary structure of the SARS-CoV-2 genome reveals novel regulatory motifs and mechanisms

Nicholas C Huston et al. Mol Cell.

Abstract

Severe-acute-respiratory-syndrome-related coronavirus 2 (SARS-CoV-2) is the positive-sense RNA virus that causes coronavirus disease 2019 (COVID-19). The genome of SARS-CoV-2 is unique among viral RNAs in its vast potential to form RNA structures, yet as much as 97% of its 30 kilobases have not been structurally explored. Here, we apply a novel long amplicon strategy to determine the secondary structure of the SARS-CoV-2 RNA genome at single-nucleotide resolution in infected cells. Our in-depth structural analysis reveals networks of well-folded RNA structures throughout Orf1ab and reveals aspects of SARS-CoV-2 genome architecture that distinguish it from other RNA viruses. Evolutionary analysis shows that several features of the SARS-CoV-2 genomic structure are conserved across β-coronaviruses, and we pinpoint regions of well-folded RNA structure that merit downstream functional analysis. The native, secondary structure of SARS-CoV-2 presented here is a roadmap that will facilitate focused studies on the viral life cycle, facilitate primer design, and guide the identification of RNA drug targets against COVID-19.

Keywords: RNA genome; RNA motif; RNA secondary structure; RNA structure; RNA virus; SHAPE-MaP; chemical probing; coronavirus; locked nucleic acids; riboregulation.

Conflict of interest statement

Declaration of interests: A patent application on MarathonRT has been filed by Yale University.

Figures

Graphical abstract
Figure 1
Tiled-amplicon in vivo SHAPE-MaP workflow yields high-quality data for de novo full-length structure prediction. Structure prediction identifies conserved functional elements at the 5′ and 3′ viral termini. (A) Workflow of in vivo SHAPE-MaP probing of full-length SARS-CoV-2 genomic RNA. The schematic of the SARS-CoV-2 genome is colored by protein-coding domain. (B) Mutation rates for two biological replicates across the entire SARS-CoV-2 genome (box, interquartile range [IQR]; median indicated by line; average indicated by “x”; whiskers are drawn in the Tukey style, and values outside this range are not shown). (C) Consensus structure prediction for the 5′ terminus of SARS-CoV-2, colored by SHAPE reactivity. Functional domains are labeled (transcription regulatory sequence [TRS], black line; upstream ORF start codon, gray line; Orf1a start codon, green line). Inset: mapping of SHAPE reactivity data to single- and double-stranded regions. Line indicates median, and whiskers indicate standard deviation. (D) Structure prediction for the 3′ terminus of SARS-CoV-2, colored by SHAPE reactivity. Functional domains are labeled. The putative pseudoknot is indicated by solid black lines. Inset: mapping of SHAPE reactivity to single- and double-stranded regions. Data are plotted as in (C). ∗∗∗∗p < 0.0001 by equal variance unpaired Student’s t test.
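The captions report significance "by equal variance unpaired Student’s t test". As a minimal sketch of what that statistic computes, the snippet below derives the pooled-variance t value and its degrees of freedom for two groups, for example SHAPE reactivities of single- versus double-stranded nucleotides. The function name `pooled_t` and the sample values are hypothetical and are not taken from the paper; converting t to a p value would additionally need a t-distribution CDF, which is omitted here.

```python
import math
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t, degrees_of_freedom) for an equal-variance unpaired t test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled sample variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    # Standard error of the difference between the two means.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Hypothetical reactivities, for illustration only.
t_value, dof = pooled_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

A negative t here simply means the first group has the lower mean; the sign depends on argument order.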
Figure 2
Structure prediction of the programmed ribosomal frameshifting pseudoknot (PRF) suggests conformational variability of stem loop 2. (A) Dominant PRF structural architecture colored by SHAPE reactivity. AS, attenuator stem; HSS, heptanucleotide slippery sequence; SL1, stem loop 1; SL3, stem loop 3. Dotted line indicates region that forms stem loop 2 (SL2) or long-range interactions outside the PRF, and red lines indicate pseudoknot interaction. (B) Lower probability PRF conformation, with fully formed SL2, colored by SHAPE reactivity. (C) Dominant PRF structure prediction colored by Shannon entropy, labeled as in (A). (D) Base-pairing probability for alternate SL2 conformation. Each dot represents an individual base pair in SL2, plotted as in Figure 1C (inset).
Figure 3
Full-length genome structure prediction of SARS-CoV-2 Orf1ab reveals a network of well-folded regions. (A) Analysis of Shannon entropy and SHAPE reactivities reveals 40 highly structured, well-determined domains in Orf1ab. Nucleotide coordinates are indicated on the x axis. Local median SHAPE reactivity and Shannon entropy are indicated by blue and orange lines, respectively. Well-folded regions are shaded with gray boxes. Arc plots for predicted base-pairing interactions in the structural model are shown below the x axis. The 5′ UTR and nonstructural protein (Nsp) domains are indicated by colored bars underneath arc plot diagrams. (B) Representative secondary structure predictions of two regions extracted from the full-length consensus structure generated for the SARS-CoV-2 genome.
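The Figure 3 caption describes local median SHAPE reactivity and Shannon entropy tracks used to locate well-folded regions. The sketch below illustrates one plausible way such tracks could be computed and combined. The selection rule (both local medians below their global medians), the 55-nucleotide default window, the base-10 logarithm, and all function names are assumptions made for illustration; the caption does not state the authors' exact criteria.

```python
import math
from statistics import median


def shannon_entropy(pair_probs):
    """Entropy of one nucleotide from its pairing probabilities (log base 10 assumed)."""
    return -sum(p * math.log10(p) for p in pair_probs if p > 0)


def rolling_median(values, window):
    """Centered sliding-window median; windows are truncated at the sequence ends."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


def low_shape_low_entropy(shape, entropy, window=55):
    """Flag positions whose local median SHAPE and entropy both fall below the global medians."""
    shape_cut, entropy_cut = median(shape), median(entropy)
    local_shape = rolling_median(shape, window)
    local_entropy = rolling_median(entropy, window)
    return [s < shape_cut and e < entropy_cut for s, e in zip(local_shape, local_entropy)]


# Hypothetical toy profile: a quiet stretch followed by a reactive, uncertain one.
toy = [0.1] * 10 + [1.0] * 11
flags = low_shape_low_entropy(toy, toy, window=3)
```

In the toy profile only the first ten positions are flagged, because their local medians sit below the global median of the whole profile.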
Figure 4
Full-length genome structure prediction of SARS-CoV-2 Orf1ab reveals unique and conserved genome architecture. (A) Base-paired RNA content (gray bars) and well-folded RNA content (black bars) of individual Nsp domains. A dotted line at 50% nucleotide content has been added for clarity. (B) Median base-pairing distance of the SARS-CoV-2, Hepatitis C virus (HCV), and Dengue virus genomes. Data are plotted as in Figure 1C (inset), though outliers are excluded. (C) Median base-pairing distance across well-folded regions identified in SARS-CoV-2 and HIV genomes, plotted as in (B). (D) Synonymous mutation rates (dS) calculated across β-coronaviruses for single- and double-stranded nucleotides of Orf1ab. (E) Nonsynonymous mutation rates (dNs) calculated across all β-coronaviruses for single- and double-stranded nucleotides of Orf1ab. (F) Comparison of dS for single- and double-stranded nucleotides within individual protein domains, calculated across all β-coronaviruses. (D–F) Data are plotted as in Figure 1B. n.s., not significant; ∗p < 0.05; ∗∗∗p < 0.001; ∗∗∗∗p < 0.0001 by equal variance unpaired Student’s t test.
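Panels (B) and (C) of Figure 4 compare median base-pairing distance between genomes. As a rough sketch of that metric, the snippet below parses a secondary structure in dot-bracket notation and reports the median distance, in nucleotides, between paired positions. The dot-bracket input format and the function names are assumptions for illustration; pseudoknot brackets are not handled.

```python
from statistics import median


def base_pairs(dot_bracket):
    """Return (i, j) index pairs for matching parentheses in a dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def median_pairing_distance(dot_bracket):
    """Median of j - i over all base pairs, or None if nothing is paired."""
    distances = [j - i for i, j in base_pairs(dot_bracket)]
    return median(distances) if distances else None


# Hypothetical hairpin: three stacked pairs closing a three-nucleotide loop.
hairpin_distance = median_pairing_distance("(((...)))")
```

A genome dominated by local hairpins would give a small median, while one rich in long-range interactions would give a larger value.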
Figure 5
Analysis of dS within individual well-folded regions of the SARS-CoV-2 genome across β-coronaviruses. (A) Schematic of well-folded regions in SARS-CoV-2 genome supported by dS analysis in β-coronaviruses. (B) dS separated by strandedness in four individual well-folded regions. Data are plotted as in Figure 1C (inset). ∗p < 0.05, ∗∗p < 0.01 by equal variance unpaired Student’s t test. (C–F) RNA secondary structure diagrams of four well-folded regions with dS support, colored by SHAPE reactivities, with genomic coordinates indicated below and in (A).
Figure 6
Analysis of dS and covariation within individual regions of the SARS-CoV-2 genome within the sarbecovirus subgenus. (A) Schematic of well-folded regions in the SARS-CoV-2 genome supported by dS analysis. (B) dS separated by strandedness in five individual well-folded regions. Data are plotted as in Figure 1C (inset). ∗p < 0.05, ∗∗p < 0.01 by equal variance unpaired Student’s t test. (C and D) RNA secondary structures of two well-folded regions colored by SHAPE reactivity. (E–G) RNA secondary structure diagrams of three well-folded regions supported by both dS analysis and covariation in sarbecoviruses, colored by SHAPE reactivities. Green boxes indicate significantly covarying base pairs tested by Rscape-RAFSp (e-value < 0.05). Consensus nucleotides are colored by degree of sequence conservation (75% = gray; 90% = black; 97% = red). Circles indicate positional conservation and percentage occupancy thresholds (50% = white; 75% = gray; 90% = black; 97% = red).
Figure 7
RNA structures disrupted by locked nucleic acids (LNAs) exhibit defects in SARS-CoV-2 viral growth. (A) Schematic showing region 15 LNA targeted to the covarying stem (red line) and control LNA (blue line). (B) Schematic showing region 22 LNA targeted to stem (red line) and the control LNA (blue line). (C) Schematic showing LNA targeted to the PRF SL1 region and the conformationally flexible SL2 region in the SARS-CoV-2 PRF. (D–F) Virus growth as measured and quantified by mNeonGreen expression at 24 hours post-infection. All LNAs were tested concurrently and are split into subpanels for clarity. The same negative controls (scrambled LNA, reagent only) are shown in all subpanels for comparison. Data are plotted as in Figure 1C (inset). Individual data points represent technical replicates. n.s., not significant; ∗p < 0.05, ∗∗p < 0.01, ∗∗∗∗p < 0.0001 by ordinary one-way ANOVA with multiple comparisons.
