[Preprint]. 2020 Jul 10:2020.07.10.197079.
doi: 10.1101/2020.07.10.197079.

Comprehensive in-vivo secondary structure of the SARS-CoV-2 genome reveals novel regulatory motifs and mechanisms

Nicholas C Huston et al. bioRxiv.

Abstract

SARS-CoV-2 is the positive-sense RNA virus that causes COVID-19, a disease that has triggered a major human health and economic crisis. The genome of SARS-CoV-2 is unique among viral RNAs in its vast potential to form stable RNA structures, and yet as much as 97% of its 30 kilobases have not been structurally explored in the context of a viral infection. Our limited knowledge of SARS-CoV-2 genomic architecture is a fundamental limitation to both our mechanistic understanding of the coronavirus life cycle and the development of COVID-19 RNA-based therapeutics. Here, we apply a novel long-amplicon strategy to determine, for the first time, the secondary structure of the SARS-CoV-2 RNA genome probed in infected cells. In addition to the conserved structural motifs at the viral termini, we report new structural features, such as a conformationally flexible programmed ribosomal frameshifting pseudoknot and a host of novel RNA structures, each of which highlights the importance of studying viral structures in their native genomic context. Our in-depth structural analysis reveals extensive networks of well-folded RNA structures throughout Orf1ab and new aspects of SARS-CoV-2 genome architecture that distinguish it from other single-stranded, positive-sense RNA viruses. Evolutionary analysis of RNA structures in SARS-CoV-2 shows that several features of its genomic structure are conserved across betacoronaviruses, and we pinpoint individual regions of well-folded RNA structure that merit downstream functional analysis. The native, complete secondary structure of SARS-CoV-2 presented here is a roadmap that will facilitate focused studies on mechanisms of replication, translation and packaging, and guide the identification of new RNA drug targets against COVID-19.

Conflict of interest statement

Declaration of Interests

A patent application on MarathonRT has been filed by Yale University.

Figures

Figure 1.
Tiled-amplicon in-vivo SHAPE-MaP workflow yields high-quality data for SARS-CoV-2 structure prediction. A) Workflow of in-vivo SHAPE-MaP probing of full-length SARS-CoV-2 genomic RNA. B) Mutation rates for two biological replicates confirm that genomic RNA was successfully modified with the NAI electrophile. The boxes represent the interquartile range (IQR) of each data set, with the median indicated by a line and the average indicated by an “x”. Tukey-style whiskers extend 1.5 × IQR beyond each box. Values outside this range are not shown. ****p<0.0001 by equal-variance unpaired Student's t-test.
Figure 2.
De novo full-length structure prediction of SARS-CoV-2 genomic RNA identifies conserved functional elements at the 5’ and 3’ viral termini. A) Consensus structure prediction for the 5’ terminus of SARS-CoV-2, colored by SHAPE reactivity. Functional domains are labeled, including the TRS sequence, the start codon of the uORF, and the start codon of Orf1a (indicated by black, grey, and green lines, respectively). Inset – mapping of SHAPE reactivity data to single- and double-stranded regions. Data are plotted with a line indicating the median and whiskers indicating the standard deviation. B) Structure prediction for the 3’ terminus of SARS-CoV-2, colored by SHAPE reactivity. Functional domains are labeled. The putative pseudoknot is indicated by solid black lines. Locations of the octanucleotide motif (ONM), hypervariable region (HVR) and S2M are indicated by black lines. Inset – mapping of SHAPE reactivity to single- and double-stranded regions. Data are plotted with a line indicating the median and whiskers indicating the standard deviation. ****p<0.0001 by equal-variance unpaired Student's t-test.
Figure 3.
Structure prediction of the programmed ribosomal frameshifting (PRF) element suggests conformational variability of Stem Loop 2. A) Dominant PRF structural architecture, predicted by SuperFold, colored by relative SHAPE reactivity from this study. AS = Attenuator Stem; HSS = Heptanucleotide Slippery Sequence; SL1 = Stem Loop 1; dotted line indicates the region reported to form Stem Loop 2 (SL2) or to form long-range interactions outside the PRF region in the SuperFold prediction; SL3 = Stem Loop 3; red lines indicate the pseudoknot interaction. B) Lower-probability PRF conformation, with fully formed SL2, colored by relative SHAPE reactivity. C) Dominant PRF structure prediction colored by relative Shannon entropy, labeled as in Panel A. D) Base-pairing probability for the alternate SL2 conformation. Each dot represents a base pair in SL2. A base-pairing probability of 0.25 indicates a 25% probability of pairing for the indicated nucleotide.
Figure 4.
Full-length genome structure prediction of SARS-CoV-2 Orf1ab reveals a network of well-folded regions. A) Analysis of Shannon Entropy and SHAPE reactivities reveals 40 highly structured, well-determined domains in Orf1ab. Nucleotide coordinates are indicated on the x-axis and numbered in 1000 nucleotide intervals. Local median SHAPE reactivity and local median Shannon Entropy are indicated by blue and orange lines, respectively. Well-folded regions are shaded with grey boxes. Arc plots for all base-pairing interactions predicted by the structural model are shown beneath the local SHAPE and Shannon entropy windows, corresponding to the genomic coordinates indicated on the x-axis. The 5’UTR and non-structural protein (Nsp) domains are indicated by colored bars underneath arc plot diagrams. B) Representative secondary structure predictions of two regions extracted from the full-length consensus structure generated for the SARS-CoV-2 genome, with Nsp identity and genomic position indicated.
Figure 5.
Full-length genome structure prediction of SARS-CoV-2 Orf1ab reveals a unique genome architecture. A) Regions encoding individual non-structural protein (Nsp) domains have comparable overall double-stranded RNA content (indicated by grey bars), but they do not adopt equally well-folded substructures (indicated by black bars). A dotted line at 50% nucleotide content has been added for clarity. B) SARS-CoV-2 has a shorter median base-pairing distance than previously reported full-length genome structures for two other positive-sense RNA viruses (Mauger et al., 2015; Dethoff et al., 2018). Data are presented in a Tukey-style box-and-whisker plot as described in Fig. 1B. Asterisk definitions are below. C) SARS-CoV-2 has a shorter median base-pairing distance across well-folded regions of RNA than those identified in HIV (Siegfried et al., 2014). Data are presented as in (B). *p<0.05, ****p<0.0001 by equal-variance unpaired Student's t-test.
Figure 6.
Structure-dependent variations in synonymous mutation rates suggest that all β-coronaviruses have highly structured genomes (high BPC). A) Synonymous mutation rates calculated across all β-coronaviruses for single- and double-stranded nucleotides of Orf1ab. Data are presented in a Tukey-style box-and-whisker plot as described in Fig. 1B. B) Non-synonymous mutation rates calculated across all β-coronaviruses for single- and double-stranded nucleotides of Orf1ab. Data are presented as in (A). C) Comparison of synonymous mutation rates for single- and double-stranded nucleotides within individual protein domains, calculated across all β-coronaviruses. Data are presented as in (A). n.s., not significant; *p<0.05, ***p<0.001, ****p<0.0001 by equal-variance unpaired Student's t-test.
Figure 7.
Analysis of synonymous mutation rates within individual well-folded regions of the SARS-CoV-2 genome identifies four regions that appear to be conserved across β-coronaviruses. A) Schematic of well-folded regions in the SARS-CoV-2 genome supported by synonymous mutation rate analysis in β-coronaviruses. B) Synonymous mutation rate separated by strandedness in four individual well-folded regions. Data are plotted with a line indicating the median and whiskers indicating the interquartile range. *p<0.05, **p<0.01 by equal-variance unpaired Student's t-test. C), D), E), F) RNA secondary structure diagrams of four well-folded regions supported by analysis of synonymous mutation rates, colored by SHAPE reactivity, with genomic coordinates indicated below and in (A).
Figure 8.
Analysis of synonymous mutation rates and covariation within individual regions of the SARS-CoV-2 genome pinpoints five regions that are conserved only within the sarbecovirus subgenus. A) Schematic of well-folded regions in the SARS-CoV-2 genome supported by synonymous mutation rate analysis in the sarbecovirus subgenus. B) Synonymous mutation rate separated by strandedness in five individual well-folded regions. Data are plotted with a line indicating the median and whiskers indicating the interquartile range. *p<0.05, **p<0.01 by equal-variance unpaired Student's t-test. C), D) RNA secondary structures of two well-folded regions, colored by SHAPE reactivity. E), F), G) RNA secondary structure diagrams of three well-folded regions supported by both synonymous mutation rate analysis and covariation in sarbecoviruses, colored by SHAPE reactivity. Green boxes indicate significantly covarying base pairs tested by R-scape RAFSp (E-value < 0.05). Consensus nucleotides are colored by relative degree of sequence conservation within the alignment (75% identity in grey, 90% identity in black, 97% identity in red). Individual nucleotides are represented by circles according to their positional conservation and percentage occupancy thresholds (50% occupancy in white, 75% occupancy in grey, 90% occupancy in black, 97% occupancy in red). Multiple sequence alignment files are provided in supplementary materials.
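
The captions above rely on three per-structure summary quantities: Shannon entropy (Figs. 3 and 4), local median windows (Fig. 4), and median base-pairing distance (Fig. 5). The following is a minimal Python sketch of how such quantities could be computed, not the authors' pipeline. It assumes a pseudoknot-free dot-bracket string as input, a per-nucleotide entropy of the form -Σ p·log10(p) over that nucleotide's pairing probabilities (the log base and the exclusion of the unpaired state are assumptions), and a caller-chosen window size; all function names are hypothetical.

```python
import math
from statistics import median


def base_pairs(dot_bracket):
    """Return sorted 0-based (i, j) pairs from a nested dot-bracket string."""
    stack, pairs = [], []
    for idx, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs.append((stack.pop(), idx))
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return sorted(pairs)


def median_pairing_distance(dot_bracket):
    """Median sequence separation (j - i) over all base pairs."""
    return median(j - i for i, j in base_pairs(dot_bracket))


def shannon_entropy(pair_probs):
    """Entropy of one nucleotide from its pairing probabilities.

    Assumption: log base 10 and only pairing states are summed.
    Zero-probability terms contribute nothing.
    """
    return -sum(p * math.log10(p) for p in pair_probs if p > 0)


def local_median(values, window):
    """Centered sliding median; windows are truncated at the ends."""
    half = window // 2
    return [
        median(values[max(0, k - half): k + half + 1])
        for k in range(len(values))
    ]


if __name__ == "__main__":
    toy = "((..))..(...)"  # toy structure, not SARS-CoV-2 data
    print(base_pairs(toy))
    print(median_pairing_distance(toy))
    print(shannon_entropy([0.5, 0.5]))
    print(local_median([1, 5, 2, 8, 3], 3))
```

A nucleotide that pairs with a single partner at probability 1 has zero entropy under this definition, which matches the reading of low Shannon entropy as a well-determined structure.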
