J Virol. 2021 Mar 1;95(5):e02190-20.
doi: 10.1128/JVI.02190-20. Epub 2020 Dec 2.

The global and local distribution of RNA structure throughout the SARS-CoV-2 genome

Rafael de Cesaris Araujo Tavares et al. J Virol.

Abstract

SARS-CoV-2 is the causative viral agent of COVID-19, the disease at the center of the current global pandemic. While knowledge of highly structured regions is integral to mechanistic insights into the viral infection cycle, very little is known about the location and folding stability of functional elements within the massive, ∼30-kb SARS-CoV-2 RNA genome. In this study, we analyze the folding stability of this RNA genome relative to the structural landscape of other well-known viral RNAs. We present an in silico pipeline that predicts regions of high base pair content across long genomes and pinpoints hotspots of well-defined RNA structure, a method that allows direct comparison of RNA structural complexity across the different domains of the SARS-CoV-2 genome. We report that the propensity of the SARS-CoV-2 genome for stable RNA folding is exceptional among RNA viruses, exceeding even that of hepatitis C virus (HCV), one of the most structured viral RNAs in nature. Furthermore, our analysis suggests varying levels of RNA structure across genomic functional regions, with the accessory and structural ORFs containing the highest structural density in the viral genome. Finally, we examine how individual RNA structures formed by these ORFs differ between the genomic and subgenomic contexts. Because cellular mixtures of subgenomic RNA (sgRNA) and genomic RNA (gRNA) are technically difficult to separate experimentally, this comparison is a unique advantage of our in silico pipeline. The resulting findings provide a useful roadmap for planning focused empirical studies of SARS-CoV-2 RNA biology and a preliminary guide for exploring potential SARS-CoV-2 RNA drug targets.

Importance

The RNA genome of SARS-CoV-2 is among the largest and most complex viral genomes, yet its RNA structural features remain relatively unexplored. Since RNA elements guide function in most RNA viruses and represent potential drug targets, it is essential to chart the architectural features of SARS-CoV-2 and pinpoint regions that merit focused study. Here we show that the RNA folding stability of the SARS-CoV-2 genome is exceptional among viral genomes, and we develop a method to directly compare levels of predicted secondary structure across SARS-CoV-2 domains. Remarkably, we find that coding regions display the highest structural propensity in the genome, forming motifs that differ between the genomic and subgenomic contexts. Our approach provides an attractive strategy to rapidly screen for candidate structured regions based on base pairing potential, and it yields a readily interpretable roadmap to guide functional studies of RNA viruses and other pharmacologically relevant RNA transcripts.

Figures

FIG 1
Distributions of Z-scores for the RNA genomes of SARS-CoV-2, HCV, and West Nile virus and for a composite of human mRNAs. The bar plots are frequency distributions (y axis) of free-energy Z-scores (x axis) calculated in sliding windows tiling each RNA. Each histogram is overlaid with a Gaussian (normal distribution) fit, shown as a solid blue curve.
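The free-energy Z-score compares the folding free energy of a native window with that of shuffled sequences of the same composition. Below is a minimal Python sketch of the per-window score and of a Gaussian fit like the one overlaid in FIG 1. It assumes the common definition Z = (ΔG_native - mean ΔG_shuffled) / SD ΔG_shuffled and assumes the minimum free energies (MFEs) were already computed with an external folding tool; the function names and example energies are hypothetical and not taken from the study.

```python
import statistics
from typing import Sequence


def free_energy_z_score(native_mfe: float, shuffled_mfes: Sequence[float]) -> float:
    """Z-score of one window's MFE against MFEs of shuffled copies (assumed definition).

    Negative values mean the native window folds more stably than
    composition-matched random sequences.
    """
    if len(shuffled_mfes) < 2:
        raise ValueError("need at least two shuffled MFEs")
    mean = statistics.mean(shuffled_mfes)
    sd = statistics.stdev(shuffled_mfes)  # sample standard deviation
    if sd == 0:
        raise ValueError("shuffled MFEs have zero spread; Z-score undefined")
    return (native_mfe - mean) / sd


def fit_gaussian(z_scores: Sequence[float]) -> tuple[float, float]:
    """Maximum-likelihood normal fit: returns (mean, population SD)."""
    return statistics.mean(z_scores), statistics.pstdev(z_scores)


# Hypothetical MFEs (kcal/mol) for one native window and four shuffled copies.
z = free_energy_z_score(-30.0, [-20.0, -22.0, -18.0, -20.0])  # about -6.12
```

In practice one Z-score would be computed per sliding window, and `fit_gaussian` would be applied to the whole collection of window scores to obtain the curve parameters.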
FIG 2
A pipeline to predict and quantify the base pair content across the SARS-CoV-2 genome and identify well-defined structured regions. (A) A scheme depicting the steps to predict the secondary structure of the SARS-CoV-2 genome in windows using SuperFold. A histogram of base pair content (BPC) values calculated from the predicted secondary structure (gray bar plot) is shown, with the median BPC indicated (0.61). (B) A strategy to identify well-defined structures. The scheme shows shaded regions containing nucleotides that pass two criteria: high relative BPC (upper graph; the dashed line indicates the median value of 0.5) and low Shannon entropy (lower graph; the dashed line indicates the global Shannon entropy median). The red square highlights one of the regions flagged as forming a well-defined structure. (C) A Venn diagram showing the overlap between the nucleotides identified as having well-defined structure using the procedure in panel B and the nucleotides with low average Z-scores (below the global median) reported by Andrews et al. (13).
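The two-criterion filter in panel B and the overlap count in panel C can be expressed compactly. The following is a minimal sketch, assuming that BPC is the fraction of paired nucleotides in a dot-bracket structure, that per-nucleotide BPC and Shannon entropy profiles have already been computed (for example, from SuperFold output), and that "high" and "low" mean strictly above and below the respective medians. The helper names are hypothetical, and whether SuperFold uses exactly these definitions is an assumption.

```python
import statistics
from typing import Sequence

# Bracket characters that mark a paired nucleotide in dot-bracket notation.
PAIR_CHARS = set("()[]{}<>")


def base_pair_content(dot_bracket: str) -> float:
    """Fraction of nucleotides that are paired in one window's structure."""
    if not dot_bracket:
        raise ValueError("empty structure")
    paired = sum(1 for ch in dot_bracket if ch in PAIR_CHARS)
    return paired / len(dot_bracket)


def well_defined_mask(bpc: Sequence[float], entropy: Sequence[float]) -> list[bool]:
    """Flag nucleotides with BPC above its median AND entropy below its median."""
    if len(bpc) != len(entropy):
        raise ValueError("profiles must have equal length")
    bpc_cut = statistics.median(bpc)
    ent_cut = statistics.median(entropy)
    return [b > bpc_cut and h < ent_cut for b, h in zip(bpc, entropy)]


def venn_counts(flagged: Sequence[bool], low_z: Sequence[bool]) -> tuple[int, int, int]:
    """Venn counts of positions: (only flagged, both, only low Z)."""
    a = {i for i, f in enumerate(flagged) if f}
    b = {i for i, f in enumerate(low_z) if f}
    return len(a - b), len(a & b), len(b - a)
```

Working with sets of positions keeps the Venn comparison independent of how each mask was produced, so the low-Z mask could come from any external Z-score table.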
FIG 3
Distribution of well-defined RNA structures predicted for the HCV genome. The percentage of nucleotides in well-defined structured regions (high BPC/low Shannon) was calculated in 100-nt bins tiling the HCV genomic sequence and is plotted as a function of the genomic coordinate (gray curve). Individual percentages for each genomic bin are also represented as a heat map in the graph (color legend in the top right-hand corner). The locations of well-studied structural elements in the HCV genome are indicated with asterisks next to their respective genomic divisions, and details on each element are presented in Table 1.
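The per-bin percentages plotted in FIG 3 (and FIG 4) reduce to counting flagged nucleotides in consecutive windows. A minimal sketch, assuming "100-nt bins tiling" means non-overlapping bins and that the input is a per-nucleotide boolean mask of well-defined positions; the function name is hypothetical, and the handling of a short final bin is an assumption.

```python
from typing import Sequence


def percent_well_defined_per_bin(mask: Sequence[bool], bin_size: int = 100) -> list[float]:
    """Percentage of flagged nucleotides in consecutive, non-overlapping bins.

    The final bin may be shorter than bin_size; its percentage is computed
    over its actual length (an assumption; the study may treat it differently).
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    percents = []
    for start in range(0, len(mask), bin_size):
        chunk = mask[start:start + bin_size]
        percents.append(100.0 * sum(chunk) / len(chunk))  # True counts as 1
    return percents
```

The returned list maps directly onto both the gray curve and the heat map, since each entry corresponds to one bin along the genomic coordinate.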
FIG 4
Distribution of well-defined RNA structures across the SARS-CoV-2 genome. (A) The percentage of nucleotides in well-defined structured regions (high BPC/low Shannon) was calculated in 100-nt bins tiling the genome and is plotted as a function of the genomic coordinate (gray curve). Individual percentages for each genomic bin are also represented as a heat map in the same graph (color key in the top right-hand corner). A scheme of the genomic divisions of SARS-CoV-2 is shown next to the plot to help locate structured regions. (B) An expanded view of the first two-thirds of the genome from panel A is shown along with the genomic divisions of this region (5′ UTR plus ORF1ab and the corresponding NSP divisions). (C) The downstream third of the genome from panel A is expanded to show the individual structural and accessory ORFs in this region.
FIG 5
Quantification of well-defined structure in SARS-CoV-2 subdivisions. (A) The percentages of nucleotides with well-defined structure (high BPC/low Shannon) are shown for each genomic section of SARS-CoV-2. A cartoon of genomic regions is depicted above the bar plot, with each region color coded to match the bar graph. (B) The percentages of nucleotides with well-defined structure (high BPC/low Shannon) are shown (gray bars) for each NSP (nonstructural protein) section of SARS-CoV-2 ORF1ab. The horizontal dashed line (blue) represents the percentage for the entire ORF1ab.
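The region-level bars in FIG 5 apply the same counting to annotated genomic segments instead of fixed bins. A minimal sketch, assuming a per-nucleotide boolean mask and 1-based inclusive region coordinates; the function name and the region names and coordinates in the usage line are hypothetical placeholders, not the SARS-CoV-2 annotation used in the study.

```python
from typing import Mapping, Sequence


def percent_by_region(mask: Sequence[bool],
                      regions: Mapping[str, tuple[int, int]]) -> dict[str, float]:
    """Percentage of flagged nucleotides inside each named region.

    Coordinates are 1-based and inclusive, as in genome annotations.
    """
    result = {}
    for name, (start, end) in regions.items():
        if not 1 <= start <= end <= len(mask):
            raise ValueError(f"bad coordinates for {name}: {start}-{end}")
        chunk = mask[start - 1:end]  # convert to 0-based, end-exclusive slice
        result[name] = 100.0 * sum(chunk) / len(chunk)
    return result


# Toy example with hypothetical regions on a 6-nt mask.
toy = percent_by_region([True, False, True, True, False, False],
                        {"regionA": (1, 4), "regionB": (5, 6)})
```

The dashed reference line in panel B would correspond to calling the same function with a single region spanning all of ORF1ab.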
FIG 6
Comparison between the in silico prediction in this study and the experimental (in-cell SHAPE) structure reported by Huston et al. 2020 (32) for the ORF1ab region (including the 5′ UTR). The plots (gray lines) show the distribution of well-defined structure (percent high base pair content/low Shannon entropy) calculated in 100-nt bins tiling the ORF1ab region for both structural models. The same values are represented as a heat map in each graph to depict regions of high and low structural content (key in the upper right-hand corner). A cartoon with the genomic subdivisions of ORF1ab is shown to guide data visualization. The correlation coefficients computed between the two data sets are shown in the table.
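The legend does not state which correlation coefficients were computed between the two binned profiles. As an illustration only, this sketch shows Pearson and Spearman correlation, two common choices, applied to equal-length per-bin percentage lists; the function names are hypothetical, and Spearman is implemented as Pearson correlation of average ranks so that ties are handled.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length profiles with >= 2 bins")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Rank-based correlation is less sensitive to a few bins with extreme percentages, which is one reason to report it alongside Pearson when comparing binned structural profiles.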
FIG 7
Context-dependent formation of secondary structures in the SARS-CoV-2 nucleocapsid ORF. (A) The base pair content for the N ORF (1,260 nucleotides in total) is plotted as a function of nucleotide position for the genomic RNA (gray curve) and the N sgRNA (magenta curve). The x axis numbering follows the N ORF nucleotide order (1 to 1,260). (B) In silico secondary structure predictions containing the upstream 434 nucleotides of the N ORF are shown for both the genomic and N subgenomic RNAs. The region containing the structural differences identified in panel A is shown, and the highlighted regions (yellow) show significantly different RNA folding in the two contexts. In the genomic RNA, the gray region represents a downstream segment of ORF8 and a 14-nt stretch of additional sequence containing the TRS (5′-ACGAAC-3′). In the N subgenomic RNA, the gray region is the 5′ leader sequence and a homologous stretch of additional sequence containing the TRS. Structures were drawn with VARNA (64). (C) Arc diagram comparison of the in silico and in-cell SHAPE secondary structure models of the upstream N ORF. Base pairs involving the N ORF that are context dependent (forming exclusively in either gRNA or sgRNA) are highlighted in orange. Base pairs forming within sequences upstream of the N ORF (ORF8 in gRNA or the leader sequence in N sgRNA) are shown in gray, and base pairs within the N ORF that are not affected by sequence context are drawn in blue. Green dashed boxes show expanded junction regions in both cases (ORF8-N ORF junction in gRNA, 5′ leader-N ORF junction in sgRNA).
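Classifying base pairs as shared or context dependent, as in panel C, amounts to comparing two pair sets after placing both models on N ORF numbering. A minimal sketch, assuming both predicted structures are in dot-bracket notation and the 1-based start and end of the N ORF within each model are known; the helper names and the toy structures in the usage lines are hypothetical. Pairs with one partner upstream of the ORF are dropped by `to_orf_coords`, because such pairs are context dependent by construction and would be tallied separately.

```python
def parse_dot_bracket(structure: str) -> set[tuple[int, int]]:
    """Return 1-based (i, j) base pairs with i < j; supports (), [], {}, <>."""
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks: dict[str, list[int]] = {k: [] for k in openers}
    pairs = set()
    for pos, ch in enumerate(structure, start=1):
        if ch in openers:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.add((stack.pop(), pos))
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return pairs


def to_orf_coords(pairs: set[tuple[int, int]], orf_start: int,
                  orf_end: int) -> set[tuple[int, int]]:
    """Keep pairs lying entirely within [orf_start, orf_end]; renumber from 1."""
    offset = orf_start - 1
    return {(i - offset, j - offset) for i, j in pairs
            if orf_start <= i and j <= orf_end}


def compare_contexts(g_pairs: set[tuple[int, int]],
                     s_pairs: set[tuple[int, int]]):
    """Return (shared, gRNA-only, sgRNA-only) pairs in ORF numbering."""
    return g_pairs & s_pairs, g_pairs - s_pairs, s_pairs - g_pairs


# Toy models: 2 upstream nt in the "gRNA", 3 in the "sgRNA"; 8-nt "ORF".
g_orf = to_orf_coords(parse_dot_bracket("(.((..)).)"), 3, 10)
s_orf = to_orf_coords(parse_dot_bracket("....(..)(.)"), 4, 11)
shared, g_only, s_only = compare_contexts(g_orf, s_orf)
```

Keeping a separate stack per bracket type lets the parser accept pseudoknotted structures written with square or curly brackets, which long-range predictions often contain.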

References

    1. Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, Hu Y, Tao ZW, Tian JH, Pei YY, Yuan ML, Zhang YL, Dai FH, Liu Y, Wang QM, Zheng JJ, Xu L, Holmes EC, Zhang YZ. 2020. A new coronavirus associated with human respiratory disease in China. Nature 579:265–269. doi:10.1038/s41586-020-2008-3.
    2. Sempowski GD, Saunders KO, Acharya P, Wiehe KJ, Haynes BF. 2020. Pandemic preparedness: developing vaccines and therapeutic antibodies for COVID-19. Cell 181:1458–1463. doi:10.1016/j.cell.2020.05.041.
    3. Casanova JL, Su HC, COVID Human Genetic Effort. 2020. A global effort to define the human genetics of protective immunity to SARS-CoV-2 infection. Cell 181:1194–1199. doi:10.1016/j.cell.2020.05.016.
    4. Gates B. 2020. Responding to Covid-19—a once-in-a-century pandemic? N Engl J Med 382:1677–1679. doi:10.1056/NEJMp2003762.
    5. Fehr AR, Perlman S. 2015. Coronaviruses: an overview of their replication and pathogenesis. Coronaviruses 1282:1–23. doi:10.1007/978-1-4939-2438-7_1.
