RNA. 2020 Aug;26(8):937-959.
doi: 10.1261/rna.076141.120. Epub 2020 May 12.

RNA genome conservation and secondary structure in SARS-CoV-2 and SARS-related viruses: a first look

Ramya Rangan et al. RNA. 2020 Aug.

Abstract

As the COVID-19 outbreak spreads, there is a growing need for a compilation of conserved RNA genome regions in the SARS-CoV-2 virus along with their structural propensities to guide development of antivirals and diagnostics. Here we present a first look at RNA sequence conservation and structural propensities in the SARS-CoV-2 genome. Using sequence alignments spanning a range of betacoronaviruses, we rank genomic regions by RNA sequence conservation, identifying 79 regions of length at least 15 nt as exactly conserved over SARS-related complete genome sequences available near the beginning of the COVID-19 outbreak. We then confirm the conservation of the majority of these genome regions across 739 SARS-CoV-2 sequences subsequently reported from the COVID-19 outbreak, and we present a curated list of 30 "SARS-related-conserved" regions. We find that known RNA structured elements curated as Rfam families and in prior literature are enriched in these conserved genome regions, and we predict additional conserved, stable secondary structures across the viral genome. We provide 106 "SARS-CoV-2-conserved-structured" regions as potential targets for antivirals that bind to structured RNA. We further provide detailed secondary structure models for the extended 5' UTR, frameshifting stimulation element, and 3' UTR. Lastly, we predict regions of the SARS-CoV-2 viral genome that have low propensity for RNA secondary structure and are conserved within SARS-CoV-2 strains. These 59 "SARS-CoV-2-conserved-unstructured" genomic regions may be most easily accessible by hybridization in primer-based diagnostic strategies.

Keywords: SARS-CoV-2; conservation; ncRNA; secondary structure; structurome.
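The first step described in the abstract ranks genome regions by sequence conservation and keeps stretches of at least 15 nt that are exactly conserved. A minimal Python sketch of that step follows. It is not the authors' pipeline, and it rests on stated assumptions. The alignment is assumed to be precomputed and held as equal-length strings, with the first row being the SARS-CoV-2 reference. A position's conservation is assumed to be the fraction of sequences matching the reference base, and columns where the reference has a gap are skipped. The function names `column_conservation` and `conserved_regions` are hypothetical, and ordering regions by length is an illustrative choice.

```python
from typing import List, Tuple


def column_conservation(msa: List[str]) -> List[float]:
    """Per-position conservation in reference (msa[0]) coordinates.

    Hypothetical definition for illustration: conservation at a position is
    the fraction of aligned sequences whose character matches the reference
    base. Alignment columns where the reference has a gap ('-') are skipped.
    """
    ref = msa[0]
    if any(len(seq) != len(ref) for seq in msa):
        raise ValueError("aligned sequences must all have the same length")
    scores: List[float] = []
    for col, ref_base in enumerate(ref):
        if ref_base == "-":
            continue  # not a reference nucleotide, so no genome coordinate
        matches = sum(1 for seq in msa if seq[col].upper() == ref_base.upper())
        scores.append(matches / len(msa))
    return scores


def conserved_regions(
    scores: List[float], threshold: float = 1.0, min_len: int = 15
) -> List[Tuple[int, int]]:
    """Maximal runs where every position scores >= threshold.

    Returns 1-based inclusive (start, end) reference positions for runs of at
    least min_len nt, longest first (an illustrative ranking choice).
    """
    regions: List[Tuple[int, int]] = []
    start = None
    # A trailing -inf sentinel guarantees that a run reaching the end is closed.
    for i, score in enumerate(scores + [float("-inf")]):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if i - start >= min_len:
                regions.append((start + 1, i))
            start = None
    return sorted(regions, key=lambda r: r[1] - r[0], reverse=True)
```

With `threshold=1.0` the returned runs are exactly conserved. Lowering the threshold (for example to 0.9) gives the looser per-position criterion used for the SARS-related conserved intervals in Figure 5.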


Figures

FIGURE 1.
We aim to provide a set of genome regions in SARS-CoV-2 that are useful for a variety of diagnostic and therapeutic strategies, including (A) regions conserved in both SARS-related betacoronaviruses and SARS-CoV-2 sequences (Table 1), (B) regions that are structured and conserved in SARS-CoV-2 sequences (Table 2), and (C) regions that are unstructured and conserved in SARS-CoV-2 sequences (Table 3).
FIGURE 2.
Conserved genome regions from SARSr-MSA-1 are annotated in black, superimposed on the SARS-CoV-2 genome ORFs. Panels A to E show the top secondary structures that overlap these conserved regions, ranked by Matthews correlation coefficient. Regions A to E are annotated on the genome in yellow and are located at genome positions: (A) 13743–13798, (B) 17511–17566, (C) 28990–29054, (D) 172–236, (E) 26–109. Secondary structures are colored by sequence conservation in SARSr-MSA-1 (cyan = more conserved, purple = less conserved). Curated Rfam families present in coronaviruses are shown in magenta, including the frameshifting stimulation element (FSE), the 3′ UTR pseudoknot (PK3), and the 3′ stem–loop II-like motif (s2m). Figures prepared in Geneious (Kearse et al. 2012) and draw_rna (https://github.com/DasLab/draw_rna).
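The Matthews correlation coefficient (MCC) used for ranking in Figure 2 is a standard measure of agreement between two secondary structures. A minimal sketch of base-pair MCC follows. It treats every possible pair (i, j) with i < j as one binary classification, and it assumes both structures are given in dot-bracket notation over the same sequence. Which predicted and reference structures were compared is not specified in this caption. Bracket types other than `(` and `)`, such as pseudoknot brackets, are treated as unpaired here. The function names are hypothetical.

```python
import math
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def pairs_from_dot_bracket(structure: str) -> Set[Pair]:
    """Base pairs (i, j), i < j, from '(' and ')'; other characters count as unpaired."""
    stack: List[int] = []
    pairs: Set[Pair] = set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j + 1}")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1] + 1}")
    return pairs


def structure_mcc(predicted: str, reference: str) -> float:
    """MCC over all n(n-1)/2 possible base pairs of a length-n sequence."""
    if len(predicted) != len(reference):
        raise ValueError("structures must describe the same sequence length")
    n = len(reference)
    pred, ref = pairs_from_dot_bracket(predicted), pairs_from_dot_bracket(reference)
    tp = len(pred & ref)   # pairs present in both structures
    fp = len(pred - ref)   # predicted pairs absent from the reference
    fn = len(ref - pred)   # reference pairs that were missed
    tn = n * (n - 1) // 2 - tp - fp - fn  # possible pairs absent from both
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal is empty (MCC undefined).
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Because true negatives dominate for long sequences, this full MCC stays close to the geometric mean of base-pair sensitivity and positive predictive value, which is why that approximation is also common in RNA structure comparison.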
FIGURE 3.
Structured (cyan) and unstructured (yellow) intervals on the SARS-CoV-2 genome ORFs, predicted from RNAz and CONTRAfold 2.0 analysis, respectively. Panels A to C show the secondary structures of the three RNAz windows with the highest classification probabilities (all P > 0.9) among windows that do not overlap known Rfam or literature-annotated structures. These windows are located at genome positions 14207–14366 (A), 17126–17245 (B), and 26176–26295 (C). Secondary structures are colored by sequence conservation (cyan = more conserved, purple = less conserved). Figures prepared in Geneious (Kearse et al. 2012) and draw_rna (https://github.com/DasLab/draw_rna).
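Selecting the windows shown in panels A to C of Figure 3 amounts to two steps. First, discard RNAz windows that overlap any annotated structure. Second, sort the remaining windows by RNAz probability. A minimal sketch follows, assuming windows and annotations are already available as 1-based inclusive genome intervals. The `Window` type, the `overlaps` and `top_novel_windows` functions, and any example coordinates or probabilities are hypothetical and are not taken from the paper's data.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Window(NamedTuple):
    start: int  # 1-based, inclusive genome coordinate
    end: int    # 1-based, inclusive genome coordinate
    p: float    # RNAz classification probability for the window


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def top_novel_windows(windows: Sequence[Window],
                      annotated: Sequence[Tuple[int, int]],
                      k: int = 3) -> List[Window]:
    """Top-k windows by p among those overlapping no annotated structure."""
    novel = [w for w in windows
             if not any(overlaps((w.start, w.end), ann) for ann in annotated)]
    return sorted(novel, key=lambda w: w.p, reverse=True)[:k]
```

Any overlap, even a single nucleotide, excludes a window here. A stricter or looser overlap rule would be an easy change to `overlaps`.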
FIGURE 4.
Secondary structure diagrams for (A) the 5′ UTR, (B) the frameshifting stimulation element, and (C) the 3′ UTR. Nucleotides are black if 100% conserved across the SARS, bat, and SARS-CoV-2 sequences in SARSr-MSA-1, and gray otherwise. Specially labeled domains are in boldface. Structures are based primarily on manual identification of homology with coronavirus structure models from the literature. Note that numbering in C is relative to the 3′ end of the virus sequence. Figures prepared in RiboDraw (https://github.com/ribokit/RiboDraw).
FIGURE 5.
We depict the predicted number of structured, unstructured, and conserved intervals for one choice of sequence conservation cutoffs. The SARS-related conserved intervals are all regions of at least 15 nt in which every position is at least 90% conserved across an alignment of SARS, bat coronavirus, and SARS-CoV-2 sequences (SARSr-MSA-1). The SARS-CoV-2 conserved intervals are regions of at least 15 nt in which every position is at least 97% conserved across an alignment of currently available SARS-CoV-2 sequences (SARS-CoV-2-MSA-2). Structured intervals are loci predicted by RNAz; some loci contain multiple RNAz windows. Unstructured intervals are stretches of at least 15 nt in which every base has a base-pairing probability of at most 0.4. All interval intersections are required to overlap by at least 15 nt, and the number of overlapping intervals is listed for each interval type involved in the intersection. Top-scoring structured intervals conserved in SARS-CoV-2 sequences (green) are listed in Table 2. Top-scoring unstructured intervals conserved in SARS-CoV-2 sequences (blue) are listed in Table 3.
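The interval definitions in this caption can be sketched in a few lines. They consist of runs of at least 15 nt that meet a per-position cutoff, plus intersections that must overlap by at least 15 nt. This sketch assumes that per-nucleotide base-pairing probabilities and per-position conservation scores have already been computed as plain lists. The pairing probability of a nucleotide could, for example, be the row sum of a CONTRAfold 2.0 pairing-probability matrix. No tool is invoked, all function names are hypothetical, and the pairwise overlap check is a simple quadratic scan rather than an optimized interval search.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genome coordinates


def runs_where(values: Sequence[float], keep: Callable[[float], bool],
               min_len: int = 15) -> List[Interval]:
    """Maximal runs of at least min_len positions where keep(value) holds."""
    runs: List[Interval] = []
    start: Optional[int] = None
    # A trailing None sentinel closes any run that reaches the last position.
    for i, v in enumerate(list(values) + [None]):
        ok = v is not None and keep(v)
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_len:
                runs.append((start + 1, i))
            start = None
    return runs


def unstructured_intervals(pair_probs: Sequence[float], max_prob: float = 0.4,
                           min_len: int = 15) -> List[Interval]:
    """pair_probs[i]: assumed probability that nucleotide i+1 is base-paired."""
    return runs_where(pair_probs, lambda p: p <= max_prob, min_len)


def conserved_intervals(conservation: Sequence[float], min_cons: float,
                        min_len: int = 15) -> List[Interval]:
    """e.g. min_cons=0.90 for SARSr-MSA-1 or 0.97 for SARS-CoV-2-MSA-2."""
    return runs_where(conservation, lambda c: c >= min_cons, min_len)


def overlapping_pairs(a: Sequence[Interval], b: Sequence[Interval],
                      min_overlap: int = 15) -> List[Tuple[Interval, Interval]]:
    """All (interval_a, interval_b) pairs sharing at least min_overlap nt."""
    pairs = []
    for s1, e1 in a:
        for s2, e2 in b:
            if min(e1, e2) - max(s1, s2) + 1 >= min_overlap:
                pairs.append(((s1, e1), (s2, e2)))
    return pairs
```

Counting the distinct intervals on each side of the returned pairs, for example `len({pa for pa, _ in pairs})`, yields per-type overlap counts of the kind listed in the figure.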

References

Andrews RJ, Roche J, Moss WN. 2018. ScanFold: an approach for genome-wide discovery of local RNA structural elements-applications to Zika virus and HIV. PeerJ 6: e6136. doi:10.7717/peerj.6136
Andrews RJ, Peterson JM, Haniff HS, Chen J, Williams C, Grefe M, Disney MD, Moss WN. 2020. An in silico map of the SARS-CoV-2 RNA Structurome. bioRxiv. doi:10.1101/2020.04.17.045161
Bennett CF, Krainer AR, Cleveland DW. 2019. Antisense oligonucleotide therapies for neurodegenerative diseases. Annu Rev Neurosci 42: 385–406. doi:10.1146/annurev-neuro-070918-050501
Bernhart SH, Hofacker IL, Stadler PF. 2006. Local RNA base pairing probabilities in large sequences. Bioinformatics 22: 614–615. doi:10.1093/bioinformatics/btk014
Bustin SA, Nolan T. 2004. Pitfalls of quantitative real-time reverse-transcription polymerase chain reaction. J Biomol Tech 15: 155–166. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2291693/
