Nat Commun. 2022 Mar 31;13(1):1722.
doi: 10.1038/s41467-022-29398-y.

SHAPE-guided RNA structure homology search and motif discovery


Edoardo Morandi et al. Nat Commun. 2022.

Abstract

The rapidly growing popularity of RNA structure probing methods is producing increasingly large amounts of RNA structure information. This calls for efficient tools that identify RNAs sharing regions of structural similarity by directly comparing their reactivity profiles, thereby enabling the discovery of conserved structural features. Here we introduce SHAPEwarp, a largely sequence-agnostic, SHAPE-guided algorithm for identifying structurally similar regions in RNA molecules. Analysis of Dengue, Zika and coronavirus genomes recapitulates known regulatory RNA structures and identifies novel, highly conserved structural elements. This work represents a preliminary step towards the model-free search for, and identification of, shared and conserved RNA structural features within transcriptomes.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Schematic of the SHAPEwarp algorithm.
a All possible kmers along a query SHAPE reactivity profile are enumerated, and those with low structural complexity are discarded. Retained kmers are looked up in the database using the MASS algorithm. b A matrix storing the coordinates of each query kmer and of its matches in the database is built, allowing kmers that lie on the same diagonal to be grouped into high scoring groups (HSGs). c Each HSG is used as the seed for the bidirectional extension of an alignment. Banding restricts the search space to a maximum number of bases around the diagonal. Extension stops when the score (S) drops below a set threshold for more than a set number of bases. d In parallel, the same query is searched against a database of shuffled SHAPE reactivity profiles. The scores of these alignments are used to build a null distribution, from which the probability of obtaining an alignment score ≥ S, and hence the E-value of the alignment, can be estimated. e Optionally, significant alignments can be further analyzed for the presence of a conserved structure using the RNAalifold algorithm.
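Panel a relies on MASS (Mueen's Algorithm for Similarity Search), which computes the z-normalized Euclidean distance between a short query and every same-length window of a longer series, using an FFT to obtain all sliding dot products at once. The sketch below assumes MASS is used in this standard form; the kmer length `k`, the low-complexity filter (here a hypothetical minimum standard deviation, `min_std`) and the distance cutoff `max_dist` are illustrative placeholders, not SHAPEwarp's actual settings.

```python
import numpy as np


def sliding_dot_product(q, t):
    """Dot product of q with every len(q)-long window of t, computed via FFT."""
    n, m = len(t), len(q)
    size = 1 << (n + m - 1).bit_length()  # zero-pad to avoid circular wrap-around
    q_rev = np.zeros(size)
    q_rev[:m] = q[::-1]
    t_pad = np.zeros(size)
    t_pad[:n] = t
    conv = np.fft.irfft(np.fft.rfft(t_pad) * np.fft.rfft(q_rev), size)
    return conv[m - 1:n]  # entry i corresponds to window t[i:i + m]


def mass(q, t):
    """z-normalized Euclidean distance between q and every window of t."""
    q = np.asarray(q, float)
    t = np.asarray(t, float)
    m = len(q)
    qt = sliding_dot_product(q, t)
    windows = np.lib.stride_tricks.sliding_window_view(t, m)
    mu_t, sd_t = windows.mean(axis=1), windows.std(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = (qt - m * q.mean() * mu_t) / (m * q.std() * sd_t)
    dist = np.sqrt(np.clip(2 * m * (1 - corr), 0.0, None))
    return np.where(sd_t > 0, dist, np.inf)  # flat windows cannot be z-normalized


def query_kmers(query, db, k=6, min_std=0.1, max_dist=1.0):
    """Return (query_pos, db_pos, distance) hits; min_std/max_dist are hypothetical."""
    hits = []
    for i in range(len(query) - k + 1):
        kmer = np.asarray(query[i:i + k], float)
        if kmer.std() < min_std:  # hypothetical low-complexity filter
            continue
        d = mass(kmer, db)
        for j in np.flatnonzero(d <= max_dist):
            hits.append((i, int(j), float(d[j])))
    return hits
```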
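The grouping in panel b can be illustrated by bucketing kmer hits by diagonal (database position minus query position) and merging hits on the same diagonal that overlap or sit close together. This is a minimal sketch; `max_gap` and `min_kmers` are hypothetical parameters, and SHAPEwarp's actual grouping rules may differ.

```python
from collections import defaultdict


def group_hsgs(hits, k, max_gap=0, min_kmers=2):
    """Group (query_pos, db_pos) kmer hits lying on the same diagonal into HSGs.

    Returns sorted tuples (query_start, query_end, db_start, db_end, n_kmers)
    with inclusive end coordinates.
    """
    by_diag = defaultdict(list)
    for qpos, dpos in hits:
        by_diag[dpos - qpos].append(qpos)

    hsgs = []
    for diag, starts in by_diag.items():
        starts.sort()
        groups = [[starts[0]]]
        for s in starts[1:]:
            # Merge when this kmer overlaps the previous one or leaves at most
            # max_gap unmatched bases after it (the previous kmer ends at prev + k - 1).
            if s <= groups[-1][-1] + k + max_gap:
                groups[-1].append(s)
            else:
                groups.append([s])
        for g in groups:
            q_start, q_end = g[0], g[-1] + k - 1
            hsgs.append((q_start, q_end, q_start + diag, q_end + diag, len(g)))
    return sorted(h for h in hsgs if h[4] >= min_kmers)
```

For example, three consecutive 4-mer hits at query positions 0, 1 and 2 on diagonal +10 collapse into one HSG spanning query bases 0 to 5 and database bases 10 to 15, while isolated hits are dropped.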
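Panel c describes a banded, bidirectional extension of each seed. The sketch below shows one way to do this: a gapped dynamic-programming extension restricted to `band` diagonals around the seed diagonal, run once to the right of the seed and once, on reversed profiles, to the left. The per-base `pair_score` and the linear `gap` penalty are hypothetical, and the stopping rule is only one reading of the caption: extension halts once the best score in each new row has stayed more than `x_drop` below the best score seen so far for more than `max_bad` consecutive bases.

```python
NEG = float("-inf")


def pair_score(a, b):
    # Hypothetical scoring: positive when two reactivities are close, negative otherwise.
    return 0.5 - abs(a - b)


def extend_one_side(q, d, band=3, gap=1.0, x_drop=2.0, max_bad=5):
    """Banded gapped extension from (0, 0). Returns (best_score, q_len, d_len)."""
    n, m = len(q), len(d)
    prev = {j: -gap * j for j in range(min(band, m) + 1)}  # row 0: leading gaps
    best, best_ij, bad_rows = 0.0, (0, 0), 0
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - band), min(m, i + band) + 1):  # stay inside the band
            s = NEG
            if j > 0 and j - 1 in prev:
                s = max(s, prev[j - 1] + pair_score(q[i - 1], d[j - 1]))  # align pair
            if j in prev:
                s = max(s, prev[j] - gap)  # gap in d
            if j - 1 in cur:
                s = max(s, cur[j - 1] - gap)  # gap in q
            cur[j] = s
            if s > best:
                best, best_ij = s, (i, j)
        row_max = max(cur.values()) if cur else NEG
        bad_rows = bad_rows + 1 if row_max < best - x_drop else 0
        if bad_rows > max_bad:  # score stayed too low for too many bases
            break
        prev = cur
    return best, best_ij[0], best_ij[1]


def extend_seed(q, d, qs, qe, ds, de, **kw):
    """Extend an HSG seed (inclusive coordinates) in both directions."""
    seed = sum(pair_score(a, b) for a, b in zip(q[qs:qe + 1], d[ds:de + 1]))
    r_score, r_q, r_d = extend_one_side(q[qe + 1:], d[de + 1:], **kw)
    l_score, l_q, l_d = extend_one_side(q[:qs][::-1], d[:ds][::-1], **kw)
    return seed + r_score + l_score, (qs - l_q, qe + r_q), (ds - l_d, de + r_d)
```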
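Panel d turns the scores of alignments against shuffled profiles into an E-value. As an illustration under stated assumptions (not necessarily SHAPEwarp's statistical model), the sketch fits a Gumbel extreme-value distribution to the null scores by the method of moments, computes P(score ≥ S), and multiplies it by a hypothetical `n_alignments`, the number of alignments evaluated, to obtain an expected count.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel(null_scores):
    """Method-of-moments Gumbel fit to null alignment scores: returns (mu, beta)."""
    mean = statistics.fmean(null_scores)
    sd = statistics.stdev(null_scores)
    beta = sd * math.sqrt(6) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def p_value(score, mu, beta):
    """P(X >= score) under Gumbel(mu, beta), i.e. 1 - exp(-exp(-(score - mu) / beta))."""
    return -math.expm1(-math.exp(-(score - mu) / beta))


def e_value(score, null_scores, n_alignments):
    """Expected number of alignments scoring >= score by chance (assumed model)."""
    mu, beta = fit_gumbel(null_scores)
    return n_alignments * p_value(score, mu, beta)
```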
Fig. 2. Validation of SHAPEwarp.
a Box plot of the distribution of E-values for true (T) and false (F) matches when E. coli 16S rRNA is searched against B. subtilis 16S/23S rRNAs, in both SHAPE-only (So) and SHAPE + sequence (S + s) modes. Boxes span the 25th to the 75th percentile. The center line represents the median. Values below the 25th percentile − 1.5 times the IQR, or above the 75th percentile + 1.5 times the IQR, are outliers and are shown as dots. The inset shows a zoomed-in view of the box plot for E-values between 0 and 0.1. Sample sizes are as follows: n = 15 (SHAPE only) and n = 16 (SHAPE + sequence) for true matches, and n = 1338 (SHAPE only) and n = 457 (SHAPE + sequence) for false matches. b Sample alignments for two matching regions between the 16S rRNAs of E. coli and B. subtilis, as identified by SHAPEwarp. SHAPE reactivities have been capped at 2. The high scoring group (HSG) constituting the seed of the alignment is shaded in gray. The insets show the same two regions in their structural context.
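The outlier rule used in the box plots (Tukey's fences at 1.5 times the IQR) can be computed as below. Quartile conventions vary between tools; this sketch assumes Python's `statistics.quantiles` with the "inclusive" method, which matches NumPy's default linear interpolation, so its fences may differ slightly from those of the plotting software actually used for the figure.

```python
import statistics


def tukey_outliers(values):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 x IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return lower, upper, [v for v in values if v < lower or v > upper]
```

For the values 1 to 8 plus 100, the quartiles are 3 and 7, so the fences are −3 and 13 and only 100 is flagged as an outlier.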
Fig. 3. SHAPEwarp identifies novel highly-conserved viral RNA structures.
a Significant matches between SARS-CoV (query) and SARS-CoV-2 (database) identified by SHAPEwarp in either SHAPE-only (orange) or SHAPE + sequence (red) mode. The relative positions of known RNA structure elements (5′ UTR, FSE and 3′ UTR) are indicated. b Aligned SHAPE reactivity profiles for one of the identified structurally conserved regions (CoV Motif #1). SHAPE reactivities have been capped at 2. c Structure model for CoV Motif #1. The structure diagram was drawn using R2R. One-sided covariations were inferred from the R2R output. Base pairs showing significant covariation (as determined by R-scape) are boxed in green (E value < 0.05) or violet (E value < 0.1). The inset shows, boxed in blue, base pairs with significant support from RNA–RNA chimeras detected by in vivo COMRADES.
