Nucleic Acids Res. 2022 Jul 22;50(13):e73.
doi: 10.1093/nar/gkac220.

QRNAstruct: a method for extracting secondary structural features of RNA via regression with biological activity

Goro Terai et al. Nucleic Acids Res. 2022.

Abstract

Recent technological advances have enabled the generation of large amounts of data consisting of RNA sequences and their functional activity. Here, we propose a method for extracting secondary structure features that affect the functional activity of RNA from sequence-activity data. Given pairs of RNA sequences and their corresponding bioactivity values, our method calculates position-specific structural features of the input RNA sequences, considering every possible secondary structure of each RNA. A Ridge regression model is trained using the structural features as feature vectors and the bioactivity values as response variables. Optimized model parameters indicate how secondary structure features affect bioactivity. We used our method to extract intramolecular structural features of bacterial translation initiation sites and self-cleaving ribozymes, and the intermolecular features between rRNAs and Shine-Dalgarno sequences and between U1 RNAs and splicing sites. We not only identified known structural features but also revealed more detailed insights into structure-activity relationships than previously reported. Importantly, the datasets we analyzed here were obtained from different experimental systems and differed in size, sequence length and similarity, and number of RNA molecules involved, demonstrating that our method is applicable to various types of data consisting of RNA sequences and bioactivity values.
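The regression step described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It assumes that each structural feature is a per-position probability over the six labels used in the figure captions (L, R, H, B, I, E), which is an interpretation rather than something the abstract states, and it replaces the real structural calculation with random placeholder data. The helper names `ridge_fit` and `toy_features`, the toy ground truth, and the regularization strength are all hypothetical, and the model has no intercept term.

```python
import random

STRUCT_TYPES = ["L", "R", "H", "B", "I", "E"]  # row labels used in the figure captions


def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y by Gaussian elimination."""
    n, d = len(X), len(X[0])
    # Augmented matrix [X^T X + lam * I | X^T y].
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(d)]
         + [sum(X[k][i] * y[k] for k in range(n))]
         for i in range(d)]
    for col in range(d):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution.
    w = [0.0] * d
    for i in reversed(range(d)):
        s = sum(A[i][j] * w[j] for j in range(i + 1, d))
        w[i] = (A[i][d] - s) / A[i][i]
    return w


def toy_features(n_seqs, n_pos, rng):
    """Placeholder data: a random distribution over the six labels per position."""
    rows = []
    for _ in range(n_seqs):
        row = []
        for _ in range(n_pos):
            raw = [rng.random() for _ in STRUCT_TYPES]
            total = sum(raw)
            row.extend(v / total for v in raw)
        rows.append(row)
    return rows


if __name__ == "__main__":
    rng = random.Random(0)
    n_pos = 4
    X = toy_features(200, n_pos, rng)
    # Hypothetical ground truth: a hairpin loop at position index 2 lowers activity.
    h_col = 2 * len(STRUCT_TYPES) + STRUCT_TYPES.index("H")
    y = [-3.0 * row[h_col] + rng.gauss(0.0, 0.05) for row in X]
    w = ridge_fit(X, y, lam=1.0)
    # Print the weights as a label-by-position table, like the heatmaps in the figures.
    for t, label in enumerate(STRUCT_TYPES):
        print(label, [round(w[p * len(STRUCT_TYPES) + t], 2) for p in range(n_pos)])
```

Because the six values at each position sum to one, the feature columns are linearly dependent, so the penalty term is what makes the system solvable here. The fitted weights should therefore be read relative to one another within a position rather than as absolute effects.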

Figures

Figure 1.
Examples of RNA secondary structures. RNA secondary structures formed by (A) a single RNA sequence and (B) two short RNA sequences are shown. Circles represent bases, while their numbers and colors indicate the base position and the type of loop to which a base belongs, respectively. Black lines represent base pairings.
Figure 2.
Flowchart of our method.
Figure 3.
Secondary structural features of bacterial translation initiation sites. (A) Optimized parameter values for bacterial translation initiation sites. Columns represent the relative position from the start codon. Rows represent the type of parameter: L, R, H, B, I and E (each is defined in the original by an inline formula that is not reproduced here). The sequence pattern in the training data is shown above the heatmap, where N represents any base. The Shine–Dalgarno (SD) sequence and start codon are indicated by the green and red boxes, respectively. (B–D) The secondary structures discussed in the main text. The green and red bases represent the SD sequence and start codon, respectively. (B) An internal loop containing an SD sequence and start codon. (C) Hairpin structure around a start codon. (D) Partial secondary structure in which the bases from +4 to +18 are on the left side of base pairs.
Figure 4.
Secondary structural features of twister ribozyme mutants. (A–D) RNA secondary structure of a wild-type twister ribozyme and three mutants. Circles represent bases. Black lines represent base pairs. Colored circles indicate regions forming pseudoknots. Pairs in regions shown in the same color interact with each other and form pseudoknots. Arrowheads indicate cleavage sites of the ribozyme. The numbers associated with bases indicate the base positions. (A) RNA secondary structure of a wild-type twister ribozyme experimentally determined by Liu et al. (27). Double and dotted lines represent trans Watson–Crick and cis-Hoogsteen:sugar edge base pairs, respectively. Arrows indicate pairs of regions forming a pseudoknot structure. (B–D) The predicted RNA secondary structure of three mutants. Mutated bases are shown in red letters. Values in parentheses are the self-cleavage activities normalized from 0 to 1. The shaded areas shown in the dashed boxes indicate the locations of a change in RNA secondary structure of the mutants compared with the wild-type twister ribozyme. (E) Optimized parameter values for twister ribozyme mutants. Each column represents a base position. Each row represents a different type of parameter: L, R, H, B, I and E (each is defined in the original by an inline formula that is not reproduced here). The RNA sequence of the wild-type twister ribozyme is shown above the heatmap. Boxes above the heatmap indicate regions forming pseudoknots. The base changes in the three mutants are indicated above the wild-type RNA sequence.
Figure 5.
Optimized parameter values for the interaction between rRNAs and Shine–Dalgarno sequences. Matrix P shows the values of a parameter that the original gives as an inline formula (not reproduced here). The rows and columns of this matrix correspond to the rRNA positions and the upstream region positions relative to the start codon, respectively. The letters associated with the rows and columns of matrix P are the rRNA and upstream sequence patterns, respectively, where N represents any base. Matrix X shows the values of three parameters and matrix Y shows the values of three further parameters (all six are given in the original as inline formulas), where x and y represent the rRNA and upstream sequences, respectively. Each row of matrix X represents the position of a base in an rRNA sequence, and each column of matrix Y represents the relative position of a base in the sequence upstream of the start codon.
Figure 6.
Optimized parameter values for interactions between U1 RNAs and donor sites. (A and B) Parameter values for GU and GC donor sites, respectively. Matrix P shows the values of a parameter that the original gives as an inline formula (not reproduced here); the rows and columns of this matrix correspond to the U1 RNA and donor site positions, respectively. The letters associated with the rows and columns of matrix P are the U1 RNA and donor site sequence patterns, respectively, where N represents any base. Matrix X shows the values of three parameters and matrix Y shows the values of three further parameters (all six are given in the original as inline formulas), where x and y represent U1 RNA and donor site sequences, respectively. Each row of matrix X represents a U1 RNA base position, and each column of matrix Y represents a donor site position. (C–E) RNA secondary structures between U1 RNAs and donor sites predicted to have splicing activity. Donor sites (GU or GC) are indicated by black circles. Arrowheads indicate possible cleavage sites. (C) One RNA secondary structure between U1 RNA and GU donor site sequences associated with high splicing activity. (D) One secondary structure between U1 RNA and GC donor site sequences, in which the cleavage site is likely to be located two bases upstream of the GC site. (E) Another secondary structure between U1 RNA and GC donor site sequences.
Figure 7.
Comparison of optimized parameter values with and without SHAPE reactivity data. RNA sequences around the start codon and their translation efficiency in E. coli were used to optimize parameters. The top and bottom matrices show parameter values optimized (A) with and (B) without the SHAPE reactivity data, respectively. Columns represent the relative position from the start codon. Rows represent the type of parameter: L, R, H, B, I and E (each is defined in the original by an inline formula that is not reproduced here).
