Nucleic Acids Res. 2022 Jul 22;50(13):e73.

doi: 10.1093/nar/gkac220.

QRNAstruct: a method for extracting secondary structural features of RNA via regression with biological activity

Goro Terai¹, Kiyoshi Asai¹

Affiliation

¹ Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwanoha 5-1-5, Kashiwa, Chiba 277-8561, Japan.

PMID: 35390152
PMCID: PMC9303433
DOI: 10.1093/nar/gkac220

Goro Terai et al. Nucleic Acids Res. 2022.

Abstract

Recent technological advances have enabled the generation of large amounts of data consisting of RNA sequences and their functional activity. Here, we propose a method for extracting secondary structure features that affect the functional activity of RNA from sequence-activity data. Given pairs of RNA sequences and their corresponding bioactivity values, our method calculates position-specific structural features of the input RNA sequences, considering every possible secondary structure of each RNA. A Ridge regression model is trained using the structural features as feature vectors and the bioactivity values as response variables. Optimized model parameters indicate how secondary structure features affect bioactivity. We used our method to extract intramolecular structural features of bacterial translation initiation sites and self-cleaving ribozymes, and the intermolecular features between rRNAs and Shine-Dalgarno sequences and between U1 RNAs and splicing sites. We not only identified known structural features but also revealed more detailed insights into structure-activity relationships than previously reported. Importantly, the datasets we analyzed here were obtained from different experimental systems and differed in size, sequence length and similarity, and number of RNA molecules involved, demonstrating that our method is applicable to various types of data consisting of RNA sequences and bioactivity values.
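The regression step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature values below are invented for illustration, whereas QRNAstruct derives its position-specific features from every possible secondary structure of each RNA. The names `fit_ridge`, `solve_linear`, `features`, `activity` and `lam` are hypothetical. The sketch fits a Ridge model without an intercept by solving the regularized normal equations, then reads the fitted weights as the direction and strength with which each feature is associated with activity.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(features, activity, lam=1.0):
    """Return weights w minimising ||Xw - y||^2 + lam * ||w||^2 (no intercept)."""
    n_feat = len(features[0])
    # Normal equations: (X^T X + lam * I) w = X^T y
    gram = [
        [sum(row[i] * row[j] for row in features) + (lam if i == j else 0.0)
         for j in range(n_feat)]
        for i in range(n_feat)
    ]
    rhs = [sum(row[i] * y for row, y in zip(features, activity))
           for i in range(n_feat)]
    return solve_linear(gram, rhs)


# Toy data (assumption): one row per RNA sequence, one column per
# hypothetical structural feature, each value in [0, 1].
features = [
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.4],
    [0.6, 0.3, 0.9],
    [0.1, 0.2, 0.1],
    [0.7, 0.9, 0.3],
    [0.4, 0.5, 0.7],
]
# Synthetic activity: feature 0 raises it, feature 1 lowers it, feature 2 is irrelevant.
activity = [2.0 * f[0] - 1.0 * f[1] for f in features]

weights = fit_ridge(features, activity, lam=0.01)
for index, weight in enumerate(weights):
    print(f"feature {index}: weight {weight:+.3f}")
```

In this toy setting a positive weight marks a feature associated with higher activity and a negative weight one associated with lower activity; the penalty `lam` shrinks all weights toward zero, so an uninformative feature ends up with a weight close to zero.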

Figures

**Figure 1.**
Examples of RNA secondary structures. RNA secondary structures formed by (A) a single RNA sequence and (B) two short RNA sequences are shown. Circles represent bases, while their numbers and colors indicate the base position and the type of loop to which a base belongs, respectively. Black lines represent base pairings.

**Figure 3.**
Secondary structural features of bacterial translation initiation sites. (A) Optimized parameter values for bacterial translation initiation sites. Columns represent the relative position from the start codon. Rows represent the type of parameter (L, R, H, B, I and E). The sequence pattern in the training data is shown above the heatmap, where N represents any base. The Shine–Dalgarno (SD) sequence and start codon are indicated by the green and red boxes, respectively. (B–D) The secondary structures discussed in the main text. The green and red bases represent the SD sequence and start codon, respectively. (B) An internal loop containing an SD sequence and start codon. (C) Hairpin structure around a start codon. (D) Partial secondary structure in which the bases from +4 to +18 are on the left side of base pairs.

**Figure 4.**
Secondary structural features of twister ribozyme mutants. (A–D) RNA secondary structure of a wild-type twister ribozyme and three mutants. Circles represent bases. Black lines represent base pairs. Colored circles indicate regions forming pseudoknots. Pairs in regions shown in the same color interact with each other and form pseudoknots. Arrowheads indicate cleavage sites of the ribozyme. The numbers associated with bases indicate the base positions. (A) RNA secondary structure of a wild-type twister ribozyme experimentally determined by Liu *et al.* (27). Double and dotted lines represent *trans* Watson–Crick and *cis*-Hoogsteen:sugar edge base pairs, respectively. Arrows indicate pairs of regions forming a pseudoknot structure. (B–D) The predicted RNA secondary structure of three mutants. Mutated bases are shown in red letters. Values in parentheses are the self-cleavage activities normalized from 0 to 1. The shaded areas shown in the dashed boxes indicate the locations of a change in RNA secondary structure of the mutants compared with the wild-type twister ribozyme. (E) Optimized parameter values for twister ribozyme mutants. Each column represents a base position. Each row represents a different type of parameter (L, R, H, B, I and E). The RNA sequence of the wild-type twister ribozyme is shown above the heatmap. Boxes above the heatmap indicate regions forming pseudoknots. The base changes in the three mutants are indicated above the wild-type RNA sequence.

**Figure 5.**
Optimized parameter values for the interaction between rRNAs and Shine–Dalgarno sequences. Matrix P shows parameter values; its rows and columns correspond to the rRNA positions and the upstream region positions relative to the start codon, respectively. The letters associated with the rows and columns of matrix P are the rRNA and upstream sequence patterns, respectively, where N represents any base. Matrices X and Y show parameter values for the rRNA sequence (x) and the upstream sequence (y), respectively. Each row of matrix X represents the position of a base in an rRNA sequence, and each column of matrix Y represents the relative position of a base in the sequence upstream of the start codon.

**Figure 6.**
Optimized parameter values for interactions between U1 RNAs and donor sites. (A and B) Parameter values for GU and GC donor sites, respectively. Matrix P shows parameter values; its rows and columns correspond to the U1 RNA and donor site positions, respectively. The letters associated with the rows and columns of matrix P are the U1 RNA and donor site sequence patterns, respectively, where N represents any base. Matrices X and Y show parameter values for the U1 RNA sequence (x) and the donor site sequence (y), respectively. Each row of matrix X represents a U1 RNA base position, and each column of matrix Y represents a donor site position. (C–E) RNA secondary structures between U1 RNAs and donor sites predicted to have splicing activity. Donor sites (GU or GC) are indicated by black circles. Arrowheads indicate possible cleavage sites. (C) One RNA secondary structure between U1 RNA and GU donor site sequences associated with high splicing activity. (D) One secondary structure between U1 RNA and GC donor site sequences, in which the cleavage site is likely to be located two bases upstream of the GC site. (E) Another secondary structure between U1 RNA and GC donor site sequences.

**Figure 7.**
Comparison of optimized parameter values with and without SHAPE reactivity data. RNA sequences around the start codon and their translation efficiency in *E. coli* were used to optimize parameters. The top and bottom matrices show parameter values optimized (A) with and (B) without the SHAPE reactivity data, respectively. Columns represent the relative position from the start codon. Rows represent the type of parameter (L, R, H, B, I and E).
