RNA. 2010 Feb;16(2):280-9.

doi: 10.1261/rna.1923210. Epub 2009 Dec 23.

Natural and artificial RNAs occupy the same restricted region of sequence space

Ryan Kennedy¹, Manuel E Lladser, Zhiyuan Wu, Chen Zhang, Michael Yarus, Hans De Sterck, Rob Knight

PMID: 20032164
PMCID: PMC2811657
DOI: 10.1261/rna.1923210

Ryan Kennedy et al. RNA. 2010 Feb.

Affiliation

¹ Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA.

Abstract

Different chemical and mutational processes within genomes give rise to sequences with different compositions and perhaps different capacities for evolution. The evolution of functional RNAs may occur on a "neutral network" in which sequences with any given function can easily mutate to sequences with any other. This neutral network hypothesis is more likely if there is a particular region of composition that contains sequences that are functional in general, and if many different functions are possible within this preferred region of composition. We show that sequence preferences in active sites recovered by in vitro selection combine with biophysical folding rules to support the neutral network hypothesis. These simple active-site specifications and folding preferences obtained by artificial selection experiments recapture the previously observed purine bias and specific spread along the GC axis of naturally occurring aptamers and ribozymes isolated from organisms, although other types of RNAs, such as miRNA precursors and spliceosomal RNAs, that act primarily through complementarity to other nucleic acids do not share these preferences. These universal evolved sequence features are therefore intrinsic to RNA molecules that bind small-molecule targets or catalyze reactions.

Figures

**FIGURE 1.**
Striking similarities between distributions of (a) natural aptamers (green) and ribozymes (blue), and (b) artificial aptamers and ribozymes colored by function (nucleotide binding, red; antibiotic binding, blue; amino acid binding, yellow; self-cleaving ribozyme, gray; other, green); (c) the superposition of the two. Results for artificial sequences shown here and in Figure 2 are for 100-nt sequences; sequence length had little effect (Fig. 7). Summing the individual motif probabilities, rather than calculating motif overlaps, gave similar results (Fig. 8).
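
The distributions in this figure live in composition space, where each sequence is reduced to its nucleotide frequencies. As a minimal sketch of that reduction (not the authors' code), the following Python computes the composition of one RNA sequence and two summary coordinates, GC fraction and purine fraction. The function names and the example sequence are hypothetical and chosen only for illustration.

```python
from collections import Counter


def composition(seq: str) -> dict[str, float]:
    """Return the fraction of A, C, G and U in an RNA sequence."""
    # Treat DNA-style T as U and ignore any character that is not a base.
    seq = seq.upper().replace("T", "U")
    counts = Counter(base for base in seq if base in "ACGU")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or U")
    return {base: counts[base] / total for base in "ACGU"}


def gc_and_purine(seq: str) -> tuple[float, float]:
    """Return (GC fraction, purine fraction) for a sequence."""
    comp = composition(seq)
    return comp["G"] + comp["C"], comp["A"] + comp["G"]


# Made-up 16-nt example, not taken from the paper or from Rfam.
example = "GGAUACGAAAGUAUCC"
gc, purine = gc_and_purine(example)
print(f"GC = {gc:.4f}, purine = {purine:.4f}")
```

A purine fraction above 0.5 corresponds to the purine bias described in the abstract; the GC fraction gives the position along the GC axis.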

**FIGURE 2.**
Separate components of active-site composition and folding preferences. (*Left*) Active-site sequence requirements; (*middle*) folding; (*right*) combined. Although more of the effect comes from the active-site requirements than from folding, the effects of folding shift the overall position of the distribution.

**FIGURE 3.**
miRNA precursors and guide RNAs (spliceosomal and snRNAs, whose functionality is governed by complementarity to a target) do not follow the same compositional distribution as do RNAs that are themselves functional (i.e., aptamers and ribozymes). Only natural ribozymes and aptamers (riboswitches) follow the patterns shown by the high-probability regions of the artificially selected motifs.

**FIGURE 4.**
Overall workflow. (a) Motifs were identified from sequences in the literature. (b) We included all motifs where both a secondary structure diagram and a multiple sequence alignment of the corresponding sequences were available to us. We used RNAfold to predict the folding of the sequences corresponding to each motif, and excluded motifs where none of the sequences for that motif folded into a secondary structure compatible with the published secondary structure diagram (four of 33 motifs examined overall). (c) For each location in sequence space where the frequencies of each nucleotide were an even multiple of 5% (e.g., 55% A, 15% C, 20% G, 10% U), we calculated the probability of each motif using the new upper-bound method (see Materials and Methods). (d) At the same locations, we also calculated the conditional probability of folding correctly, given that the motif was present, by sampling 10,000 sequences drawn from the distribution of sequences containing the motif, folding each sequence with RNAfold, and calculating the fraction of sequences for which the calculated minimum free energy structure was compatible with the motif. (e) Finally, we multiplied these two probabilities together to obtain the joint probability that a randomly chosen sequence of a given length and composition both contains the sequence elements required for the motif *and* folds correctly. We repeated this procedure for each of the 969 5% interior composition intervals in the space of possible compositions (i.e., compositions that have at least 5% of each base and an even multiple of 5% of all bases). (f) We then modeled the probability distribution of each motif as a multivariate normal distribution, showing ellipsoids at 1 standard deviation from the mean. Superimposing all these ellipsoids allowed us to determine the regions at which each function, or combination of functions, was most likely to occur. (g) Finally, we downloaded biological aptamer and ribozyme sequences from Rfam, plotted their compositions (so that each point corresponds to an individual aptamer or ribozyme sequence), and superimposed them on the distribution of artificial motifs.
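
Steps (c) through (e) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: `interior_compositions`, `fraction_compatible`, and `joint_probability` are hypothetical names, and the `is_compatible` predicate stands in for the RNAfold-based compatibility check, which is not reproduced here.

```python
from itertools import product

STEP = 5  # grid spacing in percent, as in the caption


def interior_compositions(step: int = STEP):
    """Yield (A, C, G, U) percentages that are multiples of `step`,
    each at least `step`, and that sum to 100."""
    levels = range(step, 100, step)
    for a, c, g in product(levels, repeat=3):
        u = 100 - a - c - g
        if u >= step:
            yield (a, c, g, u)


def fraction_compatible(sequences, is_compatible) -> float:
    """Monte Carlo estimate of P(folds correctly | motif present).

    `is_compatible` is a caller-supplied predicate (an assumption here);
    in the paper this role is played by folding with RNAfold and
    comparing against the published structure.
    """
    sequences = list(sequences)
    hits = sum(1 for seq in sequences if is_compatible(seq))
    return hits / len(sequences)


def joint_probability(p_motif: float, p_fold_given_motif: float) -> float:
    """P(motif present and folds correctly) = P(motif) * P(fold | motif)."""
    return p_motif * p_fold_given_motif


grid = list(interior_compositions())
print(len(grid))  # 969 interior grid points
```

The count of 969 matches the caption: it is the number of ways to write 20 as an ordered sum of four positive integers, C(19, 3) = 969.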

**FIGURE 5.**
Fit between exact and upper-bound calculations. Red points indicate conditions that failed inclusion criteria (i.e., probability of an individual module >0.01, or probability over all modules >0.001: these criteria were set such that all examined motifs were included). The same motifs were used for both sets of calculations, so the graphs are nearly identical. Correlations and relative errors are as follows. Upper-Bound: r² = 0.998, r² for filtered points only = 0.999999, mean relative error = 12.9, mean filtered relative error = 0.0093. Poisson: r² = 0.997, r² filtered = 0.999999, mean relative error = 12.8, mean filtered relative error = 0.00076. For numerical stability we approximated 1 − e^(−x) by its second-order Taylor series when 0 < x < 10⁻⁸. Thus the two methods perform similarly and provide excellent agreement with exact calculations over the range of motifs examined.
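
The numerical-stability step in this caption can be illustrated with a short Python sketch. For very small x, evaluating 1 − e^(−x) directly loses precision to cancellation, whereas the second-order Taylor expansion x − x²/2 does not. The function name `one_minus_exp_neg` and the constant `SMALL_X` are hypothetical; only the threshold of 10⁻⁸ comes from the caption.

```python
import math

SMALL_X = 1e-8  # threshold stated in the caption


def one_minus_exp_neg(x: float) -> float:
    """Evaluate 1 - exp(-x) for x >= 0, switching to the second-order
    Taylor series x - x**2 / 2 when 0 < x < SMALL_X."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if 0 < x < SMALL_X:
        # 1 - e^(-x) = x - x^2/2 + x^3/6 - ...; keep terms up to x^2.
        return x - 0.5 * x * x
    return 1.0 - math.exp(-x)


x = 1e-12
print(one_minus_exp_neg(x))   # Taylor branch
print(1.0 - math.exp(-x))     # direct evaluation, affected by cancellation
```

In modern Python the standard-library call `-math.expm1(-x)` gives a comparable result without a manual branch; the explicit series is shown because it is the approach the caption describes.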

**FIGURE 6.**
GridBASE deployment diagram. Rectangular boxes represent different machines. The thick solid lines represent connections to the database. The thin solid lines represent direct control interactions initiated by the operator component. The dashed lines represent notification of workers by their associated brokers (multiple brokers may be employed, for instance, to handle firewall restrictions).

**FIGURE 7.**
Effects of sequence length and GU base pairs on abundance. Varying the sequence length from 50 to 150 bases and keeping or omitting GU base pairs had little effect on compositional preferences, except that some motifs were unable to fold without GU pairs and others were unable to fit into the shorter sequence lengths (50, 100, and 150 base sequences with GU pairs; 50 or 100 base sequences as needed to contain the motif without GU pairs).

**FIGURE 8.**
Summing the probabilities across motifs provides results similar to examining motif overlap. Brown points have radii proportional to the sum of probabilities of any motif: compare to Figure 1.

Cited by

On the emergence of structural complexity in RNA replicators.
Oliver CG, Reinharz V, Waldispühl J. RNA. 2019 Dec;25(12):1579-1591. doi: 10.1261/rna.070391.119. Epub 2019 Aug 29. PMID: 31467146
RNA regulators responding to ribosomal protein S15 are frequent in sequence space.
Slinger BL, Meyer MM. Nucleic Acids Res. 2016 Nov 2;44(19):9331-9341. doi: 10.1093/nar/gkw754. Epub 2016 Aug 31. PMID: 27580716
Nucleotides that are essential but not conserved; a sufficient L-tryptophan site in RNA.
Majerfeld I, Chocholousova J, Malaiya V, Widmann J, McDonald D, Reeder J, Iyer M, Illangasekare M, Yarus M, Knight R. RNA. 2010 Oct;16(10):1915-24. doi: 10.1261/rna.2220210. Epub 2010 Aug 10. PMID: 20699302
The paradox of dual roles in the RNA world: resolving the conflict between stable folding and templating ability.
Ivica NA, Obermayer B, Campbell GW, Rajamani S, Gerland U, Chen IA. J Mol Evol. 2013 Sep;77(3):55-63. doi: 10.1007/s00239-013-9584-x. PMID: 24078151
Type-II tRNAs and Evolution of Translation Systems and the Genetic Code.
Kim Y, Kowiatek B, Opron K, Burton ZF. Int J Mol Sci. 2018 Oct 22;19(10):3275. doi: 10.3390/ijms19103275. PMID: 30360357

