RNA. 2010 Feb;16(2):280-9.
doi: 10.1261/rna.1923210. Epub 2009 Dec 23.

Natural and artificial RNAs occupy the same restricted region of sequence space

Ryan Kennedy et al. RNA. 2010 Feb.

Abstract

Different chemical and mutational processes within genomes give rise to sequences with different compositions and perhaps different capacities for evolution. The evolution of functional RNAs may occur on a "neutral network" in which sequences with any given function can easily mutate to sequences with any other. This neutral network hypothesis is more likely if there is a particular region of composition that contains sequences that are functional in general, and if many different functions are possible within this preferred region of composition. We show that sequence preferences in active sites recovered by in vitro selection combine with biophysical folding rules to support the neutral network hypothesis. These simple active-site specifications and folding preferences obtained by artificial selection experiments recapture the previously observed purine bias and specific spread along the GC axis of naturally occurring aptamers and ribozymes isolated from organisms. Other types of RNAs that act primarily through complementarity to other nucleic acids, such as miRNA precursors and spliceosomal RNAs, do not share these preferences. These universal evolved sequence features are therefore intrinsic to RNA molecules that bind small-molecule targets or catalyze reactions.

Figures

FIGURE 1.
Striking similarities between distributions of (a) natural aptamers (green) and ribozymes (blue), and (b) artificial aptamers and ribozymes colored by function (nucleotide binding, red; antibiotic binding, blue; amino acid binding, yellow; self-cleaving ribozyme, gray; other, green); (c) the superposition of the two. Results for artificial sequences shown here and in Figure 2 are for 100-nt sequences; sequence length had little effect (Fig. 7). Summing the individual motif probabilities, rather than calculating motif overlaps, gave similar results (Fig. 8).
FIGURE 2.
Separate components of active-site composition and folding preferences. (Left) Active-site sequence requirements; (middle) folding; (right) combined. Although more of the effect comes from the active-site requirements than from folding, the effects of folding shift the overall position of the distribution.
FIGURE 3.
miRNA precursors and guide RNAs (spliceosomal and snRNAs, whose functionality is governed by complementarity to a target) do not follow the same compositional distribution as do RNAs that are themselves functional (i.e., aptamers and ribozymes). Only natural ribozymes and aptamers (riboswitches) follow the patterns shown by the high-probability regions of the artificially selected motifs.
FIGURE 4.
Overall workflow. (a) Motifs were identified from sequences in the literature. (b) We included all motifs where both a secondary structure diagram and a multiple sequence alignment of the corresponding sequences were available to us. We used RNAfold to predict the folding of the sequences corresponding to each motif, and excluded motifs where none of the sequences for that motif folded into a secondary structure compatible with the published secondary structure diagram (four of 33 motifs examined overall). (c) For each location in sequence space where the frequency of each nucleotide was an integer multiple of 5% (e.g., 55% A, 15% C, 20% G, 10% U), we calculated the probability of each motif using the new upper-bound method (see Materials and Methods). (d) At the same locations, we also calculated the conditional probability of folding correctly, given that the motif was present, by sampling 10,000 sequences drawn from the distribution of sequences containing the motif, folding each sequence with RNAfold, and calculating the fraction of sequences for which the calculated minimum free energy structure was compatible with the motif. (e) We then multiplied these two probabilities together to obtain the joint probability that a randomly chosen sequence of a given length and composition both contains the sequence elements required for the motif and folds correctly. We repeated this procedure for each of the 969 interior points on the 5% composition grid (i.e., compositions that have at least 5% of each base and an integer multiple of 5% of all bases). (f) We then modeled the probability distribution of each motif as a multivariate normal distribution, showing ellipsoids at 1 standard deviation from the mean. Superimposing all these ellipsoids allowed us to determine the regions at which each function, or combination of functions, was most likely to occur. (g) Finally, we downloaded biological aptamer and ribozyme sequences from Rfam, plotted their compositions (so that each point corresponds to an individual aptamer or ribozyme sequence), and superimposed them on the distribution of artificial motifs.
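The composition grid in steps (c) through (e) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names `interior_compositions` and `joint_probability` are hypothetical, and the motif and folding probabilities would in practice come from the upper-bound calculation and the RNAfold sampling described in the caption. The sketch enumerates every composition with at least 5% of each base in 5% steps and confirms that there are 969 such points.

```python
from itertools import product

STEP = 5  # grid spacing in percent, as stated in the caption


def interior_compositions(step=STEP):
    """Yield (A, C, G, U) percentages that are multiples of `step`,
    are each at least `step`, and sum to 100."""
    levels = range(step, 100, step)
    for a, c, g in product(levels, repeat=3):
        u = 100 - a - c - g  # U is fixed by the other three bases
        if u >= step:
            yield (a, c, g, u)


def joint_probability(p_motif, p_fold_given_motif):
    """Step (e): P(motif present and folds correctly) =
    P(motif present) * P(folds correctly | motif present)."""
    return p_motif * p_fold_given_motif


grid = list(interior_compositions())
print(len(grid))  # 969, matching the count given in the caption
```

The count follows from choosing four positive integers that sum to 20 (twenty 5% units), which gives C(19, 3) = 969.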
FIGURE 5.
Fit between exact and upper-bound calculations. Red points indicate conditions that failed inclusion criteria (i.e., probability of an individual module >0.01, or probability over all modules >0.001: these criteria were set such that all examined motifs were included). The same motifs were used for both sets of calculations, so the graphs are nearly identical. Correlations and relative errors are as follows. Upper-Bound: r² = 0.998, r² for filtered points only = 0.999999, mean relative error = 12.9, mean filtered relative error = 0.0093. Poisson: r² = 0.997, r² filtered = 0.999999, mean relative error = 12.8, mean filtered relative error = 0.00076. For numerical stability we approximated 1 − e^(−x) by its second-order Taylor series when 0 < x < 10^(−8). Thus the two methods perform similarly and provide excellent agreement with exact calculations over the range of motifs examined.
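The numerical-stability step mentioned in the caption can be illustrated as follows. This is a minimal sketch, assuming the second-order Taylor series means x − x²/2; the function name `one_minus_exp_neg` and the constant `SMALL_X` are hypothetical and do not come from the paper. For very small x, computing 1 − e^(−x) directly loses most of its significant digits to cancellation, which the series avoids.

```python
import math

SMALL_X = 1e-8  # threshold stated in the caption


def one_minus_exp_neg(x):
    """Return 1 - exp(-x), switching to the second-order Taylor
    series x - x**2 / 2 when 0 < x < SMALL_X to avoid cancellation."""
    if 0 < x < SMALL_X:
        return x - x * x / 2.0
    return 1.0 - math.exp(-x)


x = 1e-12
naive = 1.0 - math.exp(-x)     # suffers from catastrophic cancellation
stable = one_minus_exp_neg(x)  # Taylor branch
reference = -math.expm1(-x)    # standard-library alternative
print(naive, stable, reference)
```

In current Python, `-math.expm1(-x)` gives the same protection over the whole range without a threshold; the Taylor branch is shown only because it is the approach the caption describes.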
FIGURE 6.
GridBASE deployment diagram. Rectangular boxes represent different machines. The thick solid lines represent connections to the database. The thin solid lines represent direct control interactions initiated by the operator component. The dashed lines represent notification of workers by their associated brokers (multiple brokers may be employed, for instance, to handle firewall restrictions).
FIGURE 7.
Effects of sequence length and GU base pairs on abundance. Varying the sequence length from 50 to 150 bases and keeping or omitting GU base pairs had little effect on compositional preferences, except that some motifs were unable to fold without GU pairs and others were unable to fit into the shorter sequence lengths (50, 100, and 150 base sequences with GU pairs; 50 or 100 base sequences as needed to contain the motif without GU pairs).
FIGURE 8.
Summing the probabilities across motifs provides results similar to examining motif overlap. Brown points have radii proportional to the sum of probabilities of any motif: compare to Figure 1.
