Molecules. 2021 Mar 17;26(6):1671.
doi: 10.3390/molecules26061671.

Secondary Structure Libraries for Artificial Evolution Experiments

Ráchel Sgallová et al. Molecules.

Abstract

Methods of artificial evolution such as SELEX and in vitro selection have made it possible to isolate RNA and DNA motifs with a wide range of functions from large random-sequence libraries. Once the primary sequence of a functional motif is known, the sequence space around it can be comprehensively explored using a combination of random mutagenesis and selection. However, methods to explore the sequence space of a secondary structure are not as well characterized. Here we address this gap by describing a method to construct, in a single synthesis, libraries that are enriched for sequences with the potential to form a specific secondary structure, such as that of an aptamer, ribozyme, or deoxyribozyme. Although interactions such as base pairs cannot be encoded directly in a library using conventional DNA synthesizers, it is possible to modulate the probability that two positions will have the potential to pair by biasing the nucleotide composition at these positions. We show how to maximize this probability for each of the possible ways to encode a pair (in this study, a pair is defined as A-U, U-A, C-G, G-C, G.U, or U.G). We then use these optimized coding schemes to calculate the number of different variants of model stems and secondary structures expected to occur in a library, for a series of structures in which the number of pairs and the extent of conservation of unpaired positions are systematically varied. Our calculations reveal a tradeoff between maximizing the probability of forming a pair and maximizing the number of possible variants of a desired secondary structure that can occur in the library. They also indicate that the optimal coding strategy for a library depends on the complexity of the motif being characterized.
Because this approach provides a simple way to generate libraries enriched for sequences with the potential to form a specific secondary structure, we anticipate that it should be useful for the optimization and structural characterization of functional nucleic acid motifs.
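To make the idea of biasing nucleotide composition concrete: if the two positions of a pair are synthesized independently, the probability that they can pair is the sum, over the six allowed pairs, of the product of the corresponding nucleotide fractions. The Python sketch below illustrates only that arithmetic; the function name `pair_probability` and the biased mixtures are hypothetical and are not taken from the paper.

```python
# Pairs counted as "able to pair", following the abstract's definition:
# Watson-Crick pairs plus G.U / U.G wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def pair_probability(f1: dict[str, float], f2: dict[str, float]) -> float:
    """Probability that two independently synthesized positions can pair.

    f1 and f2 map each nucleotide to its fraction in the mix used at that
    position (nucleotides that are absent count as 0).
    """
    for f in (f1, f2):
        if abs(sum(f.values()) - 1.0) > 1e-9:
            raise ValueError("nucleotide fractions must sum to 1")
    return sum(f1.get(a, 0.0) * f2.get(b, 0.0) for a, b in PAIRS)


equimolar_n = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}
# Hypothetical biased mixtures: all four nucleotides are still present at both
# positions, so all six pairs remain possible.
biased_1 = {"A": 0.1, "C": 0.1, "G": 0.4, "U": 0.4}
biased_2 = {"A": 0.2, "C": 0.3, "G": 0.2, "U": 0.3}

print(pair_probability(equimolar_n, equimolar_n))  # 0.375
print(pair_probability(biased_1, biased_2))
```

With equimolar N at both positions the probability is 0.375, the value quoted for N-N in Figure 4; the hypothetical biased mixtures raise it to about 0.45 while still allowing all six pairs.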

Keywords: DNA; RNA; SELEX; aptamer; artificial evolution; deoxyribozyme; in vitro selection; nucleic acids; ribozyme; secondary structure; synthetic biology.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Concept and design of a secondary structure library. (A) Typical workflow to identify and optimize a functional nucleic acid motif. The starting library usually contains ~10^15 random sequences flanked by primer-binding sites. After functional motifs are identified by selection, a second library is prepared by randomly mutagenizing a single sequence corresponding to one of the most active variants at a rate of 15% to 25% per position. Additional rounds of selection are performed to identify active variants of this sequence, most of which will adopt the same fold. Information from these variants can be used to design a secondary structure library, which is the topic of this paper. (B) Design of a secondary structure library. In this hypothetical example, variants 1–3 are three variants of a functional RNA motif from the “active variants” step of the workflow in panel A. A secondary structure library combining information from these three variants is shown on the right. Nucleotides that differ from variant 1 are shown in purple. X1-X2 = A-U, U-A, C-G, G-C, G.U, or U.G; R = A or G; W = A or U; Y = C or U; K = G or U.
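Panel B collapses the unpaired positions of several active variants into IUPAC degenerate codes. A minimal sketch of that bookkeeping, assuming the variants are already aligned and that each position simply allows every nucleotide observed in any variant; `degenerate_consensus` and the example sequences are hypothetical, and paired positions (X1-X2) would instead be encoded with a pair-coding scheme.

```python
# IUPAC codes for RNA (U in place of T), keyed by the set of allowed nucleotides.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("U"): "U",
    frozenset("AG"): "R", frozenset("CU"): "Y", frozenset("CG"): "S",
    frozenset("AU"): "W", frozenset("GU"): "K", frozenset("AC"): "M",
    frozenset("CGU"): "B", frozenset("AGU"): "D", frozenset("ACU"): "H",
    frozenset("ACG"): "V", frozenset("ACGU"): "N",
}


def degenerate_consensus(variants: list[str]) -> str:
    """Collapse aligned variants into one degenerate sequence, column by column."""
    if len({len(v) for v in variants}) != 1:
        raise ValueError("variants must be aligned to the same length")
    # Each column becomes the IUPAC code for the set of nucleotides seen there.
    return "".join(IUPAC[frozenset(column)] for column in zip(*variants))


# Three hypothetical aligned variants of an unpaired region.
variants = ["GGAUCC", "GAAUCU", "GGUUCC"]
print(degenerate_consensus(variants))  # GRWUCY
```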
Figure 2
Encoding base pairs with degenerate positions. (A) The ten possible architectures for encoding base pairs by solid-phase synthesis. The number of possible nucleotides at each position in the base pair in each architecture is shown on the left, and an example is shown on the right. (B) Tradeoff between the number of possible pairs that can be encoded in each of the ten architectures (x axis) and the maximum probability of forming a pair in the architecture (y axis). Y = C or U; R = A or G; K = G or U; B = C, G, or U; D = A, G, or U; N = A, C, G, or U.
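The ten architectures and the tradeoff in panel B can be explored by brute force. The sketch below assumes equimolar mixtures at each degenerate position (the assumption under which the probabilities quoted in the Figure 4 caption, such as 0.375 for N-N and 0.417 for D-N, work out), enumerates every choice of degenerate code for each (n1, n2) architecture, and keeps the highest pairing probability. Function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

BASES = "ACGU"
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def pair_count(x1: str, x2: str) -> int:
    """Number of (x1, x2) nucleotide combinations that can pair."""
    return sum((a, b) in PAIRS for a in x1 for b in x2)


def best_code_per_architecture() -> dict[tuple[int, int], tuple[Fraction, str, str]]:
    """For each (n1, n2) architecture, find degenerate codes with the highest
    pairing probability, assuming equimolar mixtures at both positions."""
    codes = {k: ["".join(c) for c in combinations(BASES, k)] for k in range(1, 5)}
    best = {}
    for n1 in range(1, 5):
        for n2 in range(n1, 5):  # pairing is symmetric, so (n1, n2) ~ (n2, n1)
            best[(n1, n2)] = max(
                ((Fraction(pair_count(x1, x2), n1 * n2), x1, x2)
                 for x1 in codes[n1] for x2 in codes[n2]),
                key=lambda t: t[0],
            )
    return best


for (n1, n2), (p, x1, x2) in best_code_per_architecture().items():
    print(f"{n1}x{n2}: {x1}-{x2}  pairs={p * n1 * n2}  P(pair)={float(p):.3f}")
```

Under these assumptions the maxima run from 1 (for 1×1 and 1×2, e.g. G-C and G-Y) down to 3/8 for N-N. Because the probability equals the number of encodable pairs divided by n1·n2, codes that tie at the maximum always encode the same number of pairs.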
Figure 3
Encoding stems with degenerate positions. (A) A hypothetical stem made up of 20 base pairs. The sequence is arbitrary and does not affect the calculations in this panel. (B) Number of variants of this stem (including canonical A-U, U-A, C-G, and G-C base pairs as well as G.U and U.G wobble pairs) at various mutational distances from the starting sequence in a library in which base pairs are encoded by N-N. The total number of possible stem variants is 6^20 = 3.7 × 10^15. (C) Probability distribution of sequences in a library based on this stem in which base pairs are encoded by N-N (N = A, C, G, or U; probability of forming a pair = 0.375). The y axis indicates the probability that a sequence in the library will have the potential to form each of the 20 base pairs in the stem. (D) Same as panel B, but for a library in which base pairs are encoded by R-Y (R = A or G; Y = C or U; probability of forming a pair = 0.75). Because only three of the six possible pairs can occur with this coding scheme, the number of possible stem variants is 3^20 = 3.5 × 10^9. (E) Same as panel C, but for a library in which base pairs are encoded by R-Y.
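The counts in panels B and D and the distributions in panels C and E follow from binomial arithmetic if each pair position is treated independently. In this sketch, mutational distance is counted as the number of base pairs that differ from the starting stem, which is an assumption (the caption does not say whether distance is counted in pairs or nucleotides), and for R-Y the starting stem is assumed to contain only A-U, G-C, and G-U pairs. Function names are illustrative.

```python
from math import comb

N_PAIRS = 20  # the 20-bp model stem of panel A


def variants_by_distance(n_pairs: int, allowed_pairs: int) -> list[int]:
    """Stem variants that differ from the starting stem at exactly d pairs.

    Distance is counted in base pairs here (an assumption): each changed
    pair can be any of the other allowed_pairs - 1 pairs.
    """
    return [comb(n_pairs, d) * (allowed_pairs - 1) ** d for d in range(n_pairs + 1)]


def pairs_formed_distribution(n_pairs: int, p: float) -> list[float]:
    """P(a library member can form exactly k of the n_pairs pairs), assuming
    each pair position is synthesized independently with pairing probability p."""
    return [comb(n_pairs, k) * p**k * (1 - p) ** (n_pairs - k) for k in range(n_pairs + 1)]


nn = variants_by_distance(N_PAIRS, 6)  # N-N: all six pairs allowed
ry = variants_by_distance(N_PAIRS, 3)  # R-Y: A-U, G-C, G-U only
print(f"N-N total: {sum(nn):.2e}")  # 3.66e+15, i.e. 6**20
print(f"R-Y total: {sum(ry):.2e}")  # 3.49e+09, i.e. 3**20

for name, p in (("N-N", 0.375), ("R-Y", 0.75)):
    dist = pairs_formed_distribution(N_PAIRS, p)
    print(f"{name}: P(all 20 pairs) = {dist[-1]:.2e}")
```

Summing the distance classes recovers 6^20 and 3^20 by the binomial theorem.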
Figure 4
Maximizing the number of unique sequences that form a specific stem in libraries of 10^15 sequences. The graphs show the relationship between the number of base pairs in a stem, the number of possible variants of the stem for the indicated coding scheme (blue curves), and the expected number of variants in a library of 10^15 sequences with the potential to form all of the pairs in the stem (green curves). The number of unique variants in the library at each point on the x axis is indicated by the curve with the lower value; the average copy number of library members is greater than one to the left of each intersection point and less than one to the right of it. (A) Coding scheme in which 6 pairs can occur. An example is N (A, C, G, or U) and N. The probability of forming a pair is 0.375. (B) Coding scheme in which 5 pairs can occur. An example is D (A, G, or U) and N (A, C, G, or U). The probability of forming a pair is 0.417. (C) Coding scheme in which 4 pairs can occur. An example is K (G or U) and N (A, C, G, or U). The probability of forming a pair is 0.5. (D) Coding scheme in which 3 pairs can occur. An example is R (A or G) and Y (C or U). The probability of forming a pair is 0.75. (E) Coding scheme in which 2 pairs can occur. An example is G and Y (C or U). The probability of forming a pair is 1. (F) Coding scheme in which 1 pair can occur. An example is G and C. The probability of forming a pair is 1.
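The blue and green curves are m^n and L·p^n, where n is the number of base pairs, m the number of pairs the scheme can encode, p the probability of forming a pair, and L = 10^15. A sketch of both curves and their intersection, using the example schemes from the caption (function names are illustrative):

```python
from math import log

LIBRARY_SIZE = 10**15

# (number of pairs that can occur, probability of forming a pair) for the
# example coding schemes in this caption.
SCHEMES = {
    "N-N": (6, 0.375),
    "D-N": (5, 5 / 12),
    "K-N": (4, 0.5),
    "R-Y": (3, 0.75),
    "G-Y": (2, 1.0),
    "G-C": (1, 1.0),
}


def stem_curves(n_pairs: int, m: int, p: float, library: int = LIBRARY_SIZE):
    """Return (possible variants, expected pairing sequences, lower curve)."""
    possible = m**n_pairs            # blue curve
    expected = library * p**n_pairs  # green curve
    return possible, expected, min(possible, expected)


def crossover(m: int, p: float, library: int = LIBRARY_SIZE) -> float:
    """Stem length where m**n == library * p**n (inf if the curves never meet)."""
    ratio = m / p  # equals |X1| * |X2| for equimolar degenerate codes
    return float("inf") if ratio == 1 else log(library) / log(ratio)


for name, (m, p) in SCHEMES.items():
    print(f"{name}: curves cross at n = {crossover(m, p):.1f} bp")
```

Because p = m/(|X1|·|X2|) for equimolar codes, the intersection depends only on |X1|·|X2|: about 12.5 bp for N-N, 24.9 bp for R-Y, and 49.8 bp for G-Y, while the G-C curves never meet.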
Figure 5
Enrichment of stem variants in secondary structure libraries relative to randomly mutagenized libraries. The optimal coding strategy for base pairs and the optimal rate of random mutagenesis were determined for a series of stems containing 10 to 50 base pairs. Enrichment of distinct variants of the stem in the secondary structure library (y axis) was calculated by dividing the number of different variants of the stem expected to occur in a secondary structure library (generated using the optimal coding strategy for base pairs) by the number expected to occur in a randomly mutagenized library (generated using the optimal rate of mutagenesis). Calculations were performed for a library of 10^15 sequences. The breakpoints in this graph are due to changes in the maximum number of mutations a sequence can contain and still be present in the library.
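The breakpoints arise because the maximum number of mutations a sequence can carry and still be present in the library is an integer. A minimal sketch of why such steps appear, under a deliberately simple model that is not the paper's Section 4.3 method: each nucleotide of the stem (2 nt per bp) mutates independently at a fixed, hypothetical rate of 20% (within the 15% to 25% range in Figure 1) to each of the other three nucleotides with equal probability, and a particular mutant counts as present if its expected copy number in 10^15 sequences is at least one. Function names are illustrative.

```python
LIBRARY_SIZE = 10**15


def expected_copies(n_nt: int, k: int, rate: float, library: int = LIBRARY_SIZE) -> float:
    """Expected copies of one specific k-point mutant of an n_nt-long sequence,
    assuming each position mutates independently at `rate`, split equally
    among the other three nucleotides."""
    return library * (1 - rate) ** (n_nt - k) * (rate / 3) ** k


def max_mutations_present(n_nt: int, rate: float, library: int = LIBRARY_SIZE) -> int:
    """Largest k for which a specific k-point mutant is expected at >= 1 copy.

    For rate < 0.75 the expected copy number falls with every extra mutation,
    so scanning upward and stopping at the first failure is sufficient.
    Returns -1 if even the unmutated sequence is expected at < 1 copy.
    """
    k = -1
    while k < n_nt and expected_copies(n_nt, k + 1, rate, library) >= 1:
        k += 1
    return k


# Stems of 10-50 bp (20-100 nt) mutagenized at a hypothetical fixed rate of 20%.
for n_bp in range(10, 51, 5):
    print(n_bp, max_mutations_present(2 * n_bp, 0.20))
```

Under these assumptions the maximum drops in discrete steps as the stem lengthens (12 mutations for a 10-bp stem, 4 for a 50-bp stem), which is the kind of discontinuity the caption describes.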
Figure 6
Synthesis of secondary structure libraries using a split-and-pool approach. In this example, a library containing all possible variants of a stem is constructed by synthesizing eight different oligonucleotides in which base pairs are encoded by different combinations of R-Y and Y-R. These oligonucleotides are mixed to generate the final library containing 512 different sequences, including each of the 216 possible stem sequences. Z1-Z2 = R-Y or Y-R.
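The split-and-pool arithmetic can be checked by enumeration. The caption does not state the stem length; this sketch assumes 3 bp, which is consistent with its counts (2^3 = 8 oligonucleotides, 8 × 4^3 = 512 sequences, and 6^3 = 216 fully paired stems). Stems are represented as tuples of (5′ nucleotide, 3′ nucleotide) pairs rather than as full oligonucleotide sequences, and the helper name `expand` is illustrative.

```python
from itertools import product

R, Y = "AG", "CU"
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}
N_PAIRS = 3  # assumed stem length: 2**3 = 8 oligonucleotides, 8 * 4**3 = 512 sequences


def expand(orientations):
    """All stems encoded by one oligonucleotide; each pair is R-Y or Y-R."""
    per_pair = [product(R, Y) if o == "RY" else product(Y, R) for o in orientations]
    return [tuple(stem) for stem in product(*per_pair)]


oligos = list(product(("RY", "YR"), repeat=N_PAIRS))         # the 8 separate syntheses
pool = [stem for oligo in oligos for stem in expand(oligo)]  # mix the products
paired = {stem for stem in pool if all(pair in PAIRS for pair in stem)}

print(len(oligos), len(pool), len(set(pool)), len(paired))  # 8 512 512 216
```

Every fully paired stem appears exactly once, because A-U, G-C, and G-U can only arise from an R-Y position and U-A, C-G, and U-G only from a Y-R position.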
Figure 7
Secondary structure libraries based on known motifs for library sizes of 10^15 sequences. (A) Expected number of unique variants with the potential to form the secondary structure of a DNA aptamer that binds streptavidin [22] in a library of 10^15 sequences using different coding strategies to encode base pairs. The column labeled “Ran” indicates the number for a library generated at the optimal rate of random mutagenesis using the method described in Section 4.3. (B) Possible secondary structure library for this motif. (C,D) The same, but for an RNA aptamer that binds ATP [23,24,25]. (E,F) The same, but for a kinase ribozyme that thiophosphorylates itself using GTPγS as a substrate [19,20]. Y = C or T (U); R = A or G; K = G or T (U); W = A or T (U); S = C or G; D = A, G, or T (U); H = A, C, or T (U); V = A, C, or G; N = A, C, G, or T (U).
Figure 8
Relationship between the complexity of the secondary structure and the optimal coding scheme for base pairs in secondary structure libraries. Y = C or U (T); R = A or G; K = G or U (T); D = A, G, or U (T); N = A, C, G, or U (T).
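The same library arithmetic extends to whole secondary structures: unpaired positions multiply the number of possible variants by their degeneracy (1 if conserved, 4 if N) without changing the fraction of library members that can form every pair. The sketch below is a hypothetical illustration, not the calculation behind Figures 7 and 8: it scores each example coding scheme from Figure 4 by the expected number of distinct structure-compatible sequences, using a Poisson occupancy estimate V·(1 - e^(-E/V)) in place of the lower-curve approximation of Figure 4, where V is the number of possible variants and E = 10^15·p^n is the number of compatible library members. The motif and function names are illustrative.

```python
from math import expm1

LIBRARY_SIZE = 10**15

# (pairs that can occur, probability of forming a pair) for the example
# coding schemes in the Figure 4 caption.
SCHEMES = {
    "N-N": (6, 0.375), "D-N": (5, 5 / 12), "K-N": (4, 0.5),
    "R-Y": (3, 0.75), "G-Y": (2, 1.0), "G-C": (1, 1.0),
}


def expected_unique(n_pairs: int, unpaired_degeneracy: list[int], m: int, p: float,
                    library: int = LIBRARY_SIZE) -> float:
    """Expected number of distinct structure-compatible sequences in the library.

    Assumes equimolar degenerate positions, so every compatible sequence is
    equally likely, and treats sampling as Poisson (classic occupancy model).
    """
    possible = m**n_pairs
    for d in unpaired_degeneracy:
        possible *= d                    # variants of the unpaired positions
    hits = library * p**n_pairs          # members able to form every pair
    return possible * -expm1(-hits / possible)  # V * (1 - exp(-E/V)), stable


def best_scheme(n_pairs: int, unpaired_degeneracy: list[int]) -> str:
    """Scheme that maximizes the expected number of distinct variants."""
    return max(SCHEMES, key=lambda s: expected_unique(n_pairs, unpaired_degeneracy, *SCHEMES[s]))


# Hypothetical motif: 10 bp plus 20 unpaired positions, an increasing number
# of which are randomized (N, 4 options) instead of conserved (1 option).
for n_random in (0, 5, 10, 15, 20):
    degeneracy = [4] * n_random + [1] * (20 - n_random)
    print(n_random, best_scheme(10, degeneracy))
```

For this hypothetical 10-bp motif the best scheme is N-N when all unpaired positions are conserved and G-Y when all 20 are randomized, illustrating how the optimal coding strategy can shift with motif complexity.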

References

    1. Wilson D.S., Szostak J.W. In vitro selection of functional nucleic acids. Annu. Rev. Biochem. 1999;68:611–647. doi: 10.1146/annurev.biochem.68.1.611.
    2. Bartel D.P., Unrau P.J. Constructing an RNA world. Trends Cell Biol. 1999;9:M9–M13. doi: 10.1016/S0962-8924(99)01669-4.
    3. Breaker R.R. Natural and engineered nucleic acids as tools to explore biology. Nature. 2004;432:838–845. doi: 10.1038/nature03195.
    4. Joyce G.F. Directed evolution of nucleic acid enzymes. Annu. Rev. Biochem. 2004;73:791–836. doi: 10.1146/annurev.biochem.73.011303.073717.
    5. Silverman S.K. Catalytic DNA: Scope, applications, and biochemistry of deoxyribozymes. Trends Biochem. Sci. 2016;41:595–609. doi: 10.1016/j.tibs.2016.04.010.