Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2015 Sep 3:16:280.
doi: 10.1186/s12859-015-0706-x.

Multi-objective optimization for RNA design with multiple target secondary structures

Affiliations

Multi-objective optimization for RNA design with multiple target secondary structures

Akito Taneda. BMC Bioinformatics. .

Abstract

Background: RNAs are attractive molecules as the biological parts for synthetic biology. In particular, the ability of conformational changes, which can be encoded in designer RNAs, enables us to create multistable molecular switches that function in biological circuits. Although various algorithms for designing such RNA switches have been proposed, the previous algorithms optimize the RNA sequences against the weighted sum of objective functions, where empirical weights among objective functions are used. In addition, an RNA design algorithm for multiple pseudoknot targets is currently not available.

Results: We developed a novel computational tool for automatically designing RNA sequences which fold into multiple target secondary structures. Our algorithm designs RNA sequences based on multi-objective genetic algorithm, by which we can explore the RNA sequences having good objective function values without empirical weight parameters among the objective functions. Our algorithm has great flexibility by virtue of this weight-free nature. We benchmarked our multi-target RNA design algorithm with the datasets of two, three, and four target structures and found that our algorithm shows better or comparable design performances compared with the previous algorithms, RNAdesign and Frnakenstein. In addition to the benchmarks with pseudoknot-free datasets, we benchmarked MODENA with two-target pseudoknot datasets and found that MODENA can design the RNAs which have the target pseudoknotted secondary structures whose free energies are close to the lowest free energy. Moreover, we applied our algorithm to a ribozyme-based ON-switch which takes a ribozyme-inactive secondary structure when the theophylline aptamer structure is assumed.

Conclusions: Currently, MODENA is the only RNA design software which can be applied to multiple pseudoknot targets. Successful design results for the multiple targets and an RNA device indicate usefulness of our multi-objective RNA design algorithm.

PubMed Disclaimer

Figures

Fig. 1
Fig. 1
Pseudocode for the genetic algorithm utilized in MODENA for multiple targets l is a loop counter; c is a counter for a stop condition; P is a set of individuals; n rank1, m rank1 and N del are temporary variables
Fig. 2
Fig. 2
An example of a connected component of the dependency graph. a An example of the set of three targets T1, T2, and T3. Nucleotide positions are numbered above the example, e.g, shadowed columns from the left to the right correspond to nucleotide positions 1, 9, 13, 21, 25, 33, 37, 42, respectively. As an example, base paired positions belonging to a connected component are shadowed. The dependency graph is composed of all connected components (not shown except the shadowed one) derived from the targets. b A graph representation of the connected component shadowed in (a). The numbers indicate nucleotide positions. An edge corresponds to a base pair between nucleotide positions. If there is a base pair in one of the target structures, an edge appears in a connected component. E.g., nucleotide positions 21 and 25 in a have a base pair in T3, so that there is an edge between node 21 and 25 in (b). c Decomposition of the connected component into the paths, where a cycle is treated as a path by defining a single vertex as a ‘start and end’ vertex (e.g. the cycle shown at the left of this figure). The start and end vertices are denoted by solid circles
Fig. 3
Fig. 3
An example of the procedure for assigning nucleotide codes to the ‘start and end’ vertices of the connected component shown in Fig. 2. In this example, position 21 is selected as a root vertex v arb. Then we consider the tree structure for assigning nucleotides to the ‘start and end’ vertices, where the order of the ‘start and end’ vertices is obtained by the depth-first search in the spanning tree of the connected component (in this example, the order of positions 21, 25, 37, and 42 from the top to the bottom of the tree). Dashed arrows mean that there exists a decomposed path between two positions. In this example, first, an A or G is randomly selected for position 21 (it is noted that, in the case of the nucleotide assignment for the GA initialization, not an A or G, but an A, C, G or U is randomly selected here). Then nucleotide codes of the remaining positions are assigned from the top of the tree to the bottom. Let us consider an A is selected for position 21. Even in the case such that a C alone is allowed to position 37 due to a sequence constraint, first we try to assign AUU from the top to the bottom; however, since this violates the constraint, we backtrack to position 25 and then assign a C to position 37. As a result, we assign AUCA to the ‘start and end’ vertices in this example
Fig. 4
Fig. 4
Pseudocode of the nucleotide assignment algorithm for a given connected component c. By using this function, we can assign compatible nucleotide codes to the nucleotide positions belonging to c; the output is contained in a one-dimensional array sq. maxcount and b are a counter variable and a temporary one dimensional array for traversing the ‘start and end’ vertices, respectively. In line 6, this function calls traverseStEndVertices() shown in Fig. 5. In line 11, k = 1 and k = length(p) correspond to the start and end vertices, respectively; length(p) is the number of nucleotide positions (including the start and end vertices) of which path p is comprised. In line 13, we assume that χ(k,x,s) can be accessed as a global array. Array indices start at 0. Comments are written in the C language-like format
Fig. 5
Fig. 5
Pseudocode of the nucleotide assignment algorithm (continued from Fig. 4). i is a nucleotide position. sq is a one-dimensional array for containing a nucleotide sequence. b is a one-dimensional array containing nucleotide positions of a connected component. lab = 0, lab = 1, and lab = 2 indicate “nucleotide position i is v arb”, “A and U have a priority”, and “C and G have a priority”, respectively. x and labx are temporary variables. In lines 8 and 14, a nucleotide type (i.e. purine or pyrimidine) can be determined based on the nucleotide type of v arb (which is assigned in line 23 - 29) and the partition, to which position i belongs, of the bipartite graph of the connected component c. In line 22, we assume that the indicator function λ(t,u,i,j) is used to check the compatibility as a global array. In line 31, this function calls itself recursively. By resricting possible nucleotide types to purine or pyrimidine in line 7, this function becomes that for transversion operator. Array indices start at 0. Comments are written in the C language-like format
Fig. 6
Fig. 6
An example of negative design operator. Here we consider a two-target design. In this example, a parent sequence, target structures 1 and 2, a selected predicted structure, and a child sequence are denoted by Parent, Target1, Target2, Predict1, Child, respectively. First, the Parent sequence is copied to the Child sequence; Step 1) we detect undesired base pairs (positions are denoted in red) in Predict1 which do not appear in the target structures; Step 2) we change the nucleotide(s) of each undesired predicted base pair to disrupt the undesired base pairing. Here, the G in Parent is changed to the C in Child (both are denoted by shadowed characters); Step 3) to repair the base complementarity denoted in red, the shadowed C is changed to the shadowed G
Fig. 7
Fig. 7
An example of positive design operator. Here we consider a two-target design. In this example, a parent sequence, target structures 1 and 2, a selected predicted structure, and a child sequence are denoted by Parent, Target1, Target2, Predict1, Child, respectively. First, the Parent sequence is copied to the Child sequence; Step 1) we scan Predict1 from the left to the right to find a target base pair which is missing in Predict1 (denoted by a red base pair); Step 2) we change the shadowed A in Parent to the shadowed G in Child. Then, to repair the base complementarity, the shadowed U is changed to the shadowed C; Step 3) in addition, since the changed position forms a base pair in Target2, the shadowed A is changed to the shadowed G
Fig. 8
Fig. 8
A design example of ribozyme-based RNA device. Gray circles indicate sequence constraints. The original ribozyme consensus sequence motifs and loop nucleotides of L2bulge1 [15] were used as sequence constraints. The structures and free energies shown in this figure were computed by Fold in RNAstructure package. a Ribozyme-active conformation predicted as the MFE structure. b Ribozyme-inactive conformation predicted as the MFE structure when the aptamer domain structure is constrained; the region ranging from 43 to 72 nt is the theophylline aptamer domain. The designed sequence is GCUGUCUCUCUCUGUGCUUGAGGGACUGAUGAGAGUGUACCAAUACCAGCAUCGUCUUGAUGCCCUUGGCAGU GGUAUGGUGAAUUCGAAACAGC. These structures were visualized by VARNA [40]

Similar articles

Cited by

References

    1. Tang J, Breaker RR. Rational design of allosteric ribozymes. Chem Biol. 1997;4:453–9. doi: 10.1016/S1074-5521(97)90197-6. - DOI - PubMed
    1. Win MN, Liang JC, Smolke CD. Frameworks for programming biological function through RNA parts and devices. Chem Biol. 2009;16:298–310. doi: 10.1016/j.chembiol.2009.02.011. - DOI - PMC - PubMed
    1. Rodrigo G, Landrain TE, Jaramillo A. De novo automated design of small RNA circuits for engineering synthetic riboregulation in living cells. Proc Natl Acad Sci U S A. 2012;109:15271–6. doi: 10.1073/pnas.1203831109. - DOI - PMC - PubMed
    1. Wachsmuth M, Findeiß S, Weissheimer N, Stadler PF, Mörl M. De novo design of a synthetic riboswitch that regulates transcription termination. Nucleic Acids Res. 2012;41:2541–551. doi: 10.1093/nar/gks1330. - DOI - PMC - PubMed
    1. Perreault J, Weinberg Z, Roth A, Popescu O, Chartrand P, Ferbeyre G, et al. Identification of hammerhead ribozymes in all domains of life reveals novel structural variations. PLoS Comput Biol. 2011;7:1002031. doi: 10.1371/journal.pcbi.1002031. - DOI - PMC - PubMed

Publication types

LinkOut - more resources