Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2004 Aug 27;32(15):4630-45.
doi: 10.1093/nar/gkh802. Print 2004.

Rational design of DNA sequences for nanotechnology, microarrays and molecular computers using Eulerian graphs

Affiliations

Rational design of DNA sequences for nanotechnology, microarrays and molecular computers using Eulerian graphs

Petr Pancoska et al. Nucleic Acids Res. .

Abstract

Nucleic acids are molecules of choice for both established and emerging nanoscale technologies. These technologies benefit from large functional densities of 'DNA processing elements' that can be readily manufactured. To achieve the desired functionality, polynucleotide sequences are currently designed by a process that involves tedious and laborious filtering of potential candidates against a series of requirements and parameters. Here, we present a complete novel methodology for the rapid rational design of large sets of DNA sequences. This method allows for the direct implementation of very complex and detailed requirements for the generated sequences, thus avoiding 'brute force' filtering. At the same time, these sequences have narrow distributions of melting temperatures. The molecular part of the design process can be done without computer assistance, using an efficient 'human engineering' approach by drawing a single blueprint graph that represents all generated sequences. Moreover, the method eliminates the necessity for extensive thermodynamic calculations. Melting temperature can be calculated only once (or not at all). In addition, the isostability of the sequences is independent of the selection of a particular set of thermodynamic parameters. Applications are presented for DNA sequence designs for microarrays, universal microarray zip sequences and electron transfer experiments.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Schematic representation of transformation of a given DNA sequence into Eulerian graph Γnn and its adjacency matrix Ann). (A) Graphical representation of the transformation showing annealing of identically labeled vertices into final form of Γnn. Construction of adjacency matrix Ann) follows established mathematical rules. (B) Correspondence of matrix elements of Ann) and entries in the first term of Equation 1.
Figure 2
Figure 2
Example of construction of adjacency matrix Ann) and corresponding Eulerian graph Γnn using the set of structural requirements defined in the text. (A) Initial form of matrix Ann) after implementation of requirements for length, nucleobase composition, presence of only one ∼TT∼ and ∼CC∼ motif and absence of ∼CA∼ and ∼AG∼ stacking interactions in all generated contextually isostable sequences. (B) First intermediate form of the adjacency matrix Ann) after relations between diagonal and off-diagonal elements were used to fill the C-row and column. (C) Second intermediate form of Ann) with elements aAT and aTG determined by the rules of Equation 3 (see text for details). (D) Final form of the Ann) with remaining off-diagonal elements calculated from the previous intermediate form. (E) Eulerian graph Γnn defined by Ann) (F) Distribution of Tm predicted for contextually nn-isostable sequences generated from the graph E using nnn-level of thermodynamic formulation of Equation 1w ith parameters from (2).
Figure 3
Figure 3
Schematic example of our algorithm that generates sets of contextually isostable DNA sequences from basic cycles γk defined by decomposition of the Eulerian graph Γnn. (A) Given its connectivity, the original graph Γnn is decomposed of into three basic cycle subgraphs γa + 2γb. (B) The first level of building template structures T from γk is demonstrated. All four possible ways of annealing one basic cycle γa to another basic cycle γb via identical vertices are shown (A–D). (C) Completion of the template structures T by annealing the second basic cycle γb to intermediate structure A from level B. Final template A-A1 means that the intermediate structure A is connected to the second γbvia the upper left A-vertex. Final template A-A2 means that the intermediate structure A is connected to the second γb via the lower right A-vertex and so forth. The remaining templates [derived from (B) to (D)] are omitted for clarity. (D) Left panel: The complete set of sequences as read from all final templates, starting at base A. Central panel: Selection of a unique sequence set from the complete list. Reduction in the number of sequences is the consequence of many identities among templates, which, in turn, is a consequence of the symmetry of graph Γnn. This redundancy is only generated due to the ‘manual’ nature of the construction for demonstration purposes. However, in the final software implementation, these redundancies are avoided. Right panel: the final set of contextually isostable sequences generated by systematic permutation of the permutable segments.
Figure 4
Figure 4
Schematic representation of how to generate the set of sequences that are contextually isostable at the next-nearest-neighbor level. The ‘mother’ oligomer ATGACATGATCATCA is converted into template graphs T* by a systematic search for identically oriented 2-, 3- and 4-tuples. If more than one n-tuple is found in the sequence, they are combined into the node vertex of T*. This operation defines permutable segments. Their systematic permutation yields the contextually isostable ‘daughter’ oligomers, shown in the central panel. For comparison, multiple alignment of the resulting sequences using ClustalW is shown.
Figure 5
Figure 5
Design of permutable blocks for the synthesis of contextually isostable zip-code sequences for universal microarrays. (A) Demonstration of the topology of template graph T* (top) and several arrangements of the permutable segments in the final zip-code templates. The permutable segments have different lengths to minimize cross-hybridization that derives from the periodicity of the zip sequences. The different shading of the template vertices demonstrates that contextual isostability is a property derived from the template graph topology, rather than from the concrete nucleobase identities. Vertices in the permutable blocks can be arbitrarily filled with any nucleobase. As long as these vertex assignments remain unchanged throughout the permutation, the resulting set of sequences will be contextually isostable. This increases the overall effectivity of the design: Once the satisfactory (sub)set of contextually isostable zip-sequences is selected, its common thermodynamic properties can be fine-tuned by systematic replacement of the nucleobases in the oligomers. (B) An example of how to adjust the Tm for contextually nn-isostable 37mers, generated from eight permutable segments that are defined by the common template T*. The assignments of nucleobases to vertices in the right template (set 2) were generated by systematic replacements of A→G, T→C, C→A and G→T in the left template (set 1). The resulting block permutation yielded 40 320 sequences. Tm predictions were calculated for these two series using all four selections of node base X = A, T, C and G. See Table 2 for results.
Figure 6
Figure 6
Graph-based design of generalized permutable blocks for the generation of at least 4000 contextually isostable sequences with a given subset of next-nearest-neighbor (nnn-) interactions. This graph facilitates the feasibility analysis for the use of such sequences in n-mer-based universal microarrays for expression profiling (80). See text for details.
Figure 7
Figure 7
Histograms of Tm calculated on nn-level of approximation for sets of probes of 49 yeast ORFs. (A) Affymetrix probe set (935) from yeast S98 microarray. (B) Probes designed by our method with up to three mismatched bases between daughter and target sequences (853). (C) Probes designed with up to two mismatched bases between daughter and target sequences (795). (D) Probes designed with up to one mismatched base between daughter and target sequences (786). (E) Probes designed with no mismatched bases between daughter and target sequences (744).

Similar articles

Cited by

References

    1. Dirks R.M., Lin,M., Winfree,E. and Pierce,N.A. (2004) Paradigms for computational nucleic acid design. Nucleic Acids Res., 32, 1392–1403. - PMC - PubMed
    1. Benight A.S., Pancoska,P., Owczarzy,R., Vallone,P.M., Nesetril,J. and Riccelli,P.V. (2001) Calculating sequence-dependent melting stability of duplex DNA oligomers and multiplex sequence analysis by graphs. Methods Enzymol., 340, 165–192. - PubMed
    1. SantaLucia J. Jr, Kierzek,R. and Turner,D.H. (1991) Stabilities of consecutive A.C, C.C, G.G, U.C, and U.U mismatches in RNA internal loops: evidence for stable hydrogen-bonded U.U and C.C.+ pairs. Biochemistry, 30, 8242–8251. - PubMed
    1. SantaLucia J. Jr, Allawi,H.T. and Seneviratne,P.A. (1996) Improved nearest-neighbor parameters for predicting DNA duplex stability. Biochemistry, 35, 3555–3562. - PubMed
    1. Xia T., SantaLucia,J.,Jr, Burkard,M.E., Kierzek,R., Schroeder,S.J., Jiao,X., Cox,C. and Turner,D.H. (1998) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson–Crick base pairs. Biochemistry, 37, 14719–14735. - PubMed

Publication types