Rational design of DNA sequences for nanotechnology, microarrays and molecular computers using Eulerian graphs

doi:10.1093/nar/gkh802

. 2004 Aug 27;32(15):4630-45.

doi: 10.1093/nar/gkh802. Print 2004.

Rational design of DNA sequences for nanotechnology, microarrays and molecular computers using Eulerian graphs

Petr Pancoska¹, Zdenek Moravek, Ute M Moll

Affiliations

PMID: 15333695
PMCID: PMC516071
DOI: 10.1093/nar/gkh802

Rational design of DNA sequences for nanotechnology, microarrays and molecular computers using Eulerian graphs

Petr Pancoska et al. Nucleic Acids Res. 2004.

. 2004 Aug 27;32(15):4630-45.

doi: 10.1093/nar/gkh802. Print 2004.

Authors

Petr Pancoska¹, Zdenek Moravek, Ute M Moll

Affiliation

¹ Department of Pathology, Stony Brook University, New York, NY 11794, USA. ppancoska@notes.cc.sunysb.edu

PMID: 15333695
PMCID: PMC516071
DOI: 10.1093/nar/gkh802

Abstract

Nucleic acids are molecules of choice for both established and emerging nanoscale technologies. These technologies benefit from large functional densities of 'DNA processing elements' that can be readily manufactured. To achieve the desired functionality, polynucleotide sequences are currently designed by a process that involves tedious and laborious filtering of potential candidates against a series of requirements and parameters. Here, we present a complete novel methodology for the rapid rational design of large sets of DNA sequences. This method allows for the direct implementation of very complex and detailed requirements for the generated sequences, thus avoiding 'brute force' filtering. At the same time, these sequences have narrow distributions of melting temperatures. The molecular part of the design process can be done without computer assistance, using an efficient 'human engineering' approach by drawing a single blueprint graph that represents all generated sequences. Moreover, the method eliminates the necessity for extensive thermodynamic calculations. Melting temperature can be calculated only once (or not at all). In addition, the isostability of the sequences is independent of the selection of a particular set of thermodynamic parameters. Applications are presented for DNA sequence designs for microarrays, universal microarray zip sequences and electron transfer experiments.

PubMed Disclaimer

Figures

**Figure 1**
Schematic representation of transformation of a given DNA sequence into Eulerian graph Γ*_nn* and its adjacency matrix A(Γ*_nn*). (A) Graphical representation of the transformation showing annealing of identically labeled vertices into final form of Γ*_nn*. Construction of adjacency matrix A(Γ*_nn*) follows established mathematical rules. (B) Correspondence of matrix elements of A(Γ*_nn*) and entries in the first term of Equation 1.

**Figure 2**
Example of construction of adjacency matrix A(Γ*_nn*) and corresponding Eulerian graph Γ*_nn* using the set of structural requirements defined in the text. (A) Initial form of matrix A(Γ*_nn*) after implementation of requirements for length, nucleobase composition, presence of only one ∼TT∼ and ∼CC∼ motif and absence of ∼CA∼ and ∼AG∼ stacking interactions in all generated contextually isostable sequences. (B) First intermediate form of the adjacency matrix A(Γ*_nn*) after relations between diagonal and off-diagonal elements were used to fill the C-row and column. (C) Second intermediate form of A(Γ*_nn*) with elements a_AT and a_TG determined by the rules of Equation 3 (see text for details). (D) Final form of the A(Γ*_nn*) with remaining off-diagonal elements calculated from the previous intermediate form. (E) Eulerian graph Γ*_nn* defined by A(Γ*_nn*) (F) Distribution of T_m predicted for contextually nn-isostable sequences generated from the graph E using *nnn*-level of thermodynamic formulation of Equation 1w ith parameters from (2).

**Figure 3**
Schematic example of our algorithm that generates sets of contextually isostable DNA sequences from basic cycles γ_k defined by decomposition of the Eulerian graph Γ*_nn*. (A) Given its connectivity, the original graph Γ*_nn* is decomposed of into three basic cycle subgraphs γ_a + 2γ_b. (B) The first level of building template structures T from γ_k is demonstrated. All four possible ways of annealing one basic cycle γ_a to another basic cycle γ_b via identical vertices are shown (A–D). (C) Completion of the template structures T by annealing the second basic cycle γ_b to intermediate structure A from level B. Final template A-A1 means that the intermediate structure A is connected to the second γ_bvia the upper left A-vertex. Final template A-A2 means that the intermediate structure A is connected to the second γ_b via the lower right A-vertex and so forth. The remaining templates [derived from (B) to (D)] are omitted for clarity. (D) Left panel: The complete set of sequences as read from all final templates, starting at base A. Central panel: Selection of a unique sequence set from the complete list. Reduction in the number of sequences is the consequence of many identities among templates, which, in turn, is a consequence of the symmetry of graph Γ*_nn*. This redundancy is only generated due to the ‘manual’ nature of the construction for demonstration purposes. However, in the final software implementation, these redundancies are avoided. Right panel: the final set of contextually isostable sequences generated by systematic permutation of the permutable segments.

**Figure 4**
Schematic representation of how to generate the set of sequences that are contextually isostable at the next-nearest-neighbor level. The ‘mother’ oligomer ATGACATGATCATCA is converted into template graphs T^* by a systematic search for identically oriented 2-, 3- and 4-tuples. If more than one n-tuple is found in the sequence, they are combined into the node vertex of T^*. This operation defines permutable segments. Their systematic permutation yields the contextually isostable ‘daughter’ oligomers, shown in the central panel. For comparison, multiple alignment of the resulting sequences using ClustalW is shown.

**Figure 5**
Design of permutable blocks for the synthesis of contextually isostable zip-code sequences for universal microarrays. (A) Demonstration of the topology of template graph T^* (top) and several arrangements of the permutable segments in the final zip-code templates. The permutable segments have different lengths to minimize cross-hybridization that derives from the periodicity of the zip sequences. The different shading of the template vertices demonstrates that contextual isostability is a property derived from the template graph topology, rather than from the concrete nucleobase identities. Vertices in the permutable blocks can be arbitrarily filled with any nucleobase. As long as these vertex assignments remain unchanged throughout the permutation, the resulting set of sequences will be contextually isostable. This increases the overall effectivity of the design: Once the satisfactory (sub)set of contextually isostable zip-sequences is selected, its common thermodynamic properties can be fine-tuned by systematic replacement of the nucleobases in the oligomers. (B) An example of how to adjust the T_m for contextually nn-isostable 37mers, generated from eight permutable segments that are defined by the common template T^*. The assignments of nucleobases to vertices in the right template (set 2) were generated by systematic replacements of A→G, T→C, C→A and G→T in the left template (set 1). The resulting block permutation yielded 40 320 sequences. T_m predictions were calculated for these two series using all four selections of node base X = A, T, C and G. See Table 2 for results.

**Figure 6**
Graph-based design of generalized permutable blocks for the generation of at least 4000 contextually isostable sequences with a given subset of next-nearest-neighbor (*nnn*-) interactions. This graph facilitates the feasibility analysis for the use of such sequences in n-mer-based universal microarrays for expression profiling (80). See text for details.

**Figure 7**
Histograms of T_m calculated on nn-level of approximation for sets of probes of 49 yeast ORFs. (A) Affymetrix probe set (935) from yeast S98 microarray. (B) Probes designed by our method with up to three mismatched bases between daughter and target sequences (853). (C) Probes designed with up to two mismatched bases between daughter and target sequences (795). (D) Probes designed with up to one mismatched base between daughter and target sequences (786). (E) Probes designed with no mismatched bases between daughter and target sequences (744).

See this image and copyright information in PMC

Cited by

Molecular and contextual markers of hepatitis C virus and drug abuse.
Shapshak P, Somboonwit C, Drumright LN, Frost SD, Commins D, Tellinghuisen TL, Scott WK, Duncan R, McCoy C, Page JB, Giunta B, Fernandez F, Singer E, Levine A, Minagar A, Oluwadara O, Kotila T, Chiappelli F, Sinnott JT. Shapshak P, et al. Mol Diagn Ther. 2009;13(3):153-79. doi: 10.2165/01250444-200913030-00002. Mol Diagn Ther. 2009. PMID: 19650670 Free PMC article. Review.
The influence of locked nucleic acid residues on the thermodynamic properties of 2'-O-methyl RNA/RNA heteroduplexes.
Kierzek E, Ciesielska A, Pasternak K, Mathews DH, Turner DH, Kierzek R. Kierzek E, et al. Nucleic Acids Res. 2005 Sep 9;33(16):5082-93. doi: 10.1093/nar/gki789. Print 2005. Nucleic Acids Res. 2005. PMID: 16155181 Free PMC article.

References

1. Dirks R.M., Lin,M., Winfree,E. and Pierce,N.A. (2004) Paradigms for computational nucleic acid design. Nucleic Acids Res., 32, 1392–1403. - PMC - PubMed
1. Benight A.S., Pancoska,P., Owczarzy,R., Vallone,P.M., Nesetril,J. and Riccelli,P.V. (2001) Calculating sequence-dependent melting stability of duplex DNA oligomers and multiplex sequence analysis by graphs. Methods Enzymol., 340, 165–192. - PubMed
1. SantaLucia J. Jr, Kierzek,R. and Turner,D.H. (1991) Stabilities of consecutive A.C, C.C, G.G, U.C, and U.U mismatches in RNA internal loops: evidence for stable hydrogen-bonded U.U and C.C.+ pairs. Biochemistry, 30, 8242–8251. - PubMed
1. SantaLucia J. Jr, Allawi,H.T. and Seneviratne,P.A. (1996) Improved nearest-neighbor parameters for predicting DNA duplex stability. Biochemistry, 35, 3555–3562. - PubMed
1. Xia T., SantaLucia,J.,Jr, Burkard,M.E., Kierzek,R., Schroeder,S.J., Jiao,X., Cox,C. and Turner,D.H. (1998) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson–Crick base pairs. Biochemistry, 37, 14719–14735. - PubMed

Publication types

Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions

Substances

Actions

Grants and funding

LinkOut - more resources

Full Text Sources
Other Literature Sources
- The Lens - Patent Citations Database

[1] Dirks R.M., Lin,M., Winfree,E. and Pierce,N.A. (2004) Paradigms for computational nucleic acid design. Nucleic Acids Res., 32, 1392–1403. - PMC - PubMed

[2] Dirks R.M., Lin,M., Winfree,E. and Pierce,N.A. (2004) Paradigms for computational nucleic acid design. Nucleic Acids Res., 32, 1392–1403. - PMC - PubMed

[3] Benight A.S., Pancoska,P., Owczarzy,R., Vallone,P.M., Nesetril,J. and Riccelli,P.V. (2001) Calculating sequence-dependent melting stability of duplex DNA oligomers and multiplex sequence analysis by graphs. Methods Enzymol., 340, 165–192. - PubMed

[4] Benight A.S., Pancoska,P., Owczarzy,R., Vallone,P.M., Nesetril,J. and Riccelli,P.V. (2001) Calculating sequence-dependent melting stability of duplex DNA oligomers and multiplex sequence analysis by graphs. Methods Enzymol., 340, 165–192. - PubMed

[5] SantaLucia J. Jr, Kierzek,R. and Turner,D.H. (1991) Stabilities of consecutive A.C, C.C, G.G, U.C, and U.U mismatches in RNA internal loops: evidence for stable hydrogen-bonded U.U and C.C.+ pairs. Biochemistry, 30, 8242–8251. - PubMed

[6] SantaLucia J. Jr, Kierzek,R. and Turner,D.H. (1991) Stabilities of consecutive A.C, C.C, G.G, U.C, and U.U mismatches in RNA internal loops: evidence for stable hydrogen-bonded U.U and C.C.+ pairs. Biochemistry, 30, 8242–8251. - PubMed

[7] SantaLucia J. Jr, Allawi,H.T. and Seneviratne,P.A. (1996) Improved nearest-neighbor parameters for predicting DNA duplex stability. Biochemistry, 35, 3555–3562. - PubMed

[8] SantaLucia J. Jr, Allawi,H.T. and Seneviratne,P.A. (1996) Improved nearest-neighbor parameters for predicting DNA duplex stability. Biochemistry, 35, 3555–3562. - PubMed

[9] Xia T., SantaLucia,J.,Jr, Burkard,M.E., Kierzek,R., Schroeder,S.J., Jiao,X., Cox,C. and Turner,D.H. (1998) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson–Crick base pairs. Biochemistry, 37, 14719–14735. - PubMed

[10] Xia T., SantaLucia,J.,Jr, Burkard,M.E., Kierzek,R., Schroeder,S.J., Jiao,X., Cox,C. and Turner,D.H. (1998) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson–Crick base pairs. Biochemistry, 37, 14719–14735. - PubMed

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Rational design of DNA sequences for nanotechnology, microarrays and molecular computers using Eulerian graphs

Affiliation

Rational design of DNA sequences for nanotechnology, microarrays and molecular computers using Eulerian graphs

Authors

Affiliation

Abstract

Figures

Similar articles

Cited by

References

Publication types

MeSH terms

Substances

Grants and funding

LinkOut - more resources

Full Text Sources

Other Literature Sources

Abstract

Figures

Similar articles

Cited by

References

Publication types

MeSH terms

Substances

Related information

Grants and funding

LinkOut - more resources

Full Text Sources

Other Literature Sources