BMC Bioinformatics. 2006 Oct 3;7:429.
doi: 10.1186/1471-2105-7-429.

Design of a combinatorial DNA microarray for protein-DNA interaction studies

Julian Mintseris et al. BMC Bioinformatics.

Abstract

Background: Discovering the precise binding specificity of transcription factors is an important step toward understanding the complex mechanisms of gene regulation in eukaryotes. Recently, double-stranded protein-binding microarrays (PBMs) were developed as a potentially scalable approach to identifying transcription factor binding sites.

Results: Here we present an algorithmic approach to the experimental design of a microarray that allows testing the full specificity of a transcription factor against all possible DNA binding sites of a given length, with optimally efficient use of the array. The design is universal: it works for any factor that binds a sequence motif and is not species-specific. Furthermore, simulation results show that data produced with the designed arrays are easier to analyze and would result in more precise identification of binding sites.
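The Results rest on packing every possible binding site of length k into as little probe sequence as possible. As a minimal sketch of that idea, assuming a plain de Bruijn sequence over {A, C, G, T} (a standard textbook construction, not necessarily the authors' algorithm), the Lyndon-word method below builds a cyclic sequence of length 4^k in which every k-mer occurs exactly once. Unrolled, it needs 4^k + k − 1 bases instead of the k × 4^k bases of naively concatenated k-mers. The function names are illustrative, and the sketch does not exploit reverse complements on the double-stranded probes.

```python
def de_bruijn(alphabet: str, k: int) -> str:
    """Cyclic de Bruijn sequence: every k-mer over `alphabet` occurs exactly once.

    Standard Fredricksen-Kessler-Maiorana construction (concatenated Lyndon words).
    """
    n = len(alphabet)
    a = [0] * (k + 1)          # scratch array; positions 1..k hold the current word
    sequence: list[int] = []

    def db(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:     # keep Lyndon words whose length divides k
                sequence.extend(a[1 : p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


def covers_all_kmers(linear: str, alphabet: str, k: int) -> bool:
    """True if every k-mer over `alphabet` occurs somewhere in the linear string."""
    seen = {linear[i : i + k] for i in range(len(linear) - k + 1)}
    return len(seen) == len(alphabet) ** k


print(de_bruijn("ACGT", 2))  # AACAGATCCGCTGGTT

k = 8
cycle = de_bruijn("ACGT", k)
linear = cycle + cycle[: k - 1]   # unroll the cycle so wrap-around k-mers survive
print(len(cycle), len(linear), k * 4**k)    # 65536 65543 524288
print(covers_all_kmers(linear, "ACGT", k))  # True
```

For 8-mers the unrolled sequence is roughly eight times shorter than listing every 8-mer separately, which is the kind of saving that lets a single array cover all sites of a given length.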

Conclusion: In this study, we present a design of a double-stranded DNA microarray for protein-DNA interaction studies and show that our algorithm allows optimally efficient use of the arrays for this purpose. We believe such a design will prove useful for transcription factor binding site identification and other biological problems.

Figures

Figure 1
Probe design from the shortest path on a graph. The de Bruijn graph for all possible DNA base doublets and one possible solution for a shortest path represented as a pseudo-Eulerian cycle (bold edges). The reverse complement solution is represented by dashed edges in the graph and also the inner cycle sequence. "Cutting" the circular sequence while retaining one overlapping base results in two sequences of total length 12 (containing all doublets) as compared to the length of all non-overlapping concatenated doublets, 2 × 4² = 32. Cutting the circular sequence at different points allows screening multiple replicates and helps identify biases in sequence recognition preferences. Reverse complement strands for the replicates are not shown.
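The cutting step in Figure 1 can be sketched as follows, under stated assumptions: the cyclic sequence is the doublet de Bruijn sequence "AACAGATCCGCTGGTT" (one valid choice, not necessarily the cycle drawn in the figure), consecutive probes keep k − 1 shared bases so that no k-mer spanning a cut is lost, and the offset of the first cut selects a replicate. The helper `cut_into_probes` is hypothetical, and because it ignores reverse complements it does not attempt the shorter, reverse-complement-aware solution shown in the figure.

```python
CYCLE = "AACAGATCCGCTGGTT"  # a cyclic de Bruijn sequence containing all 16 doublets


def cut_into_probes(cycle: str, k: int, probe_len: int, offset: int = 0) -> list[str]:
    """Cut a cyclic sequence into linear probes of at most `probe_len` bases.

    Consecutive probes share k - 1 bases, so every cyclic k-mer lands whole in
    some probe. `offset` moves the first cut; each offset yields a replicate set
    in which the k-mers sit in different flanking contexts.
    """
    if probe_len < k:
        raise ValueError("probe_len must be at least k")
    n = len(cycle)
    start = offset % n
    rotated = cycle[start:] + cycle[:start]
    linear = rotated + rotated[: k - 1]   # unroll once to keep wrap-around k-mers
    step = probe_len - (k - 1)            # advance so neighbours overlap by k - 1
    return [linear[s : s + probe_len] for s in range(0, n, step)]


def kmers_in(probes: list[str], k: int) -> set[str]:
    """All k-mers present in a list of linear probes."""
    return {p[i : i + k] for p in probes for i in range(len(p) - k + 1)}


print(cut_into_probes(CYCLE, k=2, probe_len=9))            # ['AACAGATCC', 'CGCTGGTTA']
print(cut_into_probes(CYCLE, k=2, probe_len=9, offset=4))  # ['GATCCGCTG', 'GGTTAACAG']
print(len(kmers_in(cut_into_probes(CYCLE, 2, 9, offset=4), 2)))  # 16
```

Both probe sets contain all 16 doublets, but each doublet appears next to different neighbouring bases, which is what lets replicates expose context-dependent biases.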
Figure 2
Distribution of putative PBM probe hits for Rap1. Frequency of array probe hits, binned by the number of potential binding sites per probe. All sequences one or two mutations away from the consensus sequence are assumed to bind.
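The binding rule used in Figures 2–4 (every sequence one or two mutations away from the consensus binds) defines a Hamming neighborhood around the consensus. A minimal sketch, assuming that mutations are single-base substitutions (no insertions or deletions) and that the consensus itself also counts as a site: a length-L consensus then has 1 + 3L + 9·L(L − 1)/2 binding variants, or 436 for L = 10. The consensus "ACGTACGTAC" below is a hypothetical placeholder, not the Rap1 or TBP consensus.

```python
from itertools import combinations, product
from math import comb


def hamming_neighborhood(consensus: str, max_mismatches: int, alphabet: str = "ACGT") -> set[str]:
    """Every sequence within `max_mismatches` substitutions of the consensus (itself included)."""
    result = {consensus}
    for d in range(1, max_mismatches + 1):
        for positions in combinations(range(len(consensus)), d):
            # at each chosen position, try every base other than the consensus base
            choices = [[b for b in alphabet if b != consensus[i]] for i in positions]
            for bases in product(*choices):
                seq = list(consensus)
                for i, b in zip(positions, bases):
                    seq[i] = b
                result.add("".join(seq))
    return result


def neighborhood_size(length: int, max_mismatches: int, alphabet_size: int = 4) -> int:
    """Closed-form count: sum over d of C(length, d) * (alphabet_size - 1)^d."""
    return sum(comb(length, d) * (alphabet_size - 1) ** d for d in range(max_mismatches + 1))


consensus = "ACGTACGTAC"  # hypothetical placeholder consensus
sites = hamming_neighborhood(consensus, 2)
print(len(sites), neighborhood_size(10, 2))  # 436 436
```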
Figure 3
Distribution of putative PBM probe hits for TBP. Frequency of array probe hits, binned by the number of potential binding sites per probe. All sequences one or two mutations away from the consensus sequence are assumed to bind.
Figure 4
Distribution of putative PBM probe hits for 100 random transcription factor binding sites of length 10. Frequency of array probe hits, binned by the number of potential binding sites per probe. The data are averaged over 100 random 10-mer binding sites. For each 10-mer, all sequences one or two mutations away from the consensus sequence are assumed to bind.
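Counting putative sites per probe, as in the distributions of Figures 2–4, can be sketched by scanning both strands of each probe with the same mismatch rule. Assumptions, none of which come from the source: the helper names are hypothetical, random 36-base probes stand in for the designed array, a window that matches on both strands is counted once per strand, and probes with no putative site are left out of the histogram. Averaging over 100 random 10-mer consensus sequences mirrors the procedure described for Figure 4.

```python
import random
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def sites_per_probe(probe: str, consensus: str, max_mismatches: int = 2) -> int:
    """Number of windows, on either strand, within `max_mismatches` of the consensus."""
    k = len(consensus)
    count = 0
    for strand in (probe, reverse_complement(probe)):
        for i in range(len(strand) - k + 1):
            if mismatches(strand[i : i + k], consensus) <= max_mismatches:
                count += 1
    return count


def hit_distribution(probes: list[str], consensus: str, max_mismatches: int = 2) -> Counter:
    """Map: number of putative sites per probe -> number of probes (hits only)."""
    dist = Counter(sites_per_probe(p, consensus, max_mismatches) for p in probes)
    dist.pop(0, None)  # a probe without any putative site is not a hit
    return dist


def averaged_distribution(probes: list[str], n_sites: int = 100, k: int = 10,
                          max_mismatches: int = 2, seed: int = 0) -> dict[int, float]:
    """Average hit distribution over random k-mer consensus sequences."""
    rng = random.Random(seed)
    total: Counter = Counter()
    for _ in range(n_sites):
        consensus = "".join(rng.choice("ACGT") for _ in range(k))
        total.update(hit_distribution(probes, consensus, max_mismatches))
    return {n: total[n] / n_sites for n in sorted(total)}


rng = random.Random(1)
probes = ["".join(rng.choice("ACGT") for _ in range(36)) for _ in range(100)]
print(averaged_distribution(probes))
```

Swapping the random probes for a designed probe set (for example, probes cut from a de Bruijn cycle) would let the two histograms be compared directly.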
Figure 5
Robustness of the designed array and the Gibbs Sampler to added noise. Starting with a set of 10-mer Rap1 binding sites from TRANSFAC, the effect of added noise is measured as the correlation between the original PWM and the PWMs derived from 100 Gibbs Sampler runs. Each noise level is shown as a standard box-and-whisker plot. In the 0–50% noise range, the boxes are so small that they collapse to essentially a single line.
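The robustness measure in Figure 5 compares position weight matrices (PWMs) by correlation. A minimal sketch, with simplifications that the source does not specify: the PWM is a per-position base-frequency matrix with a pseudocount, the correlation is Pearson's r over the flattened matrices, noise means replacing a fraction of the sites with random sequences, and the Gibbs Sampler step is skipped entirely (PWMs built from the noisy sites are compared directly). The sites are hypothetical single-substitution variants of a made-up consensus, not the Rap1 TRANSFAC sites.

```python
import random
from math import sqrt

BASES = "ACGT"


def pwm(sites: list[str], pseudocount: float = 0.5) -> list[list[float]]:
    """Per-position base frequencies (rows = positions, columns = A, C, G, T)."""
    matrix = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        matrix.append([counts[b] / total for b in BASES])
    return matrix


def pwm_correlation(p: list[list[float]], q: list[list[float]]) -> float:
    """Pearson correlation between two equally sized PWMs, flattened."""
    x = [v for row in p for v in row]
    y = [v for row in q for v in row]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a perfectly uniform PWM has no defined correlation")
    return cov / (sx * sy)


def add_noise(sites: list[str], fraction: float, rng: random.Random) -> list[str]:
    """Replace `fraction` of the sites with uniformly random sequences of equal length."""
    noisy = list(sites)
    for idx in rng.sample(range(len(sites)), round(fraction * len(sites))):
        noisy[idx] = "".join(rng.choice(BASES) for _ in range(len(sites[idx])))
    return noisy


rng = random.Random(0)
consensus = "ACGTTGCAAC"  # hypothetical 10-mer, not a real Rap1 site
sites = []
for _ in range(40):
    s = list(consensus)
    s[rng.randrange(10)] = rng.choice(BASES)  # at most one substitution per site
    sites.append("".join(s))

reference = pwm(sites)
for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    r = pwm_correlation(reference, pwm(add_noise(sites, fraction, rng)))
    print(f"{fraction:.0%} noise: r = {r:.3f}")
```

Repeating the noisy comparison many times per noise level, rather than once, would give the spread that a box-and-whisker plot summarizes.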

