Nucleic Acids Res. 2011 Jan;39(2):e6.

doi: 10.1093/nar/gkq1071. Epub 2010 Nov 4.

Use of structural DNA properties for the prediction of transcription-factor binding sites in Escherichia coli

Pieter Meysman¹, Thanh Hai Dang, Kris Laukens, Riet De Smet, Yan Wu, Kathleen Marchal, Kristof Engelen

PMID: 21051340
PMCID: PMC3025552
DOI: 10.1093/nar/gkq1071

Pieter Meysman et al. Nucleic Acids Res. 2011 Jan.

Affiliation

¹ Department of Microbial and Molecular Systems, KU Leuven, Leuven Heverlee, Belgium.

Abstract

Recognition of genomic binding sites by transcription factors can occur through base-specific recognition, or by recognition of variations within the structure of the DNA macromolecule. In this article, we investigate what information can be retrieved from local DNA structural properties that is relevant to transcription factor binding and that cannot be captured by the nucleotide sequence alone. More specifically, we explore the benefit of employing the structural characteristics of DNA to create binding-site models that encompass indirect recognition for the Escherichia coli model organism. We developed a novel methodology [Conditional Random fields of Smoothed Structural Data (CRoSSeD)], based on structural scales and conditional random fields, to model and predict regulator binding sites. The value of relying on local structural DNA properties is demonstrated by improved classifier performance on a large number of biological datasets, and by the detection of novel binding sites that could be validated by independent data sources and that could not be identified using sequence data alone. We further show that the CRoSSeD binding-site models can be related to the actual molecular mechanisms of transcription factor-DNA binding, and thus can be used not only to predict novel sites but might also give valuable insights into unknown binding mechanisms of transcription factors.

Figures

**Figure 1.**
Overview of the CRoSSeD methodology. The sequences of known TF-binding sites (green) are collected and used to create different structural profiles by applying structural scales. These structural scales are then used as input for the CRoSSeD model, which creates a binding-site model featuring strongly conserved structural profile characteristics in specific regions of the binding sites. These binding-site models can then be used to predict other binding sites (red) for the given TF in the genome.
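The first step in this overview, turning a binding-site sequence into a structural profile, can be sketched as follows. This is only a minimal illustration, not the CRoSSeD implementation: it assumes a dinucleotide-step scale and a simple moving-average smoothing, and the scale values in `TOY_SCALE`, the function names, and the window size are all hypothetical placeholders rather than anything taken from the article.

```python
from itertools import product

# Placeholder dinucleotide scale. These numbers are invented for illustration
# (AA=0, AC=1, ..., TT=15) and are NOT a published structural scale.
TOY_SCALE = {a + b: float(i) for i, (a, b) in enumerate(product("ACGT", repeat=2))}


def structural_profile(seq, scale):
    """Map each overlapping dinucleotide step of seq to its scale value."""
    seq = seq.upper()
    return [scale[seq[i:i + 2]] for i in range(len(seq) - 1)]


def smooth(profile, window=3):
    """Centered moving average; the window is truncated at the profile edges."""
    half = window // 2
    out = []
    for i in range(len(profile)):
        chunk = profile[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


site = "ACGT"                              # hypothetical binding-site sequence
raw = structural_profile(site, TOY_SCALE)  # one value per dinucleotide step
smoothed = smooth(raw)                     # smoothed profile of the same length
print(raw, smoothed)
```

A sequence of length n yields n − 1 step values. In practice one such profile would be computed per structural scale, and those profiles (rather than the raw bases) would serve as model input.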

**Figure 2.**
(a) Flexibility profiles of all 40 positive synthetic samples (blue lines) as measured by the B-DNA twist scale (lower values correspond to more flexible regions). The red line is the average profile. For comparison, the sequence conservation logo is also given for each position. At the bottom of the figure is the structural characteristic that was simulated (HR: high rigidity, LR: low rigidity). (b) ROC curve displaying the average result of five 10-fold cross validations for the CRoSSeD (blue line), BioBayesNet (green line), PWM (red line) and CRFseq (cyan line) models when applied to the synthetic data set.

**Figure 3.**
Performance results of the different methods on the CRP (a) and PurR (b) data sets. The ROC curves display the trade-off between sensitivity (the fraction of positive samples correctly identified as binding sites) and the false-positive rate, i.e. 1 − specificity (the fraction of negative samples incorrectly identified as binding sites), for the results on the left-out samples obtained at different probability thresholds for five 10-fold cross validations for the CRoSSeD (blue), CRFseq (cyan), PWM (red) and BioBayesNet (green) models.
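How such a ROC curve is traced from classifier scores can be sketched as follows. This is a generic illustration, not the evaluation code used in the article: the scores and labels are made-up toy values, the function names are hypothetical, and the cross-validation loop is omitted. Each threshold yields one point whose y-value is the sensitivity and whose x-value is the fraction of negative samples wrongly called binding sites.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, sensitivity) pairs, one per score threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data: 1 = known binding site, 0 = negative sample (values are invented).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
points = roc_points(scores, labels)
print(points, auc(points))
```

With cross-validation, the same computation would be applied to the scores of the left-out samples of each fold.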

**Figure 4.**
Overview of the co-expressed gene sets and their enrichment with high-scoring predicted binding sites obtained from the structure- and sequence-based models, respectively. The pie chart represents all co-expressed gene sets found, divided into gene sets with no significant enrichment in high-ranking predictions (blue) and gene sets enriched for binding sites predicted by the CRoSSeD model (red), the PWM model (green) or both models (purple). A table is provided per segment, listing the related TF (in bold) and the most relevant significantly enriched gene function in the gene sets.

**Figure 5.**
Enrichment of high-scoring binding-site predictions in co-expressed gene sets for ArgR (a), SoxS (b) and PurR (c), respectively. Each plot corresponds to the entire ranked gene list obtained from the screening using the PWM (red) and CRoSSeD (green) motif models, with confidence decreasing from left to right. The positions of the genes that were found to be co-expressed with the known target genes of the respective TFs are marked.

**Figure 6.**
Important features contributing to the CRoSSeD model for, respectively, CRP (a) and PurR (b). In panel (a), the profile corresponds to the DNase-I cutting frequency (flexibility) profile based on the weights assigned to the CRP model. Plotted in the dark blue line is the weighted average of the property at each position in the motif and surrounding it in the light-blue area is the standard deviation on this average for each position. Panel (b) contains the disruption energy profile (stability) based on the PurR model.

Cited by

A flexible integrative approach based on random forest improves prediction of transcription factor binding sites.
Hooghe B, Broos S, van Roy F, De Bleser P. Nucleic Acids Res. 2012 Aug;40(14):e106. doi: 10.1093/nar/gks283. Epub 2012 Apr 5. PMID: 22492513. Free PMC article.
Deconvolving the recognition of DNA shape from sequence.
Abe N, Dror I, Yang L, Slattery M, Zhou T, Bussemaker HJ, Rohs R, Mann RS. Cell. 2015 Apr 9;161(2):307-18. doi: 10.1016/j.cell.2015.02.008. Epub 2015 Apr 2. PMID: 25843630. Free PMC article.
Improved predictions of transcription factor binding sites using physicochemical features of DNA.
Maienschein-Cline M, Dinner AR, Hlavacek WS, Mu F. Nucleic Acids Res. 2012 Dec;40(22):e175. doi: 10.1093/nar/gks771. Epub 2012 Aug 25. PMID: 22923524. Free PMC article.
Expression divergence between Escherichia coli and Salmonella enterica serovar Typhimurium reflects their lifestyles.
Meysman P, Sánchez-Rodríguez A, Fu Q, Marchal K, Engelen K. Mol Biol Evol. 2013 Jun;30(6):1302-14. doi: 10.1093/molbev/mst029. Epub 2013 Feb 20. PMID: 23427276. Free PMC article.
An improved systematic approach to predicting transcription factor target genes using support vector machine.
Cui S, Youn E, Lee J, Maas SJ. PLoS One. 2014 Apr 17;9(4):e94519. doi: 10.1371/journal.pone.0094519. eCollection 2014. PMID: 24743548. Free PMC article.
