Nucleic Acids Res. 2011 Jan;39(2):e6.
doi: 10.1093/nar/gkq1071. Epub 2010 Nov 4.

Use of structural DNA properties for the prediction of transcription-factor binding sites in Escherichia coli

Pieter Meysman et al. Nucleic Acids Res. 2011 Jan.

Abstract

Recognition of genomic binding sites by transcription factors can occur through base-specific recognition, or by recognition of variations within the structure of the DNA macromolecule. In this article, we investigate what information can be retrieved from local DNA structural properties that is relevant to transcription factor binding and that cannot be captured by the nucleotide sequence alone. More specifically, we explore the benefit of employing the structural characteristics of DNA to create binding-site models that encompass indirect recognition for the Escherichia coli model organism. We developed a novel methodology [Conditional Random fields of Smoothed Structural Data (CRoSSeD)], based on structural scales and conditional random fields, to model and predict regulator binding sites. The value of relying on local structural DNA properties is demonstrated by improved classifier performance on a large number of biological datasets, and by the detection of novel binding sites that could be validated by independent data sources and that could not be identified using sequence data alone. We further show that the CRoSSeD binding-site models can be related to the actual molecular mechanisms of transcription factor-DNA binding, and thus can be used not only for the prediction of novel sites, but might also give valuable insights into unknown binding mechanisms of transcription factors.

Figures

Figure 1.
Overview of the CRoSSeD methodology. The sequences of known TF-binding sites (green) are collected and used to create different structural profiles by applying structural scales. These structural scales are then used as input for the CRoSSeD model, which creates a binding-site model featuring strongly conserved structural profile characteristics in specific regions at the binding sites. These binding-site models can then be used to predict other binding sites (red) for the given TF in the genome.
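The first step in this overview, turning a binding-site sequence into a structural profile, can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: it assumes a scale indexed by dinucleotide steps, the values in `TOY_SCALE` are placeholders rather than values from any published scale, and the moving-average smoothing (suggested by "Smoothed" in the method's name) together with its window size is an assumption.

```python
from typing import Dict, List

# HYPOTHETICAL dinucleotide scale: these numbers are placeholders chosen for
# illustration only; they are NOT taken from any published structural scale.
TOY_SCALE: Dict[str, float] = {
    a + b: float(4 * i + j)
    for i, a in enumerate("ACGT")
    for j, b in enumerate("ACGT")
}


def structural_profile(seq: str, scale: Dict[str, float]) -> List[float]:
    """Map each overlapping dinucleotide step of seq to its scale value."""
    seq = seq.upper()
    return [scale[seq[i:i + 2]] for i in range(len(seq) - 1)]


def smooth(profile: List[float], window: int = 3) -> List[float]:
    """Centered moving average; windows are truncated at the profile ends."""
    half = window // 2
    out = []
    for i in range(len(profile)):
        chunk = profile[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


if __name__ == "__main__":
    raw = structural_profile("ACGT", TOY_SCALE)
    print(raw)          # one value per dinucleotide step
    print(smooth(raw))  # smoothed profile of the same length
```

A sequence of length n yields n - 1 profile values under this dinucleotide assumption; a scale defined on longer k-mers would shorten the profile accordingly.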
Figure 2.
(a) Flexibility profiles of all 40 positive synthetic samples (blue lines) as measured by the B-DNA twist scale (lower values correspond to more flexible regions). The red line is the average profile. For comparison, the sequence conservation logo is also given for each position. At the bottom of the figure is the structural characteristic that was simulated (HR: high rigidity, LR: low rigidity). (b) ROC curve displaying the average result of five 10-fold cross validations for the CRoSSeD (blue line), BioBayesNet (green line), PWM (red line) and CRFseq (cyan line) models when applied to the synthetic data set.
Figure 3.
Performance results of the different methods on the CRP (a) and PurR (b) data sets. The ROC curves display the trade-off between the sensitivity (the fraction of positive samples correctly identified as binding sites) and the false-positive rate (1 - specificity, the fraction of negative samples incorrectly identified as binding sites) of the results on the left-out samples, obtained at different probability thresholds for five 10-fold cross validations for the CRoSSeD (blue), CRFseq (cyan), PWM (red) and BioBayesNet (green) models.
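As a rough illustration of how such an ROC curve is built from classifier scores, the sketch below sweeps a threshold over the scores and records, at each threshold, the sensitivity and the fraction of negative samples incorrectly called binding sites. The function names and the toy scores are hypothetical and do not come from the article; the cross-validation averaging is not reproduced here.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false-positive rate, sensitivity) pairs, one per distinct threshold.

    A sample is predicted positive when its score is >= the threshold.
    Labels are 1 for binding sites and 0 for negative samples.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Toy scores for two positive and two negative samples (illustrative only).
    pts = roc_points([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0])
    print(pts)
    print(auc(pts))
```

A curve that rises steeply near the left edge indicates that many true binding sites are recovered before many negatives are misclassified, which is the property compared across models in the figure.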
Figure 4.
Overview of the co-expressed gene sets and their enrichment with high-scoring predicted binding sites obtained from the structure-based or sequence-based models. The pie chart represents all co-expressed gene sets found, divided into gene sets with no significant enrichment in high-ranking predictions (blue) and gene sets enriched for binding sites predicted by the CRoSSeD model (red), the PWM (green) or both models (purple). A table is provided per segment, listing the related TF (in bold) and the most relevant significantly enriched gene function in the gene sets.
Figure 5.
Enrichment of high-scoring binding-site predictions in co-expressed gene sets for ArgR (a), SoxS (b) and PurR (c). Each plot corresponds to the entire ranked gene list obtained from the screening using the PWM (red) and CRoSSeD (green) motif models, with confidence decreasing from left to right. The marked positions are those of the genes that were found to be co-expressed with the known target genes of the respective TFs.
Figure 6.
Important features contributing to the CRoSSeD model for CRP (a) and PurR (b). In panel (a), the profile corresponds to the DNase-I cutting frequency (flexibility) profile based on the weights assigned to the CRP model. The dark-blue line shows the weighted average of the property at each position in the motif, and the surrounding light-blue area shows the standard deviation on this average for each position. Panel (b) contains the disruption energy (stability) profile based on the PurR model.
