PLoS One. 2014 Feb 21;9(2):e89226.
doi: 10.1371/journal.pone.0089226. eCollection 2014.

Transcription factor binding sites prediction based on modified nucleosomes

Mohammad Talebzadeh et al.

Abstract

In computational methods, position weight matrices (PWMs) are commonly applied to transcription factor binding site (TFBS) prediction. Although these matrices predict actual binding sites more accurately than simple consensus sequences, they usually produce a large number of false positive (FP) predictions and are therefore poor sources of information on their own. Several studies have employed additional sources of information, such as sequence conservation or proximity to transcription start sites, to distinguish true binding regions from random ones. Recently, the spatial distribution of modified nucleosomes has been shown to be associated with different promoter architectures. These aligned patterns can facilitate DNA accessibility for transcription factors. We hypothesize that using data from these aligned and periodic patterns can improve the performance of binding region prediction. In this study, we propose two effective features, "modified nucleosomes neighboring" and "modified nucleosomes occupancy", to decrease FP predictions in binding site discovery. Based on these features, we designed a logistic regression classifier that estimates the probability that a region is a TFBS. Our model learned each feature from Sp1 binding sites on Chromosome 1 and was tested on the other chromosomes in human CD4+ T cells. We investigated 21 histone modifications and found that only 8 of the 21 marks are strongly correlated with transcription factor binding regions. To show that these features are not specific to Sp1, we combined the logistic regression classifier with the PWM and created a new model to search for TFBSs across the genome. We tested this model on the transcription factors MAZ, PU.1 and ELF1 and compared the results with those obtained using the PWM alone. The results show that our model predicts transcription factor binding regions more successfully than the PWM alone. The relative simplicity of the model and its capability to integrate other features make it a superior method for TFBS prediction.
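The classifier described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each candidate region has already been reduced to a numeric feature vector (for example one MNN value per histone mark), fits a plain logistic regression by batch gradient descent, and mirrors the chromosome-held-out design (train on Chromosome 1, test on the other chromosomes). The names `Region`, `train_logistic`, `predict_proba` and `split_by_chromosome`, as well as the learning rate and epoch count, are hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    """A candidate interval (hypothetical container, not from the paper)."""
    chrom: str
    features: list  # e.g. one MNN value per histone mark (assumption)
    bound: int      # 1 for a known binding region, else 0


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Estimated probability that a region with feature vector x is a binding region."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def split_by_chromosome(regions, train_chrom="chr1"):
    """Train on one chromosome and hold out all the others."""
    train = [r for r in regions if r.chrom == train_chrom]
    test = [r for r in regions if r.chrom != train_chrom]
    return train, test
```

Holding out whole chromosomes, rather than random regions, keeps neighboring and possibly correlated intervals from appearing in both the training and the test set. In practice a library implementation with regularization would replace the hand-written gradient loop.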

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1
Figure 1. ROC curves for predicting the binding regions of Sp1 using the MNN feature.
ROC curves for 21 logistic regression classifiers (LRCs), each trained on an individual histone modification, for predicting Sp1 binding regions using the MNN feature. The LRC for each histone modification was trained on Chromosome 1 and tested on Chromosomes 2 to 22 and the two sex chromosomes. The LRCs assign a score to each interval, and binding-region predictions are based on these scores. These curves show that the MNN feature is predictive of binding regions even when no PWM score is used. The x-axis is the false positive rate and the y-axis is the true positive rate. Curves are shown for the most predictive modifications; ROC curves for the remaining 13 modifications can be found in Figure S1.
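The exact definitions of the MNN and occupancy features are given in the paper's Methods, which are not part of this excerpt. As a hedged illustration only, the sketch below assumes one plausible reading of the two feature names: "neighboring" as the number of nucleosomes carrying a given mark whose centers fall within a flanking window around a candidate interval, and "occupancy" as the fraction of the interval's bases covered by such nucleosomes. The function names `mnn_count` and `occupancy_fraction`, the 1000 bp default flank and the ~147 bp nucleosome footprint are all assumptions.

```python
from bisect import bisect_left, bisect_right

# Assumed nucleosome footprint: ~147 bp, i.e. center +/- 73 bp.
NUCLEOSOME_HALF_WIDTH = 73


def mnn_count(centers, start, end, flank=1000):
    """Hypothetical 'neighboring' feature: number of marked nucleosome centers in
    [start - flank, end + flank]. `centers` must be sorted (one chromosome, one mark)."""
    lo = bisect_left(centers, start - flank)
    hi = bisect_right(centers, end + flank)
    return hi - lo


def occupancy_fraction(centers, start, end, half_width=NUCLEOSOME_HALF_WIDTH):
    """Hypothetical 'occupancy' feature: fraction of bases in the half-open interval
    [start, end) covered by at least one marked nucleosome."""
    # Only nucleosomes whose centers lie within half_width of the interval can overlap it.
    lo = bisect_left(centers, start - half_width)
    hi = bisect_right(centers, end + half_width)
    covered, cursor = 0, start  # cursor = first base not yet counted
    for c in centers[lo:hi]:
        seg_start = max(c - half_width, cursor)
        seg_end = min(c + half_width + 1, end)
        if seg_end > seg_start:
            covered += seg_end - seg_start
            cursor = seg_end
    return covered / (end - start)
```

Because the centers are sorted, each query costs a binary search plus a scan of the overlapping nucleosomes only; one such feature per histone mark would give the per-modification inputs of the kind the LRCs in Figure 1 are trained on.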
Figure 2
Figure 2. AUC values corresponding to different histone modifications for predicting the binding regions of Sp1 based on the MNN feature.
Results are shown for predicting Sp1 binding sites in CD4+ T cells using the MNN feature. The height of each bar corresponds to the area under the ROC curve (AUC). Certain modifications are more predictive of true binding regions. Comparison with the results obtained using the PWM alone (Figure 3) clearly shows that the MNN feature, especially for certain modifications, is informative for TFBS prediction.
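The AUC values summarized in Figure 2 can be computed directly from classifier scores and binary labels. The sketch below uses the standard rank-based (Mann-Whitney) formulation, in which the AUC equals the probability that a randomly chosen bound region scores higher than a randomly chosen unbound one, with ties counted as one half. The function name `auc` is illustrative; the excerpt does not state how the paper computed its AUCs.

```python
def auc(scores, labels):
    """ROC AUC from scores and 0/1 labels via the Mann-Whitney U statistic.
    Tied scores receive their average rank, so positive/negative ties count as 1/2."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of equal scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

The rank formulation runs in O(n log n), which matters when every interval on 23 test chromosomes is scored; it gives the same value as integrating the ROC curve with linear interpolation between thresholds.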
Figure 3
Figure 3. Standard ROC curve for the traditional motif-scanning method with a zero-order background model.
Results are shown for predicting Sp1 binding regions in CD4+ T cells on the test set (Chromosomes 2 to 22 and the two sex chromosomes) using the PWM alone. The AUC for this curve is 0.7880.
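Figure 3 uses traditional motif scanning with a zero-order background, i.e. a background in which each base occurs independently with a fixed frequency. A minimal sketch of that kind of scan follows: a count matrix is converted to log2-odds scores against the background, and each sequence is scored by its best-matching window on either strand. The pseudocount, the uniform default background, the both-strand maximum and all function names are assumptions for illustration, not details taken from the paper.

```python
import math
from collections import Counter

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def zero_order_background(seq):
    """Position-independent (zero-order) base frequencies of seq, add-one smoothed."""
    counts = Counter(c for c in seq.upper() if c in ALPHABET)
    total = sum(counts.values()) + len(ALPHABET)
    return {b: (counts[b] + 1) / total for b in ALPHABET}


def log_odds_pwm(counts, background=None, pseudocount=1.0):
    """Convert a count matrix (one {base: count} dict per motif column) into
    log2-odds scores log2(p_column(b) / p_background(b))."""
    if background is None:
        background = {b: 0.25 for b in ALPHABET}
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in ALPHABET) + len(ALPHABET) * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background[b])
                    for b in ALPHABET})
    return pwm


def best_window_score(seq, pwm):
    """Highest log-odds score over all full-length windows of seq; windows with
    non-ACGT characters are skipped. Returns -inf if no window qualifies."""
    w = len(pwm)
    best = -math.inf
    for i in range(len(seq) - w + 1):
        window = seq[i:i + w]
        if all(c in ALPHABET for c in window):
            best = max(best, sum(col[c] for col, c in zip(pwm, window)))
    return best


def scan(seq, pwm):
    """Best site score on either strand of seq."""
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    return max(best_window_score(seq, pwm), best_window_score(rc, pwm))
```

Thresholding `scan` scores at many cutoffs and comparing them with known binding regions yields a PWM-only ROC curve of the kind shown in Figure 3.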
Figure 4
Figure 4. Improvement in Sp1 binding site prediction by combining MNN data from different modifications.
ROC curves for several methods of predicting bound locations: combining all 21 modifications (green line), combining 8 modifications (black line), and integrating H2A.Z and H3K4me3 data (blue line). Comparing this figure with Figure 1 shows that LRCs trained on single modifications perform better than LRCs trained on combinations of histone modifications. This may be because the information that distinguishes true target regions is redundantly encoded among histone marks.
Figure 5
Figure 5. Distributions of nucleosome positions around Sp1 binding sites.
Distributions of the central positions of nucleosomes carrying the top 8 marks and 3 repressive marks around Sp1 binding sites across the genome. The x-axis shows genomic positions relative to the central position of Sp1 binding sites (from −1015 bp to +1015 bp). The position of a nucleosome is defined as the positions from −15 bp to +15 bp relative to its center. Active marks are highly enriched around binding sites and show a bimodal distribution around them. A nucleosome-free region at the central position of the binding sites is also observable for all top marks.
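A positional profile like the one in Figure 5 can be accumulated by stacking nucleosome positions relative to each binding-site center. The sketch follows the caption's conventions (a ±1015 bp window, each nucleosome represented by the positions from −15 bp to +15 bp around its center) but otherwise rests on assumptions: all coordinates are integers on a single chromosome, the strand orientation of sites is ignored, and counts are left unnormalized. The function name `nucleosome_profile` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def nucleosome_profile(site_centers, nucleosome_centers, half_window=1015, half_span=15):
    """Per-offset counts over [-half_window, +half_window] relative to site centers.
    Each nucleosome contributes the positions from -half_span to +half_span
    around its center (assumed single chromosome, strand ignored)."""
    nuc = sorted(nucleosome_centers)
    profile = [0] * (2 * half_window + 1)  # index i <-> offset i - half_window
    for s in site_centers:
        # Only nucleosomes whose span can touch the window around this site.
        lo = bisect_left(nuc, s - half_window - half_span)
        hi = bisect_right(nuc, s + half_window + half_span)
        for c in nuc[lo:hi]:
            for pos in range(c - half_span, c + half_span + 1):
                offset = pos - s
                if -half_window <= offset <= half_window:
                    profile[offset + half_window] += 1
    return profile
```

Running this once per histone mark and plotting each profile against `offset` gives one curve per mark; features such as a central dip (a nucleosome-free region) or two flanking peaks would then show up directly in the counts.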
Figure 6
Figure 6. ROC curves for predicting the binding locations of MAZ, ELF1 and PU.1 using the MNN feature combined with the PWM scores.
Results are shown for predicting the binding locations of A) MAZ, B) PU.1 and C) ELF1 using the MNN feature for different histone modifications, combined with the PWM scores. The final score assigned to each region combines the LRC output with the PWM score, as introduced in the Methods. ROC curves for the remaining 13 modifications can be found in Figure S3. Comparison with Figures S4, S5 and S6 clearly demonstrates the usefulness of the MNN feature for predicting binding locations.
