PLoS One. 2014 Feb 21;9(2):e89226.

doi: 10.1371/journal.pone.0089226. eCollection 2014.

Transcription factor binding sites prediction based on modified nucleosomes

Mohammad Talebzadeh¹, Fatemeh Zare-Mirakabad¹

Affiliation

¹ Department of Mathematics and Computer Science, AmirKabir University of Technology, Tehran, Iran.

PMID: 24586611
PMCID: PMC3931712
DOI: 10.1371/journal.pone.0089226

Abstract

Position weight matrices (PWMs) are commonly used in computational methods for transcription factor binding site (TFBS) prediction. Although these matrices predict actual binding sites more accurately than simple consensus sequences, they usually produce a large number of false positive (FP) predictions and are therefore an impoverished source of information. Several studies have employed additional sources of information, such as sequence conservation or proximity to transcription start sites, to distinguish true binding regions from random ones. Recently, the spatial distribution of modified nucleosomes has been shown to be associated with different promoter architectures. These aligned patterns can facilitate DNA accessibility for transcription factors. We hypothesize that using data from these aligned and periodic patterns can improve the performance of binding region prediction. In this study, we propose two effective features, "modified nucleosomes neighboring" and "modified nucleosomes occupancy", to decrease FPs in binding site discovery. Based on these features, we designed a logistic regression classifier that estimates the probability that a region is a TFBS. Our model learned each feature from Sp1 binding sites on Chromosome 1 and was tested on the other chromosomes in human CD4+ T cells. We investigated 21 histone modifications and found that only 8 of the 21 marks are strongly correlated with transcription factor binding regions. To show that these features are not specific to Sp1, we combined the logistic regression classifier with the PWM and created a new model to search for TFBSs on the genome. We tested the model on the transcription factors MAZ, PU.1 and ELF1 and compared the results with those obtained using only the PWM. The results show that our model predicts transcription factor binding regions more successfully than the PWM alone. The relative simplicity of the model and its capability of integrating other features make it a superior method for TFBS prediction.
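
As a rough illustration of the classifier described in the abstract, the following minimal sketch fits a logistic regression by gradient descent and returns the probability that a region is bound. It is not the authors' implementation: the two feature columns (labelled here as neighboring and occupancy), their values, the learning rate, and the function names are all assumptions made for the example.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    n = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that a region with features x is a binding site."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Invented toy data: one (neighboring, occupancy) pair per candidate region.
# The values and their direction are illustrative only, not taken from the paper.
X_train = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.1), (0.2, 0.8), (0.1, 0.9), (0.3, 0.7)]
y_train = [1, 1, 1, 0, 0, 0]  # 1 = bound region, 0 = unbound region

w, b = train_logistic(X_train, y_train)
p_bound = predict_proba(w, b, (0.85, 0.15))
p_unbound = predict_proba(w, b, (0.15, 0.85))
```

How the classifier output is combined with the PWM score is defined in the paper's Methods and is not reproduced in this sketch.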

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. ROC curves for predicting the binding regions of Sp1 using the MNN feature.**
ROC curves for 21 logistic regression classifiers (LRCs) trained on individual histone modifications for prediction of Sp1 binding regions, using the modified nucleosomes neighboring (MNN) feature. The LRC corresponding to each histone modification was trained on Chromosome 1 and tested on Chromosomes 2 to 22 and the two sex chromosomes. The LRCs assign a score to each interval, and binding regions are predicted from these scores. These curves show that the MNN feature is predictive of binding regions even when no PWM score is used. The x-axis is the false positive rate and the y-axis is the true positive rate. Only the curves of the most predictive modifications are shown; ROC curves for the remaining 13 modifications can be found in Figure S1.
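
To make the axes of such a curve concrete, here is a minimal sketch of how ROC points and the area under the curve (AUC) can be computed from per-interval scores. The scores and labels are invented for illustration, and the function names are hypothetical; this is not the evaluation code used in the paper.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points obtained by
    sweeping a decision threshold from the highest score to the lowest."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Intervals with tied scores share one threshold, so count them together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points

def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Invented classifier scores for eight intervals; label 1 marks a true binding region.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
area = auc(curve)  # 0.8125 for this toy data
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of bound over unbound intervals.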

**Figure 2. AUC values corresponding to different histone modifications for predicting the binding regions of Sp1 based on the MNN feature.**
Results are shown for predicting the binding sites of Sp1 in CD4+ T cells using the MNN feature. The height of each bar corresponds to the area under the ROC curve (AUC). Certain modifications are more predictive of true binding regions than others. Comparison with the results of using the PWM alone (Figure 3) shows that the MNN feature, especially for certain modifications, is an informative feature for TFBS prediction.

**Figure 3. The standard ROC curve for the traditional motif scanning method with a zero-order background model.**
Results are shown for predicting Sp1 binding regions in the test set (Chromosomes 2 to 22 and the two sex chromosomes) in CD4+ T cells using the PWM. The AUC value corresponding to this curve is 0.7880.
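
For readers unfamiliar with motif scanning, the sketch below shows one common way to score sequence windows with a PWM against a zero-order background, in which each base has a fixed frequency independent of its neighbors. The motif counts, the uniform background, the pseudocount, and the function names are assumptions chosen for illustration; they are not the Sp1 matrix or the parameters used in the paper.

```python
import math

def log_odds_pwm(counts, background, pseudocount=1.0):
    """Convert per-position base counts into log2-odds scores against a
    zero-order background (independent, position-invariant base frequencies)."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background[base])
            for base in "ACGT"
        })
    return pwm

def scan(sequence, pwm):
    """Score every window of the sequence; return a list of (start, score) pairs.
    The sequence is assumed to contain only the uppercase letters A, C, G and T."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits

# Invented counts for a 4-bp GC-rich motif (10 aligned sites per position).
counts = [
    {"A": 0, "C": 1, "G": 9, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform

pwm = log_odds_pwm(counts, background)
hits = scan("ATTGGCGTAA", pwm)
best_start, best_score = max(hits, key=lambda h: h[1])  # best window starts at index 3 (GGCG)
```

Thresholding such window scores at different cutoffs is what produces an ROC curve for the PWM-only method.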

**Figure 4. Improvement in Sp1 binding site prediction by combining MNN data from different modifications.**
ROC curves for several methods of predicting bound locations: combining all 21 modifications (green line), combining 8 modifications (black line), and integrating H2A.Z and H3K4me3 data (blue line). Comparison with Figure 1 shows that LRCs trained on single modifications perform better than LRCs trained on combinations of histone modifications. This may be because the ability to distinguish true target regions is redundantly encoded among histone marks.

**Figure 5. Distributions of nucleosome positions around Sp1 binding sites.**
Distributions of the central positions of nucleosomes for the top 8 marks and 3 repressive marks around Sp1 binding sites on the genome. The x-axis shows genomic position relative to the central position of Sp1 binding sites (from −1015 bp to +1015 bp). The position of a nucleosome is defined as the positions from −15 bp to +15 bp relative to the center of the nucleosome. Active marks are highly enriched around binding sites and show a bimodal distribution around them. A nucleosome-free region at the central position of the binding sites is also observable for all top marks.

**Figure 6. ROC curves for predicting the binding locations of MAZ, ELF1 and PU.1 using the MNN feature combined with the PWM scores.**
Results are shown for predicting the binding locations of A) MAZ, B) PU.1 and C) ELF1 using the MNN feature with different histone modifications, combined with the PWM scores. The final score assigned to each region is as introduced in the Methods. ROC curves for the remaining 13 modifications can be found in Figure S3. Comparing this figure with Figures S4, S5 and S6 clearly demonstrates the usefulness of the MNN feature for predicting binding locations.

Cited by

Cross-Cell-Type Prediction of TF-Binding Site by Integrating Convolutional Neural Network and Adversarial Network.
Lan G, Zhou J, Xu R, Lu Q, Wang H. Int J Mol Sci. 2019 Jul 12;20(14):3425. doi: 10.3390/ijms20143425. PMID: 31336830.
Discovering human transcription factor physical interactions with genetic variants, novel DNA motifs, and repetitive elements using enhanced yeast one-hybrid assays.
Shrestha S, Sewell JA, Santoso CS, Forchielli E, Carrasco Pro S, Martinez M, Fuxman Bass JI. Genome Res. 2019 Sep;29(9):1533-1544. doi: 10.1101/gr.248823.119. PMID: 31481462.
A post-GWAS confirming the genetic effects and functional polymorphisms of AGPAT3 gene on milk fatty acids in dairy cattle.
Shi L, Wu X, Yang Y, Ma Z, Lv X, Liu L, Li Y, Zhao F, Han B, Sun D. J Anim Sci Biotechnol. 2021 Feb 1;12(1):24. doi: 10.1186/s40104-020-00540-4. PMID: 33522959.
Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast.
Tsai ZT, Shiu SH, Tsai HK. PLoS Comput Biol. 2015 Aug 20;11(8):e1004418. doi: 10.1371/journal.pcbi.1004418. PMID: 26291518.
A comprehensive review of computational prediction of genome-wide features.
Xu T, Zheng X, Li B, Jin P, Qin Z, Wu H. Brief Bioinform. 2020 Jan 17;21(1):120-134. doi: 10.1093/bib/bby110. PMID: 30462144.
