Comparative Study

BMC Genomics. 2016 May 21;17:388.

doi: 10.1186/s12864-016-2729-8.

A systematic, large-scale comparison of transcription factor binding site models

Daniela Hombach^{1,2}, Jana Marie Schwarz^{1,2}, Peter N Robinson³, Markus Schuelke^{1,2}, Dominik Seelow^{4,5,6}

Affiliations

¹ Department of Neuropaediatrics, Charité-Universitätsmedizin Berlin, Berlin, Germany.
² NeuroCure Clinical Research Center, Charité - Universitätsmedizin Berlin, Berlin, Germany.
³ Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany.
⁴ Department of Neuropaediatrics, Charité-Universitätsmedizin Berlin, Berlin, Germany. dominik.seelow@charite.de.
⁵ NeuroCure Clinical Research Center, Charité - Universitätsmedizin Berlin, Berlin, Germany. dominik.seelow@charite.de.
⁶ Berliner Institut für Gesundheitsforschung / Berlin Institute of Health, Berlin, Germany. dominik.seelow@charite.de.

PMID: 27209209
PMCID: PMC4875604
DOI: 10.1186/s12864-016-2729-8

Daniela Hombach et al. BMC Genomics. 2016.

Erratum in

Erratum to: A systematic, large-scale comparison of transcription factor binding site models.
Hombach D, Schwarz JM, Robinson PN, Schuelke M, Seelow D. BMC Genomics. 2016 Jul 20;17(1):502. doi: 10.1186/s12864-016-2818-8. PMID: 27440159.

Abstract

Background: The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs), and mutations in their binding sites (TFBSs) may cause the misregulation of genes, eventually leading to disease. The consequences of DNA variants for TF binding are modelled in silico using binding matrices, but it remains unclear whether these are capable of accurately representing in vivo binding. In this study, we present a systematic comparison of binding models for 82 human TFs from three freely available sources: JASPAR matrices, HT-SELEX-generated models and matrices derived from protein binding microarrays (PBMs). We determined their ability to detect experimentally verified "real" in vivo TFBSs derived from ENCODE ChIP-seq data. As negative controls, we chose random downstream exonic sequences, which are unlikely to harbour TFBSs. All models were assessed by receiver operating characteristic (ROC) analysis.
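As a rough illustration of how a binding matrix can be used to score sequences and variants, the Python sketch below converts a position frequency matrix into a log-odds position-specific scoring matrix (PSSM) and compares the best motif hit in a reference and a variant sequence. The toy matrix, the pseudocount of 0.5, the uniform background and the example sequences are all hypothetical, and only the forward strand is scanned; the abstract does not describe the study's exact scoring procedure.

```python
import math

# Hypothetical position frequency matrix for a toy 4-bp motif.
# Each row is one motif position; columns are counts for A, C, G, T.
PFM = [
    [8, 0, 1, 1],
    [0, 9, 0, 1],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]
BASES = "ACGT"

def pfm_to_pssm(pfm, pseudocount=0.5, background=0.25):
    """Convert counts to log2-odds scores against a uniform background
    (pseudocount and background are illustrative assumptions)."""
    pssm = []
    for counts in pfm:
        total = sum(counts) + pseudocount * len(counts)
        pssm.append([math.log2(((c + pseudocount) / total) / background)
                     for c in counts])
    return pssm

def score_window(pssm, window):
    """Sum of per-position log-odds scores for one sequence window."""
    return sum(row[BASES.index(base)] for row, base in zip(pssm, window))

def best_hit(pssm, seq):
    """Highest window score over the sequence (forward strand only)."""
    w = len(pssm)
    return max(score_window(pssm, seq[i:i + w])
               for i in range(len(seq) - w + 1))

pssm = pfm_to_pssm(PFM)
ref = "TTACGTTT"
alt = "TTAAGTTT"  # hypothetical C>A substitution inside the motif
print(best_hit(pssm, ref), best_hit(pssm, alt))
```

A drop in the best-hit score for the variant allele is the kind of signal such matrix-based predictions rely on; whether it reflects a real loss of in vivo binding is exactly what the study set out to test.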

Results: While the area under the curve (AUC) was low for most of the tested models, with only 47 % reaching a score of 0.7 or higher, we noticed strong differences between the various position-specific scoring matrices, with JASPAR and HT-SELEX models showing higher success rates than PBM-derived models. In addition, we found that while TFBS sequences showed a higher degree of conservation than randomly chosen sequences, there was high variability between individual TFBSs.
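To make the AUC figures concrete, the sketch below computes the area under the ROC curve with the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen positive site scores higher than a randomly chosen negative one, with ties counted as one half. The study generated its AUC scores with the R package ROCR (see the Fig. 2 caption); this Python function is only a stand-in, and the scores are invented toy values, not data from the paper.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as P(positive score > negative score); ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def fraction_at_least(aucs, threshold=0.7):
    """Share of models whose AUC reaches the threshold."""
    return sum(a >= threshold for a in aucs) / len(aucs)

# Toy best-hit scores: ChIP-seq sites (positives) vs. exonic controls (negatives)
pos = [6.1, 5.4, 4.8, 2.0, 3.9]
neg = [1.2, 2.5, 3.0, 0.4, 4.0]
auc = roc_auc(pos, neg)
print(f"AUC = {auc:.2f}")
```

The pairwise loop is quadratic and only suits small examples; for tens of thousands of sites a sort-based rank computation would be the practical choice.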

Conclusions: Our results show that only a few of the matrix-based models used to predict potential TFBSs are able to reliably detect experimentally confirmed TFBSs. We compiled our findings in a freely accessible web application called ePOSSUM (http://mutationtaster.charite.de/ePOSSUM/), which uses a Bayes classifier to assess the impact of genetic alterations on TF binding in user-defined sequences. Additionally, ePOSSUM provides information on the reliability of each prediction based on our test set of experimentally confirmed binding sites.
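The abstract does not specify how ePOSSUM's Bayes classifier is built. Purely as a generic illustration of Bayes' rule applied to binding scores, the sketch below turns a PSSM score into a posterior probability of binding, assuming Gaussian score distributions for bound and unbound sequences with hypothetical parameters and a prior of 0.5. Comparing the posterior for the reference and variant alleles is one way to express a predicted change in binding; this is not ePOSSUM's implementation.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Normal density, used here as an assumed score distribution."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior_binding(score, bound=(5.0, 1.0), unbound=(2.0, 1.5), prior=0.5):
    """P(bound | score) by Bayes' rule. The (mean, sd) pairs for bound and
    unbound sequences and the prior are hypothetical illustration values."""
    like_b = gaussian_pdf(score, *bound) * prior
    like_u = gaussian_pdf(score, *unbound) * (1 - prior)
    return like_b / (like_b + like_u)

ref_p = posterior_binding(6.3)  # hypothetical reference-allele score
alt_p = posterior_binding(2.8)  # hypothetical variant-allele score
print(f"P(bound): ref={ref_p:.2f}, alt={alt_p:.2f}, change={alt_p - ref_p:+.2f}")
```

In practice the class-conditional distributions would have to be estimated from labelled positive and negative sites for each TF, which is where a test set like the one described above becomes necessary.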

Keywords: Genetic variation; PSSM; TFBS prediction; Transcription factor binding sites.


Figures

**Fig. 1**
Average AUC scores and representative ROC plots for different binding model sources. a Average AUC scores generated for the different binding model sources. b ROC plot for TFAP2C. c ROC plots for TFAP2A for the entire ENCODE test set (*left*) and the high-confidence set (*right*). d Underlying TF binding models for TFAP2A.

**Fig. 2**
Direct comparison of binding models generated by different methods. Depicted are AUC scores for TFs stored in both JASPAR (manually collected and curated models) and HT-SELEX. AUC scores were generated using ROCR. If multiple binding models were available for one TF, we depict the average AUC value.

**Fig. 3**
Representative plots for conservation analyses. We determined the maximum phastCons (a) and phyloP (b) scores in each experimentally confirmed binding site of BCL11A (*left panel*) and ZBTB33 (*right panel*) and calculated the averages of these maximum scores.

**Fig. 4**
Overview of tested TFs for the entire data set (a) and the high-confidence data set (b). ENCODE: entire set of ENCODE TFBSs (2012 freeze). High-confidence set: TFs reaching at least 80 % of the maximum possible binding score (published by ENCODE) in at least 100 binding instances. Please note that the intersections are not drawn to scale.
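The two summary computations described in the Fig. 3 and Fig. 4 captions can be sketched as follows: averaging the per-site maximum conservation score, and selecting TFs with at least 100 binding instances that reach 80 % of the maximum possible ENCODE score. The function names, the toy inputs and the maximum score of 1000 are assumptions made for illustration, and the filter encodes one reading of the caption.

```python
from statistics import mean

def mean_of_site_maxima(per_site_scores):
    """Average, over binding sites, of the maximum per-base conservation
    score (e.g. phastCons or phyloP) observed within each site."""
    return mean(max(scores) for scores in per_site_scores if scores)

def high_confidence_tfs(site_scores_by_tf, max_score, fraction=0.8, min_sites=100):
    """Return TFs with at least `min_sites` binding instances whose score
    reaches `fraction` of the maximum possible score."""
    cutoff = fraction * max_score
    return sorted(
        tf for tf, scores in site_scores_by_tf.items()
        if sum(s >= cutoff for s in scores) >= min_sites
    )

# Toy per-base phastCons values for three hypothetical binding sites
phastcons_sites = [[0.1, 0.9, 0.4], [0.2, 0.3], [1.0, 0.0]]
avg_max = mean_of_site_maxima(phastcons_sites)

# Toy site scores for hypothetical TFs; max_score=1000 is an assumption
encode_scores = {
    "TF_A": [900] * 120 + [300] * 50,  # 120 qualifying sites
    "TF_B": [850] * 60 + [950] * 30,   # only 90 qualifying sites
    "TF_C": [820] * 100,               # exactly 100 qualifying sites
}
selected = high_confidence_tfs(encode_scores, max_score=1000)
print(round(avg_max, 3), selected)
```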


Cited by

Insight on Transcriptional Regulation of the Energy Sensing AMPK and Biosynthetic mTOR Pathway Genes.
Sukumaran A, Choi K, Dasgupta B. Front Cell Dev Biol. 2020 Jul 29;8:671. doi: 10.3389/fcell.2020.00671. PMID: 32903688. Review.
Technologies for profiling the impact of genomic variants on transcription factor binding.
Leiz J, Rutkiewicz M, Birchmeier C, Heinemann U, Schmidt-Ott KM. Med Genet. 2021 Aug 14;33(2):147-155. doi: 10.1515/medgen-2021-2073. PMID: 38836027.
Common TDP1 Polymorphisms in Relation to Survival among Small Cell Lung Cancer Patients: A Multicenter Study from the International Lung Cancer Consortium.
Lohavanichbutr P, Sakoda LC, Amos CI, Arnold SM, Christiani DC, Davies MPA, Field JK, Haura EB, Hung RJ, Kohno T, Landi MT, Liu G, Liu Y, Marcus MW, O'Kane GM, Schabath MB, Shiraishi K, Slone SA, Tardón A, Yang P, Yoshida K, Zhang R, Zong X, Goodman GE, Weiss NS, Chen C. Clin Cancer Res. 2017 Dec 15;23(24):7550-7557. doi: 10.1158/1078-0432.CCR-17-1401. Epub 2017 Oct 3. PMID: 28974547.
Neurobiological functions of transcriptional enhancers.
Nord AS, West AE. Nat Neurosci. 2020 Jan;23(1):5-14. doi: 10.1038/s41593-019-0538-5. Epub 2019 Nov 18. PMID: 31740812. Review.
PTBP2 - a gene with relevance for both Anorexia nervosa and body weight regulation.
Zheng Y, Rajcsanyi LS, Herpertz-Dahlmann B, Seitz J, de Zwaan M, Herzog W, Ehrlich S, Zipfel S, Giel K, Egberts K, Burghardt R, Föcker M, Al-Lahham S, Peters T, Libuda L, Antel J, Hebebrand J, Hinney A. Transl Psychiatry. 2022 Jun 9;12(1):241. doi: 10.1038/s41398-022-02018-5. PMID: 35680849.


