Increasing coverage of transcription factor position weight matrices through domain-level homology
- PMID: 22952610
- PMCID: PMC3428306
- DOI: 10.1371/journal.pone.0042779
Abstract
Transcription factor-DNA interactions, central to cellular regulation and control, are commonly described by position weight matrices (PWMs). These matrices are frequently used to predict transcription factor binding sites in regulatory regions of DNA to complement and guide further experimental investigation. The DNA sequence preferences of transcription factors, encoded in PWMs, are dictated primarily by select residues within the DNA binding domain(s) that interact directly with DNA. Therefore, the DNA binding properties of homologous transcription factors with identical DNA binding domains may be characterized by PWMs derived from different species. Accordingly, we have implemented a fully automated domain-level homology searching method for identical DNA binding sequences.
By applying the domain-level homology search to transcription factors with existing PWMs in the JASPAR and TRANSFAC databases, we were able to significantly increase coverage in terms of the total number of PWMs associated with a given species, assign PWMs to transcription factors that did not previously have any associations, and increase the number of species represented by PWMs by over an order of magnitude. Additionally, using protein binding microarray (PBM) data, we have validated the domain-level method by demonstrating that transcription factor pairs with matching DNA binding domains exhibit comparable DNA binding specificity predictions to transcription factor pairs with completely identical sequences.
The increased coverage achieved herein demonstrates the potential for more thorough species-associated investigation of protein-DNA interactions using existing resources. The PWM scanning results highlight the challenging nature of transcription factors that contain multiple DNA binding domains, as well as the impact of motif discovery on the ability to predict DNA binding properties. The method is additionally suitable for identifying domain-level homology mappings to enable utilization of additional information sources in the study of transcription factors. The domain-level homology search method, resulting PWM mappings, web-based user interface, and web API are publicly available at http://dodoma.systemsbiology.net.
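The core idea of the abstract, transferring a PWM between transcription factors whose DNA binding domain sequences are identical, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `TF` record, the `transfer_pwms` function, the toy domain sequences, and the PWM identifiers are all hypothetical. The sketch also assumes that a factor with several DNA binding domains must match on every domain, in order, which the abstract does not specify.

```python
from collections import defaultdict
from typing import NamedTuple


class TF(NamedTuple):
    """Hypothetical record for one transcription factor."""
    name: str
    species: str
    dbd_seqs: tuple  # amino-acid sequences of its DNA binding domain(s), in order


def transfer_pwms(annotated, candidates):
    """Assign PWM ids to candidates whose DNA binding domains are identical
    to those of a factor that already has a PWM.

    annotated:  iterable of (TF, pwm_id) pairs with known PWMs
    candidates: iterable of TF records without PWMs
    Assumption: all domains must match exactly and in the same order.
    """
    # Index the known PWMs by the exact tuple of domain sequences.
    by_domain = defaultdict(set)
    for tf, pwm_id in annotated:
        by_domain[tf.dbd_seqs].add(pwm_id)

    mapping = {}
    for tf in candidates:
        # Skip factors with no annotated domain; an empty tuple carries no evidence.
        if tf.dbd_seqs and tf.dbd_seqs in by_domain:
            mapping[(tf.species, tf.name)] = sorted(by_domain[tf.dbd_seqs])
    return mapping


# Toy data: sequences and identifiers are made up for illustration only.
annotated = [
    (TF("tfA", "mouse", ("KRRT", "QLSA")), "PWM_1"),
    (TF("tfB", "mouse", ("MMKL",)), "PWM_2"),
]
candidates = [
    TF("tfA_h", "human", ("KRRT", "QLSA")),  # both domains identical to tfA
    TF("tfC", "zebrafish", ("MMKL",)),       # single domain identical to tfB
    TF("tfD", "human", ("KRRT",)),           # only one of tfA's two domains
]
mapping = transfer_pwms(annotated, candidates)
```

In this toy run, `tfA_h` and `tfC` inherit a PWM while `tfD` does not, because it shares only one of the two domains. A real pipeline would first need to delimit the domains in each protein sequence, a step this sketch takes as given.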