Automated incorporation of pairwise dependency in transcription factor binding site prediction using dinucleotide weight tensors
- PMID: 28753602
- PMCID: PMC5550003
- DOI: 10.1371/journal.pcbi.1005176
Abstract
Gene regulatory networks are ultimately encoded by the sequence-specific binding of transcription factors (TFs) to short DNA segments. Although it is customary to represent the binding specificity of a TF by a position-specific weight matrix (PSWM), which assumes that each position within a site contributes independently to the overall binding affinity, evidence has been accumulating that there can be significant dependencies between positions. Unfortunately, methodological challenges have so far hindered the development of a practical and generally accepted extension of the PSWM model. On the one hand, simple models that only consider dependencies between nearest-neighbor positions are easy to use in practice, but fail to account for the distal dependencies that are observed in the data. On the other hand, models that allow for arbitrary dependencies are prone to overfitting, requiring regularization schemes that are difficult for non-experts to use in practice. Here we present a new regulatory motif model, called the dinucleotide weight tensor (DWT), that incorporates arbitrary pairwise dependencies between positions in binding sites, rigorously from first principles, and free from tunable parameters. We demonstrate the power of the method on a large set of ChIP-seq datasets, showing that DWTs outperform both PSWMs and motif models that only incorporate nearest-neighbor dependencies. We also demonstrate that DWTs outperform two previously proposed methods. Finally, we show that DWTs inferred from ChIP-seq data also outperform PSWMs on HT-SELEX data for the same TF, suggesting that DWTs capture inherent biophysical properties of the interactions between the DNA binding domains of TFs and their binding sites. We make a suite of DWT tools available at dwt.unibas.ch, which allow users to automatically perform 'motif finding', i.e. the inference of DWT motifs from a set of sequences, binding site prediction with DWTs, and visualization of DWT 'dilogo' motifs.
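To make the independence assumption of the PSWM concrete, the following minimal Python sketch may help. It is not the DWT method and is not taken from the DWT tool suite; the function names, the pseudocount, the uniform background, and the toy sites are all assumptions made for illustration. It builds a PSWM from aligned sites, scores sequences with it, and uses mutual information between two columns as a simple indicator of a pairwise dependency between non-adjacent positions.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def pswm_from_sites(sites, pseudocount=0.5):
    """Per-position base probabilities from aligned, equal-length sites.

    The pseudocount value is an illustrative assumption.
    """
    length = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    pswm = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        pswm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return pswm


def pswm_log_odds(pswm, seq, background=0.25):
    """Log-odds score: a sum of independent per-position contributions."""
    return sum(math.log(col[base] / background) for col, base in zip(pswm, seq))


def mutual_information(sites, i, j):
    """Mutual information (bits) between columns i and j of the aligned sites."""
    n = len(sites)
    pair = Counter((s[i], s[j]) for s in sites)
    left = Counter(s[i] for s in sites)
    right = Counter(s[j] for s in sites)
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in pair.items()
    )


# Toy sites (hypothetical): positions 0 and 3 always co-vary (A..T or T..A).
toy_sites = ["ACGT", "ACGT", "TCGA", "TCGA"]
toy_pswm = pswm_from_sites(toy_sites)

score_seen = pswm_log_odds(toy_pswm, "ACGT")    # combination present in the sites
score_unseen = pswm_log_odds(toy_pswm, "ACGA")  # combination never observed
mi_distal = mutual_information(toy_sites, 0, 3)
mi_constant = mutual_information(toy_sites, 0, 1)
```

In this toy example the PSWM gives `ACGT` and `ACGA` the same score, because each column is scored on its own, even though the combination A at position 0 with A at position 3 never occurs among the sites. The mutual information between positions 0 and 3 is 1 bit, which is the kind of distal pairwise signal that a nearest-neighbor model would not capture.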
Conflict of interest statement
The authors have declared that no competing interests exist.
Similar articles
- Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data. BMC Bioinformatics. 2015 Nov 9;16:375. doi: 10.1186/s12859-015-0797-4. PMID: 26552868. Free PMC article.
- Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences. Nucleic Acids Res. 2016 Jul 27;44(13):6055-69. doi: 10.1093/nar/gkw521. Epub 2016 Jun 9. PMID: 27288444. Free PMC article.
- PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny. PLoS Comput Biol. 2005 Dec;1(7):e67. doi: 10.1371/journal.pcbi.0010067. Epub 2005 Dec 9. PMID: 16477324. Free PMC article.
- DNA Motif Databases and Their Uses. Curr Protoc Bioinformatics. 2015 Sep 3;51:2.15.1-2.15.6. doi: 10.1002/0471250953.bi0215s51. PMID: 26334922. Review.
- An algorithmic perspective of de novo cis-regulatory motif finding based on ChIP-seq data. Brief Bioinform. 2018 Sep 28;19(5):1069-1081. doi: 10.1093/bib/bbx026. PMID: 28334268. Review.
Cited by
- JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Res. 2018 Jan 4;46(D1):D260-D266. doi: 10.1093/nar/gkx1126. PMID: 29140473. Free PMC article.
- The evaluation of transcription factor binding site prediction tools in human and Arabidopsis genomes. BMC Bioinformatics. 2024 Dec 2;25(1):371. doi: 10.1186/s12859-024-05995-0. PMID: 39623329. Free PMC article.
- Bacterial Metallostasis: Metal Sensing, Metalloproteome Remodeling, and Metal Trafficking. Chem Rev. 2024 Dec 25;124(24):13574-13659. doi: 10.1021/acs.chemrev.4c00264. Epub 2024 Dec 10. PMID: 39658019. Free PMC article. Review.
- Position-specific evolution in transcription factor binding sites, and a fast likelihood calculation for the F81 model. R Soc Open Sci. 2024 Jan 24;11(1):231088. doi: 10.1098/rsos.231088. eCollection 2024 Jan. PMID: 38269075. Free PMC article.
- Disentangling transcription factor binding site complexity. Nucleic Acids Res. 2018 Nov 16;46(20):e121. doi: 10.1093/nar/gky683. PMID: 30085218. Free PMC article.