A DNA shape-based regulatory score improves position-weight matrix-based recognition of transcription factor binding sites
- PMID: 26130577
- PMCID: PMC4838056
- DOI: 10.1093/bioinformatics/btv391
Abstract
Motivation: The position-weight matrix (PWM) is a useful representation of a transcription factor binding site (TFBS) sequence pattern because the PWM can be estimated from a small number of representative TFBS sequences. However, because the PWM probability model assumes independence between individual nucleotide positions, the PWMs for some TFs poorly discriminate binding sites from non-binding sites that have similar sequence content. Since the local three-dimensional DNA structure ('shape') is a determinant of TF binding specificity and since DNA shape has a significant sequence dependence, we combined DNA shape-derived features into a TF-generalized regulatory score and tested whether the score could improve PWM-based discrimination of TFBS from non-binding sites.
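The independence assumption mentioned above can be made concrete with a minimal sketch of log-odds PWM scoring. This is an illustration only, not the authors' implementation: the example sites, the uniform background, the pseudocount, and the function names `build_pwm` and `log_odds_score` are all assumptions introduced here.

```python
import math

ALPHABET = "ACGT"

# Hypothetical aligned binding-site sequences (illustrative, not from the paper).
SITES = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA", "TTACTCA"]

# Assumed uniform background nucleotide frequencies.
BACKGROUND = {base: 0.25 for base in ALPHABET}


def build_pwm(sites, pseudocount=1.0):
    """Estimate per-position nucleotide probabilities from aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(length):
        # Start every count at the pseudocount so no probability is zero.
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({base: counts[base] / total for base in ALPHABET})
    return pwm


def log_odds_score(pwm, seq, background=BACKGROUND):
    """Sum of per-position log2 odds; each position contributes independently."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match PWM width")
    return sum(
        math.log2(pwm[i][base] / background[base])
        for i, base in enumerate(seq)
    )


pwm = build_pwm(SITES)
consensus_score = log_odds_score(pwm, "TGACTCA")
unrelated_score = log_odds_score(pwm, "ACGTACG")
```

Because the score is a plain sum over positions, changing the base at one position shifts the score by the same amount regardless of what the neighboring bases are. That additivity is what a PWM cannot relax, and it is the limitation that motivates adding shape-derived features.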
Results: We compared a traditional PWM model to a model that combines the PWM with a DNA shape feature-based regulatory potential score, assessing accuracy in detecting binding sites for 75 vertebrate transcription factors. The PWM+shape model was more accurate than the PWM-only model for 45% of the TFs tested, with no significant loss of accuracy for the remaining TFs.
Availability and implementation: The shape-based model is available as an open-source R package that is archived on the GitHub software repository at https://github.com/ramseylab/regshape/.
Contact: stephen.ramsey@oregonstate.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
