ANN-Spec: a method for discovering transcription factor binding sites with improved specificity
- PMID: 10902194
- DOI: 10.1142/9789814447331_0044
Abstract
This work describes ANN-Spec, a machine learning algorithm, and its application to discovering ungapped patterns in DNA sequences. The approach uses an Artificial Neural Network and a Gibbs sampling method to define the Specificity of a DNA-binding protein. ANN-Spec searches for the parameters of a simple network (or weight matrix) that maximize the specificity for binding sequences in a positive set relative to a background sequence set. The resulting weight matrix is then used to locate binding sites in the positive data set, and these sites define a local multiple sequence alignment. Training complexity is O(lN), where l is the width of the pattern and N is the size of the positive training data. A quantitative comparison of ANN-Spec with a few related programs is presented; it shows that ANN-Spec finds patterns of higher specificity when trained with a background data set. The program and documentation are available from the authors for UNIX systems.
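To make the idea of scoring sites with a weight matrix against a background set concrete, here is a minimal Python sketch. It is not the ANN-Spec implementation: the function names, the toy matrix, the toy sequences, and the particular objective (mean best-site score in the positive set minus the log of the mean exponentiated score over all background windows) are illustrative assumptions. It leaves out the Gibbs sampling and network training steps entirely.

```python
import math

ALPHABET = "ACGT"

def score_site(weights, site):
    """Score one ungapped site: sum of per-column weights for the observed bases."""
    return sum(weights[i][base] for i, base in enumerate(site))

def windows(seq, width):
    """All contiguous windows of the given width in a sequence."""
    return [seq[i:i + width] for i in range(len(seq) - width + 1)]

def best_site(weights, seq):
    """Highest-scoring window in a sequence (one linear scan of the sequence)."""
    return max(windows(seq, len(weights)), key=lambda s: score_site(weights, s))

def log_mean_exp(scores):
    """Numerically stable log of the mean of exp(score)."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores) / len(scores))

def specificity_gap(weights, positives, background):
    """Hypothetical objective: how much better the best positive sites score
    than a typical background window. Larger means more specific."""
    width = len(weights)
    pos = [score_site(weights, best_site(weights, s)) for s in positives]
    bg = [score_site(weights, w) for s in background for w in windows(s, width)]
    return sum(pos) / len(pos) - log_mean_exp(bg)

# Toy weight matrix (assumed, not learned) that favors the pattern TGAC.
weights = [{b: (1.0 if b == c else -1.0) for b in ALPHABET} for c in "TGAC"]
positives = ["AATGACGG", "CCTGACTA"]   # made-up positive sequences
background = ["AAAAAAAA", "CGCGCGCG"]  # made-up background sequences

# The predicted sites, one per positive sequence, form a local alignment.
sites = [best_site(weights, s) for s in positives]
gap = specificity_gap(weights, positives, background)
print(sites, gap)
```

In this toy setting the best sites in the positive sequences line up as an ungapped alignment, and the gap grows as the matrix separates those sites from background windows. A training procedure in the spirit of the abstract would adjust the weights to increase such a quantity.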