Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites
- PMID: 15849315
- PMCID: PMC1084321
- DOI: 10.1093/nar/gki519
Abstract
Position-weight matrices (PWMs) are widely used to locate transcription factor binding sites in DNA sequences. The majority of existing PWMs provide a low level of both sensitivity and specificity. We present a new computational algorithm, a modification of the Staden-Bucher approach, that improves the PWM. We applied the proposed technique to the PWM of the GC-box, the binding site for Sp1. A comparison of the old and new PWMs shows that the latter increases both sensitivity and specificity. The statistical parameters of the GC-box distribution in promoter regions and in the human genome, as well as in each chromosome, are presented. The majority of commonly used PWMs are 4-row mononucleotide matrices, although 16-row dinucleotide matrices are known to be more informative. The algorithm efficiently determines the 16-row matrices, and preliminary results show that such matrices perform better than 4-row matrices.
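To make the distinction between 4-row and 16-row matrices concrete, the following is a minimal sketch of generic log-odds PWM construction and scanning. It is not the algorithm described in the paper, and it does not reproduce the Staden-Bucher procedure or its modification. The function names (`build_pwm`, `score`, `scan`), the uniform background, the pseudocount of 1, and the toy GC-rich sites are all assumptions made for illustration; the sites are invented and are not real Sp1 data.

```python
import math
from itertools import product

BASES = "ACGT"

# Toy aligned sites, invented for illustration only (not real Sp1 data).
TOY_SITES = ["GGGGCGGGGC", "GGGGCGGGGT", "TGGGCGGGAC", "GGGGCGGAGC"]


def build_pwm(sites, k=1, pseudocount=1.0):
    """Build a log-odds matrix from aligned, equal-length sites.

    k=1 yields 4 rows per position (mononucleotide);
    k=2 yields 16 rows per position (overlapping dinucleotides).
    Assumption: uniform background over the 4**k symbols.
    """
    symbols = ["".join(p) for p in product(BASES, repeat=k)]
    width = len(sites[0]) - k + 1
    background = 1.0 / len(symbols)
    pwm = []
    for pos in range(width):
        # Start every symbol at the pseudocount so no log of zero occurs.
        counts = {s: pseudocount for s in symbols}
        for site in sites:
            counts[site[pos:pos + k]] += 1
        total = sum(counts.values())
        pwm.append({s: math.log2(c / total / background)
                    for s, c in counts.items()})
    return pwm


def score(pwm, window, k=1):
    """Sum the log-odds weights of a window (A, C, G, T only)."""
    return sum(col[window[i:i + k]] for i, col in enumerate(pwm))


def scan(pwm, seq, threshold, k=1):
    """Return (start, score) for every window scoring at or above threshold."""
    site_len = len(pwm) + k - 1
    hits = []
    for start in range(len(seq) - site_len + 1):
        s = score(pwm, seq[start:start + site_len], k)
        if s >= threshold:
            hits.append((start, s))
    return hits


mono = build_pwm(TOY_SITES, k=1)   # 10 positions x 4 rows
dinuc = build_pwm(TOY_SITES, k=2)  # 9 positions x 16 rows
```

In this sketch the threshold passed to `scan` is what trades sensitivity against specificity: lowering it recovers more true sites but admits more false positives. The dinucleotide variant scores overlapping pairs of bases, so it can capture dependencies between adjacent positions that a mononucleotide matrix cannot represent.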
