BMC Bioinformatics. 2013 Aug 8;14:241.
doi: 10.1186/1471-2105-14-241.

MatrixCatch--a novel tool for the recognition of composite regulatory elements in promoters


Igor V Deyneko et al. BMC Bioinformatics.

Abstract

Background: Accurate recognition of regulatory elements in promoters is an essential prerequisite for understanding the mechanisms of gene regulation at the level of transcription. Composite regulatory elements represent a particular type of such transcriptional regulatory elements consisting of pairs of individual DNA motifs. In contrast to the present approach, most available recognition techniques are based purely on statistical evaluation of the occurrence of single motifs. Such methods are limited in application, since the accuracy of recognition is greatly dependent on the size and quality of the sequence dataset. Methods that exploit available knowledge and have broad applicability are evidently needed.

Results: We developed a novel method to identify composite regulatory elements in promoters using a library of known examples. In-depth investigation of the regularities encoded in known composite elements allowed us to introduce a new characteristic measure and to improve specificity compared with other methods. Tests on an established benchmark and on real genomic data show that our method outperforms other available methods, whether based on known examples or on statistical evaluation. Beyond better recognition, the method has two practical advantages: it can detect a large number of different types of composite elements, and its results have a direct biological interpretation. The program is available at http://gnaweb.helmholtz-hzi.de/cgi-bin/MCatch/MatrixCatch.pl and includes an option to extend the provided library with user-supplied data.

Conclusions: The novel algorithm for the identification of composite regulatory elements presented in this paper proved superior to existing methods on the benchmark and genomic data tested. Its application to tissue-specific promoters identified several highly specific composite elements with relevance to their biological function. Together with other methods, this approach will further advance the understanding of the transcriptional regulation of genes.

Figures

Figure 1
Distributions of PWM scores and distances between BSs in real and random CEs. (A) Distribution of PWM scores for the first and second BSs in real CEs (red) and random sequence CEs (blue). Scores Sm1 and Sm2 define the rectangle OABC and perfectly separate high-scoring CEs. When the score thresholds are reduced (dashed green lines), the rectangle OA′B′C′ covers many additional true CEs, but also a large number of random CEs. Introducing a threshold on the sum of scores (diagonal EF) greatly improves the separation between real and random CEs (discontinuous line A′E′F′C′). (B) Distribution of distances between BSs and sum of matrix scores in real CEs (blue). Distance values were averaged in intervals of score values (1.75-1.80), (1.80-1.85), (1.85-1.90), (1.90-1.95) and (1.95-2.00) (red). The trend line reflects the dependence between PWM scores and distance between BSs.
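The acceptance region described in panel (A) can be illustrated with a minimal Python sketch. It assumes that a candidate CE is accepted when both PWM scores pass their relaxed cutoffs (the sides A′ and C′) and the sum of the two scores passes a further cutoff (the diagonal E′F′). The class name, function name and all numeric cutoffs below are hypothetical and chosen only for illustration; they are not taken from MatrixCatch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairThresholds:
    """Hypothetical cutoffs for one type of composite element (CE)."""
    min_first: float   # relaxed cutoff on the PWM score of the first BS
    min_second: float  # relaxed cutoff on the PWM score of the second BS
    min_sum: float     # cutoff on the sum of both PWM scores (the diagonal)


def accepts(score_first: float, score_second: float, t: PairThresholds) -> bool:
    """Return True if a candidate pair lies inside the clipped rectangle.

    The two individual cutoffs describe the relaxed rectangle; the sum
    cutoff removes the corner where both scores are only just acceptable.
    """
    return (
        score_first >= t.min_first
        and score_second >= t.min_second
        and score_first + score_second >= t.min_sum
    )


# Illustrative values only (assumption, not from the paper's library).
example = PairThresholds(min_first=0.80, min_second=0.80, min_sum=1.75)

strong_pair = accepts(0.95, 0.85, example)    # passes all three cutoffs
weak_corner = accepts(0.82, 0.81, example)    # passes each cutoff, fails the sum
one_bad_site = accepts(0.99, 0.78, example)   # second score below its cutoff
```

In this sketch a pair in which both sites score only slightly above their individual cutoffs is rejected by the sum criterion, which is the region where the caption reports most random CEs.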
Figure 2
Receiver Operating Characteristic (ROC) curves of three methods on recognition of CE NFAT/AP-1.
Figure 3
Nucleotide level correlation scores (nCC) on the TRANSCompel dataset. The graphs show nCC scores at increasing noise levels. Values for CisModule could be calculated only for the “noise0” dataset. For further details see Klepper et al. [1].
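For readers unfamiliar with the nCC statistic, the following Python sketch shows how it can be computed, assuming the definition commonly used in motif-discovery assessments: a correlation coefficient over per-nucleotide true/false positives and negatives. The function names and the representation of predictions as sets of sequence positions are assumptions made for illustration, as is the convention of returning 0.0 when the denominator is zero.

```python
import math


def nucleotide_counts(predicted: set, actual: set, length: int):
    """Count per-nucleotide TP, FP, TN, FN for a sequence of given length.

    `predicted` and `actual` are sets of 0-based positions covered by
    predicted and annotated binding sites (hypothetical representation).
    """
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = length - tp - fp - fn
    return tp, fp, tn, fn


def ncc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Nucleotide-level correlation coefficient.

    nCC = (TP*TN - FN*FP) / sqrt((TP+FN)(TN+FP)(TP+FP)(TN+FN))
    Returns 0.0 when the denominator is zero (a convention assumed here).
    """
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fn * fp) / denom


# A prediction that exactly matches the annotation.
perfect = ncc(*nucleotide_counts({2, 3, 4}, {2, 3, 4}, length=10))

# A prediction that covers exactly the unannotated positions.
inverted = ncc(*nucleotide_counts({0, 1}, {2, 3}, length=4))

# No positions predicted at all: the denominator is zero.
empty = ncc(*nucleotide_counts(set(), {2, 3}, length=10))
```

The score ranges from -1 (prediction and annotation are exact complements) to 1 (exact agreement).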

References

    1. Klepper K, Sandve GK, Abul O, Johansen J, Drablos F. Assessment of composite motif discovery methods. BMC Bioinforma. 2008;9:123. doi: 10.1186/1471-2105-9-123.
    2. Waleev T, Shtokalo D, Konovalova T, Voss N, Cheremushkin E, Stegmaier P, Kel-Margoulis O, Wingender E, Kel A. Composite Module Analyst: identification of transcription factor binding site combinations using genetic algorithm. Nucleic Acids Res. 2006;34:W541–W545. doi: 10.1093/nar/gkl342.
    3. Wasserman WW, Fickett JW. Identification of regulatory regions which confer muscle-specific gene expression. J Mol Biol. 1998;278:167–181. doi: 10.1006/jmbi.1998.1700.
    4. Kel A, Kel-Margoulis O, Babenko V, Wingender E. Recognition of NFATp/AP-1 composite elements within genes induced upon the activation of immune cells. J Mol Biol. 1999;288:353–376. doi: 10.1006/jmbi.1999.2684.
    5. Krivan W, Wasserman WW. A predictive model for regulatory sequences directing liver-specific transcription. Genome Res. 2001;11:1559–1566. doi: 10.1101/gr.180601.
