Nucleic Acids Res. 2010 Jan;38(2):e12.
doi: 10.1093/nar/gkp907. Epub 2009 Nov 11.

Accurate recognition of cis-regulatory motifs with the correct lengths in prokaryotic genomes

Guojun Li et al. Nucleic Acids Res. 2010 Jan.

Abstract

We present a new computational method for a classical problem, the identification of cis-regulatory motifs in a given set of promoter sequences, based on one key new idea. Instead of scoring candidate motifs individually, as all existing motif-finding programs do, our method scores groups of candidate motifs with similar sequences, called motif closures, using a P-value, which substantially improves prediction reliability over existing methods. Our new P-value scoring scheme is independent of sequence length, allowing direct comparison of predicted motifs of different lengths on the same footing. We have implemented this method as the Motif Recognition Computer (MREC) program and have tested it extensively on both simulated and biological data from prokaryotic genomes. Our test results indicate that MREC picks out the actual motif, with the correct length, as the best-scoring candidate in the vast majority of cases in our test set. We compared our predictions with those of two motif-finding programs, Cosmo and MEME, and found that MREC outperforms both across all test cases by a large margin. The MREC program is available at http://csbl.bmb.uga.edu/~bingqiang/MREC1/.
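
The abstract does not give MREC's actual statistic, so the following is only a minimal sketch of the general idea of scoring a group of similar candidates with a P-value, not the authors' method. It assumes a "closure" can be approximated by all k-mers within Hamming distance d of a centre k-mer, a uniform background over A, C, G, T, equal-length promoter sequences, and independent windows. The names closure_score, neighborhood_size and binomial_tail are hypothetical and chosen for illustration.

```python
from math import comb


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def neighborhood_size(k: int, d: int) -> int:
    """Count DNA k-mers within Hamming distance d of any fixed k-mer."""
    return sum(comb(k, i) * 3 ** i for i in range(d + 1))


def binomial_tail(n: int, p: float, x: int) -> float:
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x, n + 1))


def closure_score(center: str, sequences: list[str], d: int) -> tuple[int, float]:
    """Return (hits, p_value) for the Hamming-ball group around `center`.

    ASSUMPTION: this is an illustrative stand-in for a closure score,
    not the statistic used by MREC.
    """
    k = len(center)
    seq_len = len(sequences[0])
    if any(len(s) != seq_len for s in sequences):
        raise ValueError("this sketch assumes equal-length sequences")
    windows = seq_len - k + 1

    # A sequence counts as a hit if any window lies within distance d of the centre.
    hits = sum(
        any(hamming(center, s[i:i + k]) <= d for i in range(windows))
        for s in sequences
    )

    # Chance that one random window falls in the group (uniform background).
    p_site = neighborhood_size(k, d) / 4 ** k
    # Chance that a random sequence has at least one such window,
    # treating overlapping windows as independent (an approximation).
    p_seq = 1 - (1 - p_site) ** windows

    # Probability of seeing at least `hits` hit sequences by chance.
    return hits, binomial_tail(len(sequences), p_seq, hits)


if __name__ == "__main__":
    seqs = ["ACGTACGTAA", "TTACGTTTTT", "GGGGACGAGG", "CCCCCCCCCC"]
    print(closure_score("ACGT", seqs, d=1))
```

Because the score is a tail probability rather than a raw count, candidates of different lengths k can at least be placed on a common scale in this toy setting, which is the property the abstract emphasises.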

Figures

Figure 1.
Performance of MREC, MEME and Cosmo on simulated data generated using nine point–mutation rates. (A) The dataset with different mutation rates. (B) The dataset with different motif lengths.
Figure 2.
Comparison of the P-values computed by MREC, csFFT and CONSENSUS, using the ArgR and DnaA datasets in E. coli as examples. The pink dashed lines mark the correct motif length.
