Biol Direct. 2006 Apr 6;1:11.
doi: 10.1186/1745-6150-1-11.

A survey of motif discovery methods in an integrated framework

Geir Kjetil Sandve et al. Biol Direct. 2006.

Abstract

Background: There has been a growing interest in computational discovery of regulatory elements, and a multitude of motif discovery methods have been proposed. Computational motif discovery has been used with some success in simple organisms like yeast. However, as we move to higher organisms with more complex genomes, more sensitive methods are needed. Several recent methods try to integrate additional sources of information, including microarray experiments (gene expression and ChIP-chip). There is also a growing awareness that regulatory elements work in combination, and that this combinatorial behavior must be modeled for successful motif discovery. However, the multitude of methods and approaches makes it difficult to get a good understanding of the current status of the field.

Results: This paper presents a survey of methods for motif discovery in DNA, based on a structured and well-defined framework that integrates all relevant elements. Existing methods are discussed according to this framework.

Conclusion: The survey shows that although no single method takes all relevant elements into consideration, a very large number of different models treating the various elements separately have been tried. Very often the choices that have been made are not explicitly stated, making it difficult to compare different implementations. Also, the tests that have been used are often not comparable. Therefore, a stringent framework and improved test methods are needed to evaluate the different approaches in order to conclude which ones are most promising.

Reviewers: This article was reviewed by Eugene V. Koonin, Philipp Bucher (nominated by Mikhail Gelfand) and Frank Eisenhaber.

Figures

Figure 1
A schematic view of the integrated framework. A single motif, denoted by mg, consists of two parts: mg is how well the sequence matches a consensus, while og is a prior on whether any regulatory element is to occur at that position. A set of single motifs, together with inter-motif distance restrictions (d), then forms a composite motif (cg). Finally, multiple occurrences of a composite motif in the regulatory regions of a gene are represented by a gene score Gc.
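
The layering described in the caption can be illustrated with a minimal Python sketch. It is not the authors' model: the function names, the fraction-of-matching-bases match score, the product used to combine match and prior, the product used for a motif pair, and the sum used for the gene score are all assumptions chosen for illustration, since the caption does not specify how the levels are combined.

```python
def match_score(window, consensus):
    """Assumed match model: fraction of positions agreeing with the consensus."""
    return sum(a == b for a, b in zip(window, consensus)) / len(consensus)


def single_motif_scores(seq, consensus, prior=None):
    """Score every start position as match score times occurrence prior.

    `prior` holds one value per start position; a uniform prior of 1.0
    is assumed when none is given.
    """
    width = len(consensus)
    n = len(seq) - width + 1
    if prior is None:
        prior = [1.0] * n
    return [match_score(seq[i:i + width], consensus) * prior[i] for i in range(n)]


def composite_scores(scores_a, scores_b, min_d, max_d):
    """Pair motif A at start i with motif B at start j when min_d <= j - i <= max_d.

    Assumes 0 <= min_d <= max_d. The pair score is the product of the two
    single-motif scores (an illustrative choice).
    """
    pairs = {}
    for i, score_a in enumerate(scores_a):
        last_j = min(i + max_d, len(scores_b) - 1)
        for j in range(i + min_d, last_j + 1):
            pairs[(i, j)] = score_a * scores_b[j]
    return pairs


def gene_score(pairs):
    """Assumed gene-level score: sum over all composite motif occurrences."""
    return sum(pairs.values())


# Toy regulatory region with ACGT at position 2 and GGCC at position 8.
region = "TTACGTAAGGCCTT"
a = single_motif_scores(region, "ACGT")
b = single_motif_scores(region, "GGCC")
pairs = composite_scores(a, b, min_d=4, max_d=8)
total = gene_score(pairs)
```

In this toy region the pair of exact matches at positions 2 and 8 lies within the allowed distance window, so it contributes the maximal pair score to the gene-level sum. Real methods typically replace each of these pieces with a probabilistic model.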
