PLoS One. 2012;7(10):e48442.
doi: 10.1371/journal.pone.0048442. Epub 2012 Oct 31.

PairMotif: A new pattern-driven algorithm for planted (l, d) DNA motif search

Qiang Yu et al. PLoS One. 2012.

Abstract

Motif search is a fundamental problem in bioinformatics with an important application in locating transcription factor binding sites (TFBSs) in DNA sequences. Exact algorithms can report all (l, d) motifs and find the best one under a specific objective function. However, identifying weak motifs remains challenging, since current exact algorithms require either a large amount of memory or a long execution time. This paper proposes a new exact algorithm, PairMotif, for planted (l, d) motif search (PMS). To reduce both the number of candidate motifs and the number of scanned l-mers, PairMotif selects multiple pairs of l-mers with relatively large distances from the input sequences to restrict the search space. Comparisons with several recently proposed algorithms show that PairMotif requires less storage space and runs faster on most PMS instances. In particular, among the algorithms compared, only PairMotif can solve the weak instance (27, 9) within 10 hours. Moreover, the performance of PairMotif is stable with respect to sequence length, which allows it to identify motifs in longer sequences. On real biological data, experimental results demonstrate the validity of the proposed algorithm.
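
To make the problem statement concrete, the following is a minimal brute-force sketch of the standard (l, d) motif definition: an l-mer m counts as an (l, d) motif if every input sequence contains at least one l-mer within Hamming distance d of m. This is not PairMotif itself; the function names (`hamming`, `is_ld_motif`, `brute_force_pms`) are hypothetical, and the exhaustive scan over all 4^l candidates is only practical for very small l.

```python
from itertools import product

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(p != q for p, q in zip(a, b))


def is_ld_motif(m, sequences, d):
    """True if every sequence contains an l-mer within distance d of m."""
    l = len(m)
    return all(
        any(hamming(m, s[i:i + l]) <= d for i in range(len(s) - l + 1))
        for s in sequences
    )


def brute_force_pms(sequences, l, d):
    """Enumerate all 4^l candidates and keep the (l, d) motifs.

    Illustrative only: exact algorithms such as those discussed in the
    abstract exist precisely because this scan does not scale.
    """
    return [
        "".join(m)
        for m in product(ALPHABET, repeat=l)
        if is_ld_motif("".join(m), sequences, d)
    ]
```

For example, `brute_force_pms(["ACGT", "TTACG"], 3, 0)` returns `["ACG"]`, the only 3-mer that occurs exactly in both sequences.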

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. An example of partitioning positions in the alignment of two or three l-mers.
The example shown uses 15-mers.
Figure 2. Illustration of the PairMotif algorithm.
This figure takes the instance (15, 4) as an example to explain the process of PairMotif, which consists of three stages: selecting pairs, filtering l-mers and verifying candidate motifs.
Figure 3. An example for traversing candidate motifs in Md(x, x′).
This figure shows an example of traversing the candidate motifs shared by two l-mers x and x′. After calculating R(x, x′), for each <α, β> in R(x, x′), let y = x′; the traversal then modifies y in three steps. First, select α positions from the positions where x[i] = x′[i], and for each of these α positions, change y[i] to one of the three characters different from x[i]. Second, select β positions from the positions where x[i] ≠ x′[i], and for each of these β positions, change y[i] to one of the two characters different from both x[i] and x′[i]. Third, select a subset of the positions where x[i] ≠ x′[i], excluding those selected in the second step, and change y[i] to x[i] at each of these positions. The bold italic characters denote the changed positions in y.
Figure 4. Time comparison on different sequence lengths.
This figure compares PairMotif with two well-known algorithms, PMS5 and PMSprune, at different sequence lengths on the instance (18, 6). The x-axis shows the sequence lengths. The y-axis shows the running times.
Figure 5. Comparison of predicted motifs under different objective functions.
The x-axis shows the data sets used in our experiments. For each data set, we obtain three predicted motifs in terms of three objective functions. The y-axis shows the value of nucleotide-level correlation coefficient for each predicted motif.
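
The three-step traversal described in the Figure 3 caption can be sketched as follows. This is a hedged illustration, not the authors' implementation. It assumes that Md(x, x′) denotes the l-mers within Hamming distance d of both x and x′, and that R(x, x′) contains exactly those <α, β> for which some step-3 subset size γ keeps both distances at most d. With h the Hamming distance between x and x′, the distances after the three steps are d(y, x) = α + h − γ and d(y, x′) = α + β + γ, which gives the bounds on γ used below. The names `common_neighbors` and `substitutions` are hypothetical.

```python
from itertools import combinations, product

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(p != q for p, q in zip(a, b))


def substitutions(positions, forbidden):
    """All ways to give each position a character outside forbidden[i]."""
    choices = [[c for c in ALPHABET if c not in forbidden[i]] for i in positions]
    return product(*choices)


def common_neighbors(x, xp, d):
    """Yield each l-mer within distance d of both x and xp exactly once."""
    same = [i for i in range(len(x)) if x[i] == xp[i]]
    diff = [i for i in range(len(x)) if x[i] != xp[i]]
    h = len(diff)
    # One forbidden character where x and xp agree, two where they differ.
    forbidden = {i: {x[i], xp[i]} for i in range(len(x))}
    for alpha in range(min(d, len(same)) + 1):
        for beta in range(min(d - alpha, h) + 1):
            # Assumed membership test for <alpha, beta> in R(x, xp):
            # a feasible gamma must exist.
            lo = max(0, alpha + h - d)            # from d(y, x)  <= d
            hi = min(h - beta, d - alpha - beta)  # from d(y, xp) <= d
            if lo > hi:
                continue
            for pos_a in combinations(same, alpha):            # step 1
                for chars_a in substitutions(pos_a, forbidden):
                    for pos_b in combinations(diff, beta):     # step 2
                        rest = [i for i in diff if i not in pos_b]
                        for chars_b in substitutions(pos_b, forbidden):
                            for gamma in range(lo, hi + 1):    # step 3
                                for pos_g in combinations(rest, gamma):
                                    y = list(xp)
                                    for i, c in zip(pos_a, chars_a):
                                        y[i] = c
                                    for i, c in zip(pos_b, chars_b):
                                        y[i] = c
                                    for i in pos_g:
                                        y[i] = x[i]
                                    yield "".join(y)
```

Because each y is determined by its changed positions and characters, no candidate is produced twice. When x and x′ are more than 2d apart, no <α, β> passes the feasibility check and the generator yields nothing, which reflects why pairs with relatively large distances restrict the search space.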
