PairMotif: A new pattern-driven algorithm for planted (l, d) DNA motif search

Qiang Yu¹, Hongwei Huo, Yipu Zhang, Hongzhi Guo

PMID: 23119020
PMCID: PMC3485246
DOI: 10.1371/journal.pone.0048442

Qiang Yu et al. PLoS One. 2012;7(10):e48442. doi: 10.1371/journal.pone.0048442. Epub 2012 Oct 31.

Affiliation

¹ School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China.

Abstract

Motif search is a fundamental problem in bioinformatics, with an important application in locating transcription factor binding sites (TFBSs) in DNA sequences. Exact algorithms can report all (l, d) motifs and find the best one under a given objective function. However, identifying weak motifs remains challenging, because current exact algorithms require either a large amount of memory or a long execution time. This paper proposes PairMotif, a new exact algorithm for planted (l, d) motif search (PMS). To reduce both the number of candidate motifs and the number of scanned l-mers, PairMotif selects multiple pairs of l-mers with relatively large distances from the input sequences and uses them to restrict the search space. Comparisons with several recently proposed algorithms show that PairMotif requires less storage space and runs faster on most PMS instances. In particular, among the algorithms compared, only PairMotif can solve the weak instance (27, 9) within 10 hours. Moreover, the performance of PairMotif is stable with respect to sequence length, which allows it to identify motifs in longer sequences. On real biological data, experimental results demonstrate the validity of the proposed algorithm.
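The planted (l, d) motif problem is usually stated as follows: given t DNA sequences, find every l-mer M such that each sequence contains at least one l-mer within Hamming distance d of M. The Python sketch below only checks that condition for a single candidate; the names `hamming` and `is_ld_motif` are hypothetical illustrations, not part of PairMotif.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for p, q in zip(a, b) if p != q)


def is_ld_motif(motif: str, seqs: list[str], d: int) -> bool:
    """Hypothetical helper: True if every sequence contains an l-mer
    within Hamming distance d of motif (l = len(motif))."""
    l = len(motif)
    return all(
        any(hamming(motif, s[i:i + l]) <= d for i in range(len(s) - l + 1))
        for s in seqs
    )


seqs = ["ACGTACGT", "TTACGAAA", "GGGACGTT"]
print(is_ld_motif("ACGT", seqs, 1))  # True: "ACGA" in the 2nd sequence is 1 mismatch away
print(is_ld_motif("ACGT", seqs, 0))  # False: the 2nd sequence has no exact "ACGT"
```

An exact PMS algorithm must report every l-mer that passes this test; the difficulty PairMotif addresses is avoiding a test of all 4^l candidates.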

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 2. Illustration of the PairMotif algorithm.**
This figure uses the instance (15, 4) to illustrate PairMotif, which consists of three stages: selecting pairs, filtering l-mers, and verifying candidate motifs.
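The filtering stage can be motivated by the triangle inequality: if a motif M lies within Hamming distance d of both l-mers of a selected pair (x, x′) and also of an l-mer w, then w is within 2d of both x and x′. The sketch below applies only this necessary condition; it is an assumption for illustration, PairMotif's actual filtering rules may be stricter, and `filter_lmers` is a hypothetical name.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for p, q in zip(a, b) if p != q)


def filter_lmers(seq: str, x: str, xp: str, d: int) -> list[str]:
    """Hypothetical filter: keep l-mers w of seq with hamming(w, x) <= 2d
    and hamming(w, xp) <= 2d.

    If hamming(M, x) <= d and hamming(M, w) <= d, then hamming(w, x) <= 2d
    (likewise for xp), so a discarded l-mer cannot be an instance of any
    motif shared by the pair (x, xp).
    """
    l = len(x)
    kept = []
    for i in range(len(seq) - l + 1):
        w = seq[i:i + l]
        if hamming(w, x) <= 2 * d and hamming(w, xp) <= 2 * d:
            kept.append(w)
    return kept


print(filter_lmers("AAAAGGGGCCCC", "AAAA", "AACC", 1))  # ['AAAA', 'AAAG', 'AAGG']
```

Only the surviving l-mers would then need to be compared against candidate motifs in the verification stage.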

**Figure 3. An example of traversing the candidate motifs in *M_d*(x, x′).**
This figure shows an example of traversing the candidate motifs shared by two l-mers x and x′. After R(x, x′) is calculated, for each <α, β> in R(x, x′) we set y = x′ and traverse the candidates by modifying y in three steps. First, select α positions from the positions where x[i] = x′[i], and for each of these positions i, change y[i] to one of the three characters different from x[i]. Second, select β positions from the positions where x[i] ≠ x′[i], and for each of these positions i, change y[i] to one of the two characters different from both x[i] and x′[i]. Third, select a subset of the remaining positions where x[i] ≠ x′[i] (excluding those selected in step 2), and change y[i] to x[i] for each of these positions. The bold italic characters denote the changed positions in y.
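The three steps in the caption can be sketched in Python as follows. We assume that M_d(x, x′) is the set of l-mers within Hamming distance d of both x and x′, and we treat R(x, x′) as the set of <α, β> pairs for which some number γ of step-3 reversions yields a valid candidate. After the three steps, y is at distance D + α − γ from x and α + β + γ from x′, where D is the distance between x and x′; the γ bounds below follow from that count and are our own derivation, not the paper's code. `traverse_pair` and `brute_force` are hypothetical names, and the brute-force enumeration exists only to cross-check the traversal on tiny inputs.

```python
from itertools import combinations, product

ALPHABET = "ACGT"


def hamming(a: str, b: str) -> int:
    return sum(1 for p, q in zip(a, b) if p != q)


def traverse_pair(x: str, xp: str, d: int) -> set[str]:
    """Hypothetical sketch: enumerate y with hamming(y, x) <= d and
    hamming(y, xp) <= d, starting from y = xp and applying three steps."""
    same = [i for i in range(len(x)) if x[i] == xp[i]]
    diff = [i for i in range(len(x)) if x[i] != xp[i]]
    D = len(diff)
    found = set()
    for alpha in range(min(d, len(same)) + 1):
        for beta in range(min(d - alpha, D) + 1):
            # hamming(y, x) = D + alpha - gamma, hamming(y, xp) = alpha + beta + gamma
            g_lo = max(0, D + alpha - d)
            g_hi = min(D - beta, d - alpha - beta)
            if g_lo > g_hi:
                continue  # this <alpha, beta> yields no candidate
            for s1 in combinations(same, alpha):
                # Step 1: each chosen equal position takes one of 3 other characters.
                opts1 = [[c for c in ALPHABET if c != x[i]] for i in s1]
                for s2 in combinations(diff, beta):
                    # Step 2: each chosen differing position takes one of 2 remaining characters.
                    opts2 = [[c for c in ALPHABET if c not in (x[i], xp[i])] for i in s2]
                    rest = [i for i in diff if i not in s2]
                    for gamma in range(g_lo, g_hi + 1):
                        for s3 in combinations(rest, gamma):
                            for c1 in product(*opts1):
                                for c2 in product(*opts2):
                                    y = list(xp)
                                    for i, c in zip(s1, c1):
                                        y[i] = c
                                    for i, c in zip(s2, c2):
                                        y[i] = c
                                    for i in s3:  # Step 3: revert to x[i]
                                        y[i] = x[i]
                                    found.add("".join(y))
    return found


def brute_force(x: str, xp: str, d: int) -> set[str]:
    """Reference check: test every l-mer over ACGT (tiny l only)."""
    return {y for y in map("".join, product(ALPHABET, repeat=len(x)))
            if hamming(y, x) <= d and hamming(y, xp) <= d}


print(sorted(traverse_pair("AC", "AG", 1)))  # ['AA', 'AC', 'AG', 'AT']
```

Because each final y determines its step-1, step-2 and step-3 positions uniquely, the traversal produces each candidate once; the set is only a safeguard.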

**Figure 4. Time comparison on different sequence lengths.**
This figure compares PairMotif with two well-known algorithms, PMS5 and PMSprune, at different sequence lengths on the instance (18, 6). The x-axis shows the sequence length; the y-axis shows the running time.

**Figure 5. Comparison of predicted motifs under different objective functions.**
The x-axis shows the data sets used in our experiments. For each data set, three predicted motifs are obtained, one under each of three objective functions. The y-axis shows the nucleotide-level correlation coefficient (nCC) of each predicted motif.
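In motif-finding assessments, the nucleotide-level correlation coefficient is commonly computed from counts of nucleotide positions that are true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) relative to the known binding sites. The sketch below assumes that standard definition; the paper's exact counting conventions are not given here, and returning 0.0 for a zero denominator is our own assumption. `ncc` is a hypothetical name.

```python
import math


def ncc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Hypothetical helper: nucleotide-level correlation coefficient
    (TP*TN - FN*FP) / sqrt((TP+FN)(TN+FP)(TP+FP)(TN+FN))."""
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention for degenerate counts
    return (tp * tn - fn * fp) / denom


print(ncc(10, 90, 0, 0))  # 1.0: every site nucleotide predicted, no false positives
print(ncc(0, 0, 5, 5))    # -1.0: every prediction wrong
```

Values near 1 indicate close agreement with the known sites, values near 0 indicate chance-level agreement, and negative values indicate anti-correlation.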
