. 2010 Mar 16:11:132.

doi: 10.1186/1471-2105-11-132.

A parallel and incremental algorithm for efficient unique signature discovery on DNA databases

Hsiao Ping Lee¹, Tzu-Fang Sheu, Chuan Yi Tang

Affiliations

PMID: 20230647
PMCID: PMC2848650
DOI: 10.1186/1471-2105-11-132

A parallel and incremental algorithm for efficient unique signature discovery on DNA databases

Hsiao Ping Lee et al. BMC Bioinformatics. 2010.

. 2010 Mar 16:11:132.

doi: 10.1186/1471-2105-11-132.

Authors

Hsiao Ping Lee¹, Tzu-Fang Sheu, Chuan Yi Tang

Affiliation

¹ Department of Computer Science and Communication Engineering, Providence University, Taichung, 43301 Taiwan, ROC.

PMID: 20230647
PMCID: PMC2848650
DOI: 10.1186/1471-2105-11-132

Abstract

Background: DNA signatures are distinct short nucleotide sequences that provide valuable information that is used for various purposes, such as the design of Polymerase Chain Reaction primers and microarray experiments. Biologists usually use a discovery algorithm to find unique signatures from DNA databases, and then apply the signatures to microarray experiments. Such discovery algorithms require to set some input factors, such as signature length l and mismatch tolerance d, which affect the discovery results. However, suggestions about how to select proper factor values are rare, especially when an unfamiliar DNA database is used. In most cases, biologists typically select factor values based on experience, or even by guessing. If the discovered result is unsatisfactory, biologists change the input factors of the algorithm to obtain a new result. This process is repeated until a proper result is obtained. Implicit signatures under the discovery condition (l, d) are defined as the signatures of length < or = l with mismatch tolerance > or = d. A discovery algorithm that could discover all implicit signatures, such that those that meet the requirements concerning the results, would be more helpful than one that depends on trial and error. However, existing discovery algorithms do not address the need to discover all implicit signatures.

Results: This work proposes two discovery algorithms - the consecutive multiple discovery (CMD) algorithm and the parallel and incremental signature discovery (PISD) algorithm. The PISD algorithm is designed for efficiently discovering signatures under a certain discovery condition. The algorithm finds new results by using previously discovered results as candidates, rather than by using the whole database. The PISD algorithm further increases discovery efficiency by applying parallel computing. The CMD algorithm is designed to discover implicit signatures efficiently. It uses the PISD algorithm as a kernel routine to discover implicit signatures efficiently under every feasible discovery condition.

Conclusions: The proposed algorithms discover implicit signatures efficiently. The presented CMD algorithm has up to 97% less execution time than typical sequential discovery algorithms in the discovery of implicit signatures in experiments, when eight processing cores are used.

PubMed Disclaimer

Figures

**Figure 1**
**An example of the first potential problem of parallel signature discovery**. The tasks can be completed in 22 time units by a single-processor computer. The overall processing time is 15 units for a two-processor computer, which exceeds the optimal processing time, 11 units.

**Figure 2**
**The result of moving long tasks forward in the processing order list**. The long tasks are moved forward in the processing order list in Figure 2, yielding the new processing order list. The overall processing time for a two-processor computer is 11 units, which equals the optimal processing time.

**Figure 3**
**An example of the second potential problem of parallel signature discovery**. The tasks can be completed in 24 units by a single-processor computer. The overall processing time is 19 units for a two-processor computer, which exceeds the optimal processing time, 12 units.

**Figure 4**
**The result of moving long tasks forward in the processing order list**. The long tasks are moved forward in the processing order list in Figure 4, yielding the new processing order list. The overall processing time for a two-processor computer is 16 units, which still exceeds the optimal processing time.

**Figure 5**
**The result of dividing long tasks into short tasks in the processing order list**. Task F in Figure 5 is divided into two tasks with identical processing times, yielding the new task list. The overall processing time for a two-processor computer is 12 units, which equals the optimal processing time.

See this image and copyright information in PMC

Cited by

HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing.
Karimi R, Hajdu A. Karimi R, et al. Evol Bioinform Online. 2016 Feb 10;12:73-85. doi: 10.4137/EBO.S35545. eCollection 2016. Evol Bioinform Online. 2016. PMID: 26884678 Free PMC article.
An algorithm of discovering signatures from DNA databases on a computer cluster.
Lee HP, Sheu TF. Lee HP, et al. BMC Bioinformatics. 2014 Oct 5;15(1):339. doi: 10.1186/1471-2105-15-339. BMC Bioinformatics. 2014. PMID: 25282047 Free PMC article.
A method for automatically extracting infectious disease-related primers and probes from the literature.
García-Remesal M, Cuevas A, López-Alonso V, López-Campos G, de la Calle G, de la Iglesia D, Pérez-Rey D, Crespo J, Martín-Sánchez F, Maojo V. García-Remesal M, et al. BMC Bioinformatics. 2010 Aug 3;11:410. doi: 10.1186/1471-2105-11-410. BMC Bioinformatics. 2010. PMID: 20682041 Free PMC article.
Conserved PCR primer set designing for closely-related species to complete mitochondrial genome sequencing using a sliding window-based PSO algorithm.
Yang CH, Chang HW, Ho CH, Chou YC, Chuang LY. Yang CH, et al. PLoS One. 2011 Mar 18;6(3):e17729. doi: 10.1371/journal.pone.0017729. PLoS One. 2011. PMID: 21445268 Free PMC article.
Cluster oligonucleotide signatures for rapid identification by sequencing.
Zahariev M, Chen W, Visagie CM, Lévesque CA. Zahariev M, et al. BMC Bioinformatics. 2018 Oct 29;19(1):395. doi: 10.1186/s12859-018-2363-3. BMC Bioinformatics. 2018. PMID: 30522439 Free PMC article.

References

1. Kaderali L, Schliep A. Selecting signature oligonucleotides to identify organisms using DNA arrays. Bioinformatics. 2002;18(10):1340–1349. doi: 10.1093/bioinformatics/18.10.1340. - DOI - PubMed
1. Francois P, Charbonnier Y, Jacquet J, Utinger D, Bento M, Lew D, Kresbach GM, Ehrat M, Schlegel W, Schrenzel J. Rapid bacterial identification using evanescent-waveguide oligonucleotide microarray classification. Journal of Microbiological Methods. 2006;65(3):390–403. doi: 10.1016/j.mimet.2005.08.012. - DOI - PubMed
1. Mateo-Marti E, Briones C, Pradier CM, Martin-Gago JA. A DNA biosensor based on peptide nucleic acids on gold surfaces. Biosensors and Bioelectronics. 2007;22(9-10):1926–1932. doi: 10.1016/j.bios.2006.08.012. - DOI - PubMed
1. Eom HS, Hwang BH, Kim DH, Lee IB, Kim YH, Cha HJ. Multiple detection of food-borne pathogenic bacteria using a novel 16S rDNA-based oligonucleotide signature chip. Biosensors and Bioelectronics. 2007;22(6):845–853. doi: 10.1016/j.bios.2006.03.005. - DOI - PubMed
1. Kiryu BM, Kiryu CP. Rapid identification of Candida albicans and other human pathogenic yeasts by using oligonucleotides in a PCR. J Clin Microbiol. 1998;73:1634–1641. - PMC - PubMed

Publication types

Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions
Actions

Substances

Actions

LinkOut - more resources

Full Text Sources

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

A parallel and incremental algorithm for efficient unique signature discovery on DNA databases

Affiliation

A parallel and incremental algorithm for efficient unique signature discovery on DNA databases

Authors

Affiliation

Abstract

Figures

Similar articles

Cited by

References

Publication types

MeSH terms

Substances

LinkOut - more resources

Full Text Sources

Abstract

Figures

Similar articles

Cited by

References

Publication types

MeSH terms

Substances

Related information

LinkOut - more resources

Full Text Sources