Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Comparative Study
. 2009 Oct 12:10:329.
doi: 10.1186/1471-2105-10-329.

PLAST: parallel local alignment search tool for database comparison

Affiliations
Comparative Study

PLAST: parallel local alignment search tool for database comparison

Van Hoa Nguyen et al. BMC Bioinformatics. .

Abstract

Background: Sequence similarity searching is an important and challenging task in molecular biology and next-generation sequencing should further strengthen the need for faster algorithms to process such vast amounts of data. At the same time, the internal architecture of current microprocessors is tending towards more parallelism, leading to the use of chips with two, four and more cores integrated on the same die. The main purpose of this work was to design an effective algorithm to fit with the parallel capabilities of modern microprocessors.

Results: A parallel algorithm for comparing large genomic banks and targeting middle-range computers has been developed and implemented in PLAST software. The algorithm exploits two key parallel features of existing and future microprocessors: the SIMD programming model (SSE instruction set) and the multithreading concept (multicore). Compared to multithreaded BLAST software, tests performed on an 8-processor server have shown speedup ranging from 3 to 6 with a similar level of accuracy.

Conclusion: A parallel algorithmic approach driven by the knowledge of the internal microprocessor architecture allows significant speedup to be obtained while preserving standard sensitivity for similarity search problems.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Bank indexing. Fragment of indexing scheme. For each seed key, a list of relative occurrence positions is stored on short integers.
Figure 2
Figure 2
Subsequence block. Fragment of subsequence block. For each seed key, a list of subsequences is constructed. Each subsequence contains a seed and its right and left neighborhood.
Figure 3
Figure 3
pseudocode for ungapped extension. The pseudocode of ungapped extension procedure for 2 blocks of subsequences. Sixteen extensions are simultaneously processed and a score is stored on an 8-bit unsigned byte integer.
Figure 4
Figure 4
pseudocode for small gapped extension. The pseudocode of small gapped extension procedure. Eight extensions are simultaneously processed and a score is stored on a 16-bit signed short integer.
Figure 5
Figure 5
ROC curve. (A) The ROC curves for the SCOP/ASTRAL40 data set of PLASTP and BLASTP. (B) The ROC curves for the Yeast data set of TPLASTN and TBLASTN. The ROC10000 score in (A) and ROC250 score in (B) for each program are shown in parentheses after the program name.
Figure 6
Figure 6
Coverage versus error plot. (A) The coverage versus error plots for the SCOP/ASTRAL40 data set of PLASTP and BLASTP. (B) The coverage versus error plots for the Yeast data set of TPLASTN and TBLASTN.
Figure 7
Figure 7
Speedup of the three PLAST programs. Speedup of the three PLAST programs relative to the number of threads and data sets. The E-value cutoff is set to 10-3.
Figure 8
Figure 8
PLAST profile. The profiles of the three PLAST programs, with and without SSE instructions. Each PLAST program was run with its specific data set: (A) PLASTP; (B) TPLASTN; (C) PLASTX.

Similar articles

Cited by

References

    1. Pop M, Salzberg SL. Bioinformatics challenges of new sequencing technology. Trends in Genetics. 2008;24:142–149. - PMC - PubMed
    1. Smith TF, Waterman MS. Identification of common molecular subsequences. Journal of Molecular Biology. 1981;147:195–197. doi: 10.1016/0022-2836(81)90087-5. - DOI - PubMed
    1. Rognes T, Seeberg E. Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors. Bioinformatics. 2000;16:699–706. doi: 10.1093/bioinformatics/16.8.699. - DOI - PubMed
    1. Farrar M. Striped Smith-Waterman speeds database searches six times over other SIMD implementations. Bioinformatics. 2007;23:156–161. doi: 10.1093/bioinformatics/btl582. - DOI - PubMed
    1. Pearson WR. Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics. 1991;11:635–650. doi: 10.1016/0888-7543(91)90071-L. - DOI - PubMed

Publication types