Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2011 Jun 1:12:221.
doi: 10.1186/1471-2105-12-221.

Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation

Affiliations

Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation

Torbjørn Rognes. BMC Bioinformatics. .

Abstract

Background: The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation.

Results: A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64 bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License.

Conclusions: Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Approaches to vectorisation of Smith-Waterman alignments. Alignment matrices are shown with the elements that form the first five vectors processed indicated in black, blue, red, green and yellow. For simplicity, vectors of only 4 elements are shown, while 16 elements would normally be used. (A) Vectors along the anti-diagonal, described by Wozniak et al[8]. (B) Vectors along the query, described by Rognes and Seeberg [9]. (C) Striped approach, described by Farrar [10]. (D) Multi-sequence vectors, described by Alpern et al[6]. and in this paper.
Figure 2
Figure 2
Blocks of database residues processed together. The residues of the first five vectors processed are indicated on grey, blue, red, green and yellow background. Four symbols from each of the database sequences form blocks that are processed as a group. Padding of some sequence blocks are indicated with dashes on a pink background. An additional database sequence change is indicated with a pink vertical bar. Arrows below the sequences indicate positions were new sequences begin.
Figure 3
Figure 3
Core instructions. These are the ten core instructions executed for each vector of cells in the alignment matrices. In this code, the first (left) operand is the source and the second (right) operand is the destination.
Figure 4
Figure 4
Block of cells computed in each iteration. In each iteration of the inner loop blocks of eight cells in the alignment matrices computed. The H, E, F and S values are updated for each cell. The computations start in the upper left cell, proceed right three times and then continue from the left on the second row.
Figure 5
Figure 5
Creation of a temporary score profile for 64 database sequence residues. A standard score matrix and 4 residues from 16 different database sequences form the basis for a kind of temporary partial database sequences score profile. The score profile enables rapid comparison of any query residue with these 64 database residues.
Figure 6
Figure 6
Performance dependency on number of threads and query length. The speed in billion cell updates per second (GCUPS) of the BLAST (red), BLAST+ (orange), SWIPE (black), and SWPS3 (green) programs, as well as STRIPED compiled with the GNU (light blue) and Intel (dark blue) compiler, using a variable number of threads and queries of varying length. (A) Number of threads ranging from 1 to 24 and the 375 residue long P07327 query sequence. (B) Query sequences ranging from 24 to 5478 residues in length and 24 threads. (C) Query sequences of varying length and 1 thread. All scales are logarithmic. The BLOSUM62 matrix and gap open and extension penalties of 11 and 1, respectively, were used in all cases.
Figure 7
Figure 7
Performance with different scoring systems. The speed in billion cell updates per second (GCUPS) (logarithmic scale) is shown for the BLAST (red), BLAST+ (orange), SWIPE (black), and SWPS3 (green) programs, as well as STRIPED compiled with the GNU (light blue) and Intel (dark blue) compiler, using different scoring systems. All combinations of scoring matrices and gap penalties accepted by BLAST were tested. The matrix name is indicated above each graph, while the open and extension gap penalties are indicated on the x-axis. The query sequence was P07327 and 24 threads were running. SWPS3 would not run successfully in all cases.

References

    1. Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol. 1981;147:195–197. doi: 10.1016/0022-2836(81)90087-5. - DOI - PubMed
    1. Gotoh O. An improved algorithm for matching biological sequences. J Mol Biol. 1982;162:705–708. doi: 10.1016/0022-2836(82)90398-9. - DOI - PubMed
    1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. - PubMed
    1. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389. - DOI - PMC - PubMed
    1. Li ITS, Shum W, Truong K. 160-fold acceleration of the Smith-Waterman algorithm using a field programmable gate array (FPGA) BMC Bioinformatics. 2007;8:185. doi: 10.1186/1471-2105-8-185. - DOI - PMC - PubMed

Publication types