Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2011 Dec 7:12:466.
doi: 10.1186/1471-2105-12-466.

Accelerated large-scale multiple sequence alignment

Affiliations

Accelerated large-scale multiple sequence alignment

Scott Lloyd et al. BMC Bioinformatics. .

Abstract

Background: Multiple sequence alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. Prior MSA acceleration attempts with reconfigurable computing have only addressed the first stage of progressive alignment and consequently exhibit performance limitations according to Amdahl's Law. This work is the first known to accelerate the third stage of progressive alignment on reconfigurable hardware.

Results: We reduce subgroups of aligned sequences into discrete profiles before they are pairwise aligned on the accelerator. Using an FPGA accelerator, an overall speedup of up to 150 has been demonstrated on a large data set when compared to a 2.4 GHz Core2 processor.

Conclusions: Our parallel algorithm and architecture accelerates large-scale MSA with reconfigurable computing and allows researchers to solve the larger problems that confront biologists today. Program source is available from http://dna.cs.byu.edu/msa/.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Example multiple alignment and derived profile. Each position in a profile consists of a vector with character frequencies fN for the corresponding column in a group of aligned sequences. (a) Multiple alignment of sequences si. (b) Profile derived from the alignment.
Figure 2
Figure 2
Profile space. In three dimensions, profile space is a triangle on the plane x + y + z = 1; however, five dimensions are required to represent DNA alignments. Points in profile space are shown with coordinates and an aligned column example (transposed). The corners of profile space represent columns of an alignment that contain all the same character.
Figure 3
Figure 3
Sample point determination. Sample points are determined by projecting lattice points onto the profile plane.
Figure 4
Figure 4
Planes parallel to profile space. Planes parallel to profile space are separated by a distance of ε=DDL. For this example, D = 3 and L = 4.
Figure 5
Figure 5
Profile reduction before alignment.
Figure 6
Figure 6
Example profile calculation and reduction for sequences 1 and 2. From the alignment {s1, s2}, a continuous profile is derived and then reduced to form the corresponding discrete profile p1,2. S is a table of sample points.
Figure 7
Figure 7
Example profile calculation and reduction for sequences 3 and 4. From the alignment {s3, s4}, a continuous profile is derived and then reduced to form the corresponding discrete profile p3,4. S is a table of sample points.
Figure 8
Figure 8
Near neighbors in profile space. Given two profile points, nearby sample points and associated symbol codes are shown.
Figure 9
Figure 9
Example profile alignment. A pairwise alignment algorithm treats discrete profiles as sequences. The resulting edit operations E1,2,3,4 indicate the computed alignment between the discrete profiles p1,2 and p3,4, and the corresponding groups of sequences {s1, s2} and {s3, s4}.
Figure 10
Figure 10
Alignment quality on the BRAliBase data set. MUDISC (the new method) is compared with several alignment programs on a seven (k7) and fifteen (k15) sequence RNA reference set from BRAliBase 2.1. A higher score indicates better quality and is shown in relation to the average pairwise sequence identity (APSI). MUDISC uses discrete profile alignment.
Figure 11
Figure 11
Alignment quality on the MDSA data set. MUDISC (the new method) is compared with several alignment programs on the MDSA data set which contains nucleotide adaptations of the BAliBASE and SMART reference alignments. BAliBASE includes reference sets 1-7. MUDISC uses discrete profile alignment.
Figure 12
Figure 12
Alignment run time comparison with stages. Overall program runtimes are shown on the Influenza and HIV data sets with a breakdown of time spent in each stage.

Similar articles

Cited by

References

    1. Feng DF, Doolittle RF. Progressive Sequence Alignment as a Prerequisite to Correct Phylogenetic Trees. Journal of Molecular Evolution. 1987;25(4):351–360. doi: 10.1007/BF02603120. - DOI - PubMed
    1. Notredame C, Higgins DG, Heringa J. T-Coffee: A Novel Method for Fast and Accurate Multiple Sequence Alignment. Journal of Molecular Biology. 2000;302:205–217. doi: 10.1006/jmbi.2000.4042. - DOI - PubMed
    1. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research. 1994;22(22):4673–4680. doi: 10.1093/nar/22.22.4673. - DOI - PMC - PubMed
    1. Lloyd S, Snell QO. Hardware Accelerated Sequence Alignment with Traceback. International Journal of Reconfigurable Computing. 2009;2009:10. [Article ID 762362]
    1. Wilm A, Mainz I, Steger G. An enhanced RNA alignment benchmark for sequence alignment programs. Algorithms for Molecular Biology. 2006;1:19. doi: 10.1186/1748-7188-1-19. - DOI - PMC - PubMed

LinkOut - more resources