BMC Bioinformatics. 2011 Dec 7:12:466.

doi: 10.1186/1471-2105-12-466.

Accelerated large-scale multiple sequence alignment

Scott Lloyd¹, Quinn O Snell

PMID: 22151470
PMCID: PMC3310909
DOI: 10.1186/1471-2105-12-466

Scott Lloyd et al. BMC Bioinformatics. 2011.

Affiliation

¹ Computer Science Department, Brigham Young University, Provo, UT 84602, USA. gscott@byu.edu

Abstract

Background: Multiple sequence alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. Prior MSA acceleration attempts with reconfigurable computing have only addressed the first stage of progressive alignment and consequently exhibit performance limitations according to Amdahl's Law. This work is the first known to accelerate the third stage of progressive alignment on reconfigurable hardware.
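Amdahl's Law bounds the overall speedup when only part of a program is accelerated. The sketch below is illustrative only: the function name and the example fractions are assumptions, not measurements from the paper.

```python
def amdahl_speedup(accelerated_fraction, stage_speedup):
    """Overall speedup when a fraction of run time is sped up by a given factor.

    accelerated_fraction: share of the original run time spent in the
        accelerated stage (0.0 to 1.0); hypothetical input.
    stage_speedup: speedup factor of that stage alone (must be > 0).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if stage_speedup <= 0:
        raise ValueError("stage_speedup must be positive")
    remaining = 1.0 - accelerated_fraction  # time left untouched
    return 1.0 / (remaining + accelerated_fraction / stage_speedup)


if __name__ == "__main__":
    # Hypothetical: one stage takes 60% of run time and becomes 100x faster.
    print(amdahl_speedup(0.6, 100.0))
```

With these made-up numbers the overall speedup stays below 2.5, because the untouched 40% of run time caps the gain at 1/0.4. This is why accelerating only one stage of progressive alignment limits overall performance.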

Results: We reduce subgroups of aligned sequences into discrete profiles before they are pairwise aligned on the accelerator. Using an FPGA accelerator, an overall speedup of up to 150 has been demonstrated on a large data set when compared to a 2.4 GHz Core2 processor.

Conclusions: Our parallel algorithm and architecture accelerate large-scale MSA with reconfigurable computing and allow researchers to solve the larger problems that confront biologists today. Program source is available from http://dna.cs.byu.edu/msa/.

PubMed Disclaimer

Figures

**Figure 1**
**Example multiple alignment and derived profile**. Each position in a profile consists of a vector with character frequencies *f_N* for the corresponding column in a group of aligned sequences. (a) Multiple alignment of sequences *s_i*. (b) Profile derived from the alignment.
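The profile described in the Figure 1 caption can be sketched as follows. This is a minimal illustration that assumes a five-symbol DNA alphabet with `-` as the gap character; `ALPHABET` and `column_profile` are hypothetical names, not taken from the program source.

```python
from collections import Counter

# Assumed alphabet: four nucleotides plus a gap symbol (five dimensions).
ALPHABET = "ACGT-"


def column_profile(alignment):
    """Return one frequency vector per column of a group of aligned sequences."""
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    n = len(alignment)
    profile = []
    for column in zip(*alignment):  # iterate over alignment columns
        counts = Counter(column)
        # Frequency of each alphabet symbol in this column.
        profile.append(tuple(counts[c] / n for c in ALPHABET))
    return profile


if __name__ == "__main__":
    for vector in column_profile(["AC-", "AG-"]):
        print(vector)
```

Each vector sums to 1 as long as every character in the alignment belongs to the assumed alphabet, so each column is a point in the profile space described in Figure 2.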

**Figure 2**
**Profile space**. In three dimensions, profile space is a triangle on the plane x + y + z = 1; however, five dimensions are required to represent DNA alignments. Points in profile space are shown with coordinates and an aligned column example (transposed). The corners of profile space represent columns of an alignment that contain all the same character.

**Figure 3**
**Sample point determination**. Sample points are determined by projecting lattice points onto the profile plane.

**Figure 4**
**Planes parallel to profile space**. Planes parallel to profile space are separated by a distance of $\varepsilon = \sqrt{D}/(DL)$. For this example, D = 3 and L = 4.
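As a check on the spacing in the Figure 4 caption, assume the lattice points have coordinates that are integer multiples of 1/L (an assumption consistent with the caption, not stated in it). They then lie on the planes $x_1 + \dots + x_D = k/L$ for integer $k$, whose unit normal is $(1, \dots, 1)/\sqrt{D}$, so adjacent planes are separated by:

$$
\varepsilon = \frac{1/L}{\sqrt{D}} = \frac{\sqrt{D}}{D L},
\qquad
D = 3,\ L = 4:\quad \varepsilon = \frac{\sqrt{3}}{12} \approx 0.144 .
$$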

**Figure 5**
**Profile reduction before alignment**.

**Figure 6**
**Example profile calculation and reduction for sequences 1 and 2**. From the alignment {s₁, s₂}, a continuous profile is derived and then reduced to form the corresponding discrete profile p_1,2. S is a table of sample points.

**Figure 7**
**Example profile calculation and reduction for sequences 3 and 4**. From the alignment {s₃, s₄}, a continuous profile is derived and then reduced to form the corresponding discrete profile p_3,4. S is a table of sample points.
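The reduction step in Figures 6 and 7 replaces each continuous profile vector with a symbol code that indexes a table of sample points. The sketch below assumes nearest-neighbor reduction under Euclidean distance and uses a simple evenly spaced grid as a stand-in for the table S; the paper derives its sample points by projecting lattice points onto the profile plane, which this example does not reproduce. `simplex_grid` and `reduce_profile` are hypothetical names.

```python
from itertools import product


def simplex_grid(dim, levels):
    """Stand-in sample table: points with coordinates k/levels that sum to 1."""
    return [
        tuple(k / levels for k in ks)
        for ks in product(range(levels + 1), repeat=dim)
        if sum(ks) == levels
    ]


def reduce_profile(profile, samples):
    """Map each profile vector to the index (symbol code) of its nearest sample."""

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(samples)), key=lambda i: squared_distance(vector, samples[i]))
        for vector in profile
    ]


if __name__ == "__main__":
    samples = simplex_grid(3, 2)  # three dimensions for readability
    print(reduce_profile([(1.0, 0.0, 0.0), (0.45, 0.55, 0.0)], samples))
```

The output is a list of integer codes, one per alignment column, so a group of aligned sequences becomes a single sequence over a finite symbol set.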

**Figure 8**
**Near neighbors in profile space**. Given two profile points, nearby sample points and associated symbol codes are shown.

**Figure 9**
**Example profile alignment**. A pairwise alignment algorithm treats discrete profiles as sequences. The resulting edit operations E_1,2,3,4 indicate the computed alignment between the discrete profiles p_1,2 and p_3,4, and between the corresponding groups of sequences {s₁, s₂} and {s₃, s₄}.
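Because discrete profiles are sequences of symbol codes, any pairwise aligner can process them. The sketch below uses a plain Needleman-Wunsch global alignment with a linear gap penalty and a caller-supplied scoring function; these are assumptions chosen for brevity, and the accelerator's actual scoring scheme and gap model may differ. `align_codes` and the `M`/`D`/`I` operation letters are hypothetical.

```python
def align_codes(a, b, score, gap=-2):
    """Globally align two symbol-code sequences.

    Returns (total_score, ops) where ops uses 'M' to pair a column of a with
    a column of b, 'D' for a gap in b, and 'I' for a gap in a.
    """
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                dp[i - 1][j] + gap,
                dp[i][j - 1] + gap,
            )
    # Traceback from the bottom-right cell to recover the edit operations.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            ops.append("M")
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            ops.append("D")
            i -= 1
        else:
            ops.append("I")
            j -= 1
    return dp[n][m], "".join(reversed(ops))


if __name__ == "__main__":
    # Toy scoring: +1 for identical codes, -1 otherwise (integer scores).
    toy_score = lambda x, y: 1 if x == y else -1
    print(align_codes([5, 4, 2], [5, 2], toy_score))
```

The edit operations apply to whole columns, so a gap in one discrete profile inserts a gap column into every sequence of the corresponding group. In practice the scoring function would compare the sample points behind the two codes rather than test the codes for equality.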

**Figure 10**
**Alignment quality on the BRAliBase data set**. MUDISC (the new method) is compared with several alignment programs on a seven (k7) and fifteen (k15) sequence RNA reference set from BRAliBase 2.1. A higher score indicates better quality and is shown in relation to the average pairwise sequence identity (APSI). MUDISC uses discrete profile alignment.

**Figure 11**
**Alignment quality on the MDSA data set**. MUDISC (the new method) is compared with several alignment programs on the MDSA data set which contains nucleotide adaptations of the BAliBASE and SMART reference alignments. BAliBASE includes reference sets 1-7. MUDISC uses discrete profile alignment.

**Figure 12**
**Alignment run time comparison with stages**. Overall program runtimes are shown on the Influenza and HIV data sets with a breakdown of time spent in each stage.


Cited by

Fast noisy long read alignment with multi-level parallelism.
Xia Z, Yang C, Peng C, Guo Y, Guo Y, Tang T, Cui Y. BMC Bioinformatics. 2025 May 2;26(1):118. doi: 10.1186/s12859-025-06129-w. PMID: 40316905. Free PMC article.
Characterization of the T-cell receptor beta chain repertoire in tumor-infiltrating lymphocytes.
Nakanishi K, Kukita Y, Segawa H, Inoue N, Ohue M, Kato K. Cancer Med. 2016 Sep;5(9):2513-21. doi: 10.1002/cam4.828. Epub 2016 Jul 27. PMID: 27465739. Free PMC article.
