DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment

Amarendran R Subramanian¹, Michael Kaufmann, Burkhard Morgenstern

PMID: 18505568
PMCID: PMC2430965
DOI: 10.1186/1748-7188-3-6

Amarendran R Subramanian et al. Algorithms Mol Biol. 2008 May 27;3:6.

Affiliation

¹ University of Tübingen, Wilhelm-Schickard-Institut für Informatik, Sand 13, 72076 Tübingen, Germany. subraman@informatik.uni-tuebingen.de

Abstract

Background: DIALIGN-T is a reimplementation of the multiple-alignment program DIALIGN. Due to several algorithmic improvements, it produces significantly better alignments on locally and globally related sequence sets than previous versions of DIALIGN. However, like the original implementation of the program, DIALIGN-T uses a straightforward greedy approach to assemble multiple alignments from local pairwise sequence similarities. Such greedy approaches may be vulnerable to spurious random similarities and can therefore lead to suboptimal results. In this paper, we present DIALIGN-TX, a substantial improvement of DIALIGN-T that combines our previous greedy algorithm with a progressive alignment approach.
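To illustrate why a greedy assembly can be misled by a spurious similarity, here is a minimal sketch restricted to two sequences. It is not the DIALIGN-T implementation: the `Fragment` class, the `compatible` test and the weights are assumptions made for illustration, and the real program checks consistency across many sequences at once.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fragment:
    """Hypothetical gap-free local alignment between sequences a and b."""
    start_a: int
    start_b: int
    length: int
    weight: float


def compatible(f: Fragment, g: Fragment) -> bool:
    """True if one fragment ends before the other starts in BOTH sequences."""
    f_before_g = (f.start_a + f.length <= g.start_a
                  and f.start_b + f.length <= g.start_b)
    g_before_f = (g.start_a + g.length <= f.start_a
                  and g.start_b + g.length <= f.start_b)
    return f_before_g or g_before_f


def greedy_select(fragments: List[Fragment]) -> List[Fragment]:
    """Accept fragments in order of decreasing weight if they fit the rest."""
    accepted: List[Fragment] = []
    for frag in sorted(fragments, key=lambda f: f.weight, reverse=True):
        if all(compatible(frag, kept) for kept in accepted):
            accepted.append(frag)
    return sorted(accepted, key=lambda f: f.start_a)


# One random high-weight match blocks two mutually compatible matches.
spurious = Fragment(start_a=0, start_b=50, length=10, weight=9.0)
real_1 = Fragment(start_a=0, start_b=0, length=10, weight=8.0)
real_2 = Fragment(start_a=40, start_b=40, length=10, weight=8.0)

chosen = greedy_select([real_1, spurious, real_2])
```

In this toy input the greedy pass keeps only `spurious` (total weight 9.0), although `real_1` and `real_2` together would give 16.0. This is the kind of suboptimal result the paragraph above refers to.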

Results: Our new heuristic produces significantly better alignments, especially on globally related sequences, without an excessive increase in CPU time or memory consumption. The new method is based on a guide tree; to detect possible spurious sequence similarities, it employs a vertex-cover approximation on a conflict graph. We performed benchmarking tests on a large set of nucleic acid and protein sequences. For protein benchmarks, we used the benchmark database BALIBASE 3 and an updated release of the database IRMBASE 2 to assess alignment quality on globally and locally related sequences, respectively. For alignment of nucleic acid sequences, we used BRAliBase II for global alignment and a newly developed database of locally related sequences called DIRMBASE 1. IRMBASE 2 and DIRMBASE 1 are constructed by implanting highly conserved motifs at random positions in long unalignable sequences.
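The abstract does not say which vertex-cover approximation is used. As a stand-in, the sketch below applies the textbook maximal-matching 2-approximation for unweighted vertex cover to a small conflict graph. The fragment labels and the conflict list are hypothetical; an edge means that two fragments cannot both appear in the same alignment.

```python
from typing import Hashable, Iterable, Set, Tuple


def approx_vertex_cover(
    edges: Iterable[Tuple[Hashable, Hashable]]
) -> Set[Hashable]:
    """Maximal-matching 2-approximation: take both ends of any uncovered edge."""
    cover: Set[Hashable] = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


# Hypothetical conflict graph: f1 clashes with every other fragment.
fragments = {"f1", "f2", "f3", "f4"}
conflicts = [("f1", "f2"), ("f1", "f3"), ("f1", "f4")]

cover = approx_vertex_cover(conflicts)   # fragments to discard
kept = fragments - cover                 # remaining fragments are conflict-free
```

Removing the cover leaves a set of fragments with no conflicts among them. Note that this simple approximation also discards `f2` here, although removing `f1` alone would suffice; a weighted variant would be a natural refinement, but that is beyond what the abstract specifies.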

Conclusion: On BALIBASE 3, our new program performs significantly better than the previous program DIALIGN-T and outperforms the popular global aligner CLUSTAL W, though it is still outperformed by programs that focus on global alignment, such as MAFFT, MUSCLE and T-COFFEE. On the locally related test sets in IRMBASE 2 and DIRMBASE 1, our method outperforms all other programs; MAFFT E-INS-i is the only method that comes close to the performance of DIALIGN-TX.
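The locally related test sets are described above as built by implanting conserved motifs at random positions in otherwise unalignable sequences. The following sketch shows that idea in its simplest form. It is an assumption-laden toy, not the procedure used to build IRMBASE 2 or DIRMBASE 1: the function name, the uniform random background, and the use of identical motif copies (rather than conserved but diverged ones) are all illustrative choices.

```python
import random
from typing import List, Tuple


def implant_motif(
    motif: str,
    n_seqs: int,
    length: int,
    alphabet: str = "ACGT",
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Return (sequence, motif_position) pairs with the motif overwritten
    into independent random background sequences of the given length."""
    if len(motif) > length:
        raise ValueError("motif is longer than the requested sequence length")
    rng = random.Random(seed)
    result: List[Tuple[str, int]] = []
    for _ in range(n_seqs):
        background = "".join(rng.choice(alphabet) for _ in range(length))
        pos = rng.randrange(0, length - len(motif) + 1)
        seq = background[:pos] + motif + background[pos + len(motif):]
        result.append((seq, pos))
    return result


# Hypothetical motif and sizes, chosen only for the demonstration.
test_set = implant_motif("GATTACAGATTACA", n_seqs=4, length=200)
```

Because the implanted positions are returned alongside the sequences, an alignment can be scored by checking whether the motif columns line up, which is the spirit of a benchmark for local similarity.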

Figures

**Figure 1**
High-level description of our algorithm to calculate a multiple alignment of a set of input sequences s₁, ..., s_k. The algorithm calculates a first alignment A₀ using our novel *progressive* approach and a second alignment A₁ with the greedy method previously used in DIALIGN. Finally, the alignment with the higher numerical score is returned. For the progressive method, *fragments*, i.e. local gap-free pairwise alignments from the respective optimal pairwise alignments, are considered. Fragments with a weight score above the average fragment score are processed first following a *guide tree* as described in the main text. Lower-scoring fragments are added later, provided they are consistent with the previously included high-scoring fragments. Note that the output of the sub-routine *PAIRWISE_ALIGNMENT* is a chain of fragments. This is equivalent to a pairwise alignment in the sense of DIALIGN.
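The control flow in this caption can be sketched as follows. This is only an outline under stated assumptions: `progressive`, `greedy` and `score` are placeholder callables supplied by the caller, the function names are hypothetical, and the tie-breaking rule (prefer A₀ on equal scores) is a guess, since the caption does not specify it.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # fragment type
A = TypeVar("A")  # alignment type


def split_by_average_weight(
    fragments: Sequence[T], weight: Callable[[T], float]
) -> Tuple[List[T], List[T]]:
    """Split fragments into those above the mean weight and the rest."""
    if not fragments:
        return [], []
    mean = sum(weight(f) for f in fragments) / len(fragments)
    high = [f for f in fragments if weight(f) > mean]
    low = [f for f in fragments if weight(f) <= mean]
    return high, low


def best_of_two(
    seqs: Sequence[str],
    progressive: Callable[[Sequence[str]], A],
    greedy: Callable[[Sequence[str]], A],
    score: Callable[[A], float],
) -> A:
    """Compute A0 (progressive) and A1 (greedy); return the higher-scoring one."""
    a0 = progressive(seqs)
    a1 = greedy(seqs)
    return a0 if score(a0) >= score(a1) else a1  # tie rule is an assumption


# Toy fragments as (label, weight) pairs; the mean weight is 4.0.
demo = [("f1", 2.0), ("f2", 6.0), ("f3", 4.0)]
high, low = split_by_average_weight(demo, weight=lambda f: f[1])
```

Here `high` holds the fragments that would be processed first along the guide tree, and `low` holds those that would be added afterwards if they remain consistent.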
