Efficient methods for multiple sequence alignment with guaranteed error bounds
- PMID: 7680269
- DOI: 10.1007/BF02460299
Efficient methods for multiple sequence alignment with guaranteed error bounds
Abstract
Multiple string (sequence) alignment is a difficult and important problem in computational biology, where it is central in two related tasks: finding highly conserved subregions or embedded patterns of a set of biological sequences (strings of DNA, RNA or amino acids), and inferring the evolutionary history of a set of taxa from their associated biological sequences. Several precise measures have been proposed for evaluating the goodness of a multiple alignment, but no efficient methods are known which compute the optimal alignment for any of these measures in any but small cases. In this paper, we consider two previously proposed measures, and give two computationaly efficient multiple alignment methods (one for each measure) whose deviation from the optimal value is guaranteed to be less than a factor of two. This is the novel feature of thse methods. but the methods have additional virtues as well. For both methods, the guaranteed bounds are much smaller than two when the number of strings is small (1.33 for three strings of any length); for one of the methods we give a related randomized method which is much faster and which gives, with high probability, multiple alignments with fairly small error bounds; and for the other measure, the method given yields a non-obvious lower bound on the optimal alignment.
Similar articles
-
Lower bounds on multiple sequence alignment using exact 3-way alignment.BMC Bioinformatics. 2007 Apr 30;8:140. doi: 10.1186/1471-2105-8-140. BMC Bioinformatics. 2007. PMID: 17470273 Free PMC article.
-
Multiple Sequence Alignment Based on a Suffix Tree and Center-Star Strategy: A Linear Method for Multiple Nucleotide Sequence Alignment on Spark Parallel Framework.J Comput Biol. 2017 Dec;24(12):1230-1242. doi: 10.1089/cmb.2017.0040. Epub 2017 Nov 8. J Comput Biol. 2017. PMID: 29116822
-
An improved string composition method for sequence comparison.BMC Bioinformatics. 2008 May 28;9 Suppl 6(Suppl 6):S15. doi: 10.1186/1471-2105-9-S6-S15. BMC Bioinformatics. 2008. PMID: 18541050 Free PMC article.
-
Optimal sum-of-pairs multiple sequence alignment using incremental Carrillo and Lipman bounds.J Comput Biol. 2006 Apr;13(3):668-85. doi: 10.1089/cmb.2006.13.668. J Comput Biol. 2006. PMID: 16706718
-
Alignment methods: strategies, challenges, benchmarking, and comparative overview.Methods Mol Biol. 2012;855:203-35. doi: 10.1007/978-1-61779-582-4_7. Methods Mol Biol. 2012. PMID: 22407710 Review.
Cited by
-
Multiple structural alignment by secondary structures: algorithm and applications.Protein Sci. 2003 Nov;12(11):2492-507. doi: 10.1110/ps.03200603. Protein Sci. 2003. PMID: 14573862 Free PMC article.
-
A multiple-template approach to protein threading.Proteins. 2011 Jun;79(6):1930-9. doi: 10.1002/prot.23016. Epub 2011 Apr 4. Proteins. 2011. PMID: 21465564 Free PMC article.
-
A combinatorial optimization approach for diverse motif finding applications.Algorithms Mol Biol. 2006 Aug 17;1:13. doi: 10.1186/1748-7188-1-13. Algorithms Mol Biol. 2006. PMID: 16916460 Free PMC article.
-
Hidden Markov models of biological primary sequence information.Proc Natl Acad Sci U S A. 1994 Feb 1;91(3):1059-63. doi: 10.1073/pnas.91.3.1059. Proc Natl Acad Sci U S A. 1994. PMID: 8302831 Free PMC article.
-
Lower bounds on multiple sequence alignment using exact 3-way alignment.BMC Bioinformatics. 2007 Apr 30;8:140. doi: 10.1186/1471-2105-8-140. BMC Bioinformatics. 2007. PMID: 17470273 Free PMC article.