Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems
- PMID: 14962922
- DOI: 10.1093/bioinformatics/bth126
Abstract
Motivation: Partial order alignment (POA) has been proposed as a new approach to multiple sequence alignment (MSA) that can be combined with existing methods such as progressive alignment. Combining the two is important for addressing problems both in the original version of POA (such as order sensitivity) and in standard progressive alignment programs (such as information loss in complex alignments, especially surrounding gap regions).
Results: We have developed a new Partial Order-Partial Order alignment algorithm that optimally aligns a pair of MSAs and can therefore be applied directly to progressive alignment methods such as CLUSTAL. Using this algorithm, we show that the combined Progressive POA alignment method yields results comparable with those of the best available MSA programs (CLUSTALW, DIALIGN2, T-COFFEE) but is far faster. For example, depending on the level of sequence similarity, aligning 1000 sequences, each 500 amino acids long, took from 15 min (at 90% average identity) to 44 min (at 30% identity) on a standard PC. For large alignments, Progressive POA was 10-30 times faster than the fastest of the three previous methods (CLUSTALW). These data suggest that POA-based methods can scale to much larger alignment problems than was possible with previous methods.
Availability: The POA source code is available at http://www.bioinformatics.ucla.edu/poa
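To make the idea of aligning two partial orders concrete, the following is a minimal sketch, not the authors' implementation. It treats each MSA as a directed acyclic graph of residues and runs a Needleman-Wunsch-style dynamic program in which every cell looks back over all predecessor nodes instead of only the previous position. The class name `POGraph`, the function `po_po_score`, and the match/mismatch/linear-gap scores are assumptions made for illustration; the sketch returns only the best global score and omits traceback, graph fusion, substitution matrices, and affine gaps.

```python
from dataclasses import dataclass


@dataclass
class POGraph:
    """Hypothetical partial order graph; nodes are stored in topological order."""
    labels: list  # residue carried by each node
    preds: list   # preds[i] = indices of predecessor nodes (all smaller than i)

    @classmethod
    def from_sequence(cls, seq):
        # A single sequence is a linear graph: node i follows node i - 1.
        return cls(list(seq), [[i - 1] if i > 0 else [] for i in range(len(seq))])


def _dp_preds(graph, i):
    # DP index 0 is a virtual start node; graph node k lives at DP index k + 1.
    p = graph.preds[i - 1]
    return [q + 1 for q in p] if p else [0]


def _dp_sinks(graph):
    # Nodes without successors end a path; an empty graph ends at the start node.
    has_successor = {q for p in graph.preds for q in p}
    return [i + 1 for i in range(len(graph.labels)) if i not in has_successor] or [0]


def po_po_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global score for aligning one path through `a` with one path through `b`.

    Assumed toy scoring: identity match/mismatch and a linear gap penalty.
    """
    neg_inf = float("-inf")
    n, m = len(a.labels), len(b.labels)
    score = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = neg_inf
            if i > 0:  # node i of `a` is left unaligned (gap in `b`)
                for p in _dp_preds(a, i):
                    best = max(best, score[p][j] + gap)
            if j > 0:  # node j of `b` is left unaligned (gap in `a`)
                for q in _dp_preds(b, j):
                    best = max(best, score[i][q] + gap)
            if i > 0 and j > 0:  # align node i with node j
                sub = match if a.labels[i - 1] == b.labels[j - 1] else mismatch
                for p in _dp_preds(a, i):
                    for q in _dp_preds(b, j):
                        best = max(best, score[p][q] + sub)
            score[i][j] = best
    return max(score[i][j] for i in _dp_sinks(a) for j in _dp_sinks(b))


# Graph for the two-sequence alignment of ACGT and ATGT: A -> (C | T) -> G -> T.
msa = POGraph(labels=list("ACTGT"), preds=[[], [0], [0], [1, 2], [3]])
print(po_po_score(msa, POGraph.from_sequence("ATGT")))  # 4
```

Because the branch point keeps both C and T as alternatives, a new sequence can match either variant without penalty, which illustrates how a partial order representation avoids the information loss of collapsing an alignment into a single profile or consensus.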