Improvement in the accuracy of multiple sequence alignment program MAFFT
- PMID: 16362903
Abstract
In 2002, we developed and released a rapid multiple sequence alignment program, MAFFT, that was designed to handle large numbers of sequences (up to approximately 5,000) and long sequences (approximately 2,000 aa or approximately 5,000 nt) in a reasonable time on a standard desktop PC. In terms of accuracy, however, the previous versions of MAFFT (v.4 and earlier) were outperformed in several benchmark tests by ProbCons and TCoffee v.2, both of which were released in 2004. Here we report a recent extension of MAFFT that aims to improve accuracy at as little cost in calculation time as possible. The extended version of MAFFT (v.5) has new iterative refinement options, G-INS-i and L-INS-i (collectively denoted as [GL]-INS-i in this report). These options use a new objective function that combines the weighted sum-of-pairs (WSP) score with a score similar to COFFEE derived from all pairwise alignments. We discuss the improvement in accuracy brought by this extension, mainly using two very recently released benchmark tests, BAliBASE v.3 (for protein alignments) and BRAliBASE (for RNA alignments). According to BAliBASE v.3, the overall average accuracy of L-INS-i was higher than that of the other methods released in 2004, although the difference among the most accurate methods (ProbCons, TCoffee v.2 and the new options of MAFFT) was small. The accuracy advantage of [GL]-INS-i became greater for alignments consisting of approximately 50-100 sequences. By utilizing this feature of MAFFT, we also examined another possible approach to improving accuracy: incorporating homolog information collected from databases. The [GL]-INS-i options are applicable to aligning up to approximately 200 sequences, but not to thousands of sequences, because of their time and space complexity.
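To make the weighted sum-of-pairs (WSP) term of the objective function more concrete, the following is a minimal Python sketch of a WSP-style score over an existing alignment. It is an illustration only, not MAFFT's implementation: the function names, the match/mismatch/gap values, the linear gap penalty, and the use of a product of per-sequence weights as the pair weight are all assumptions made for this example, and the COFFEE-like term is not modeled.

```python
from itertools import combinations


def pair_score(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Score two aligned rows column by column.

    Toy scoring scheme (assumed for illustration): a flat match/mismatch
    score and a linear per-column gap penalty, not MAFFT's scoring matrix
    or gap model.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    total = 0.0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # a column gapped in both rows contributes nothing
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


def weighted_sum_of_pairs(rows, weights=None):
    """Sum pair_score over all pairs of rows, each scaled by a pair weight.

    The pair weight is taken as weights[i] * weights[j], which is an
    assumption of this sketch; uniform weights are used if none are given.
    """
    n = len(rows)
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n:
        raise ValueError("need exactly one weight per row")
    total = 0.0
    for i, j in combinations(range(n), 2):
        total += weights[i] * weights[j] * pair_score(rows[i], rows[j])
    return total


# Hypothetical three-sequence alignment used only as an example.
rows = ["AC-GT", "ACAGT", "AC-GA"]
print(weighted_sum_of_pairs(rows))                   # uniform weights
print(weighted_sum_of_pairs(rows, [1.0, 0.5, 1.0]))  # down-weight the second row
```

An iterative refinement method of the kind described above would repeatedly realign subsets of the sequences and keep a change only when a score of this general form improves; the sketch shows only the scoring step.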