Nucleic Acids Res. 2005 Jan 20;33(2):511-8.
doi: 10.1093/nar/gki198. Print 2005.

MAFFT version 5: improvement in accuracy of multiple sequence alignment

Kazutaka Katoh et al. Nucleic Acids Res.

Abstract

The accuracy of the multiple sequence alignment program MAFFT has been improved. The new version (5.3) of MAFFT offers new iterative refinement options, H-INS-i, F-INS-i and G-INS-i, in which pairwise alignment information is incorporated into the objective function. These new options showed higher accuracy than currently available methods, including TCoffee version 2 and CLUSTAL W, in benchmark tests consisting of alignments of >50 sequences. Like the previously available options, the new options of MAFFT can handle hundreds of sequences on a standard desktop computer. We also examined the effect of the number of homologues included in an alignment. For a multiple alignment consisting of approximately 8 sequences with low similarity, the accuracy improved by 2-10 percentage points when the sequences were aligned together with dozens of their close homologues (E-value < 10^-5 to 10^-20) collected from a database. Such improvement was generally observed for most methods, but was remarkably large for the new options of MAFFT proposed here. We therefore wrote a Ruby script, mafftE.rb, which aligns the input sequences together with their close homologues collected from SwissProt using NCBI-BLAST.
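The homologue-collection step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' mafftE.rb script: the function name `select_close_homologues`, the `max_hits` cap, and the choice of the 12-column tab-separated BLAST hit table as input are assumptions made for illustration. The only detail taken from the abstract is the idea of keeping database hits whose E-value falls below a threshold such as 10^-5.

```python
from typing import Dict, Iterable, List


def select_close_homologues(
    blast_lines: Iterable[str],
    max_evalue: float = 1e-5,   # threshold in the range quoted in the abstract
    max_hits: int = 50,         # hypothetical cap standing in for "dozens"
) -> Dict[str, List[str]]:
    """Map each query ID to the subject IDs that pass the E-value filter.

    Assumes the 12-column tabular BLAST layout, where column 1 is the
    query ID, column 2 the subject ID and column 11 the E-value.
    """
    hits: Dict[str, List[str]] = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete hit record
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])
        if subject == query or evalue >= max_evalue:
            continue  # drop self-hits and hits that are too distant
        kept = hits.setdefault(query, [])
        if subject not in kept and len(kept) < max_hits:
            kept.append(subject)
    return hits
```

The selected sequences would then be written to one FASTA file together with the original input and passed to the aligner; that step is omitted here because the command-line options of mafftE.rb are not given in this text.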

Figures

Figure 1
The CPU times required for various sizes of alignments. Sequences were generated using the ROSE program (29). (A and B) Average length (L) of input sequences versus CPU time. The number of sequences is 40. Average distance among input sequences is 100 PAM (A) (percentage identity ∼ 35–85) or 250 PAM (B) (percentage identity ∼ 15–65). (C and D) The number of input sequences (N) versus CPU time. Average sequence length is 300. Average distance among input sequences is 100 PAM (C) or 250 PAM (D). See Table 1 for command-line options for each strategy in MAFFT. Options of other programs are as follows:
  1. TCoffee, default;
  2. PROBCONS, default;
  3. CLUSTAL W, default;
  4. MUSCLE-i, muscle -maxiters 16;
  5. MUSCLE-2, muscle -maxiters 1;
  6. MUSCLE-fast, muscle -sv -maxiters 1 -diags1 -distance1 kbit20_3.
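A CPU-time comparison of this kind could be reproduced along the following lines. This is a minimal sketch, not the measurement procedure used in the paper, which is not described here. The MUSCLE option strings are copied from the list above; the helper name `child_cpu_seconds` and the use of Python's `os.times()` are assumptions, and input and output arguments for each program are deliberately left to the caller. On platforms that do not report child-process times, the helper returns 0.

```python
import os
import subprocess
import sys
from typing import Dict, List

# Option strings as listed in the caption; input/output arguments are not
# included and must be appended by the caller.
MUSCLE_OPTIONS: Dict[str, List[str]] = {
    "MUSCLE-i": ["-maxiters", "16"],
    "MUSCLE-2": ["-maxiters", "1"],
    "MUSCLE-fast": ["-sv", "-maxiters", "1", "-diags1", "-distance1", "kbit20_3"],
}


def child_cpu_seconds(argv: List[str]) -> float:
    """Run argv and return the user + system CPU time used by the child."""
    before = os.times()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    after = os.times()
    return (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )


if __name__ == "__main__":
    # Stand-in command so the sketch runs without any aligner installed.
    print(child_cpu_seconds([sys.executable, "-c", "pass"]))
```

To time an aligner, the caller would build a list such as `["muscle", *MUSCLE_OPTIONS["MUSCLE-2"]]` followed by that program's own input and output arguments.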

References

    Katoh K., Misawa K., Kuma K., Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30:3059–3066.
    Grasso C., Lee C. Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems. Bioinformatics. 2004;20:1546–1556.
    Bateman A., Coin L., Durbin R., Finn R.D., Hollich V., Griffiths-Jones S., Khanna A., Marshall M., Moxon S., Sonnhammer E.L., Studholme D.J., Yeats C., Eddy S.R. The Pfam protein families database. Nucleic Acids Res. 2004;32:D138–D141.
    Chandonia J.M., Hon G., Walker N.S., Lo Conte L., Koehl P., Levitt M., Brenner S.E. The ASTRAL compendium in 2004. Nucleic Acids Res. 2004;32:D189–D192.
    Rawlings N.D., Tolle D.P., Barrett A.J. MEROPS: the peptidase database. Nucleic Acids Res. 2004;32:D160–D164.