BMC Bioinformatics. 2010 Mar 18;11:142.
doi: 10.1186/1471-2105-11-142.

MISHIMA--a new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data

Kirill Kryukov et al. BMC Bioinformatics.

Abstract

Background: Large nucleotide sequence datasets are becoming increasingly common objects of comparison. Complete bacterial genomes are reported almost every day. This creates a need for new multiple sequence alignment methods. Conventional multiple alignment methods are based on pairwise alignment and/or progressive alignment techniques. These approaches have performance problems when the number of sequences is large and when dealing with genome-scale sequences.

Results: We present a new method of multiple sequence alignment, called MISHIMA (Method for Inferring Sequence History In terms of Multiple Alignment), that does not depend on pairwise sequence comparison. A new algorithm is used to quickly find rare oligonucleotide sequences shared by all input sequences. A divide-and-conquer approach is then applied to break the sequences into fragments that can be aligned independently by an external alignment program. These partial alignments are assembled to form a complete alignment of the original sequences.
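
The anchoring idea described in the Results can be illustrated with a minimal Python sketch. This is not the MISHIMA implementation: the function names (`unique_kmers`, `shared_anchors`, `split_at_anchors`), the fixed word length `k`, the "occurs exactly once per sequence" definition of rarity, and the greedy collinearity filter are all assumptions made for illustration. The external alignment of each fragment block is left out.

```python
from collections import Counter


def unique_kmers(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {seq[i:i + k]: i
            for i in range(len(seq) - k + 1)
            if counts[seq[i:i + k]] == 1}


def shared_anchors(seqs, k):
    """Return anchor positions (one list per anchor, one entry per sequence).

    An anchor here is a k-mer that is unique in every sequence. Anchors are
    scanned in the order of the first sequence and kept greedily only if they
    lie after the previous anchor in all sequences (a simplifying assumption).
    """
    tables = [unique_kmers(s, k) for s in seqs]
    common = set(tables[0]).intersection(*tables[1:])
    ordered = sorted(common, key=lambda kmer: tables[0][kmer])

    anchors = []
    last = [-k] * len(seqs)  # so that the first anchor may start at position 0
    for kmer in ordered:
        pos = [t[kmer] for t in tables]
        if all(p >= prev + k for p, prev in zip(pos, last)):
            anchors.append(pos)
            last = pos
    return anchors


def split_at_anchors(seqs, anchors, k):
    """Cut every sequence into the fragments lying between consecutive anchors.

    Each returned block holds one fragment per sequence; the blocks could then
    be aligned independently and concatenated with the anchors in between.
    """
    blocks = []
    starts = [0] * len(seqs)
    for pos in anchors:
        blocks.append([s[a:p] for s, a, p in zip(seqs, starts, pos)])
        starts = [p + k for p in pos]
    blocks.append([s[a:] for s, a in zip(seqs, starts)])
    return blocks


# Toy input: "ACGT" is the only 4-mer that is unique in all three sequences.
seqs = ["TTACGTGGA", "TACGTCCA", "GGACGTAA"]
k = 4
anchors = shared_anchors(seqs, k)
blocks = split_at_anchors(seqs, anchors, k)
print(anchors)  # [[2, 1, 2]]
print(blocks)   # [['TT', 'T', 'GG'], ['GGA', 'CCA', 'AA']]
```

In this toy case the single shared word splits the problem into two small blocks. Because the anchors are identical in every sequence, they need no alignment themselves, and only the fragments between them would be passed to an external aligner.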

Conclusions: MISHIMA provides improved performance compared to commonly used multiple alignment methods. As an example, six complete genome sequences of the bacterial species Helicobacter pylori (about 1.7 Mb each) were successfully aligned in about 6 hours using a single PC.


Figures

Figure 1. Procedure of MISHIMA alignment and complexity of each step.
Figure 2. Computation time of MISHIMA, CLUSTAL W, MUSCLE and MAFFT, on human mtDNA datasets of 50, 100, 200 and 400 sequences, complete and partial. MUSCLE results are shown for all cases where it could complete the alignment.
Figure 3. Alignment score of alignments produced by MISHIMA, CLUSTAL W, MUSCLE and MAFFT, on human mtDNA datasets of 50, 100, 200 and 400 sequences, complete and partial. MUSCLE results are shown for all cases where it could complete the alignment.
Figure 4. Computation time of MISHIMA, CLUSTAL W, MUSCLE and MAFFT, on mammalian mtDNA datasets of 50, 100 and 200 sequences, complete and partial. MUSCLE results are shown for all cases where it could complete the alignment.
Figure 5. Alignment score of alignments produced by MISHIMA, CLUSTAL W, MUSCLE and MAFFT, on mammalian mtDNA datasets of 50, 100 and 200 sequences, complete and partial. MUSCLE results are shown for all cases where it could complete the alignment.
Figure 6. Computation time of MISHIMA, CLUSTAL W, MAFFT and MLAGAN on Helicobacter pylori datasets of 6 sequences, complete and partial.
Figure 7. Alignment score of alignments produced by MISHIMA, CLUSTAL W, MAFFT and MLAGAN on Helicobacter pylori datasets of 6 sequences, complete and partial.
Figure 8. Computation time of MISHIMA and MLAGAN on Staphylococcus aureus datasets of 6, 10 and 14 sequences.
Figure 9. Alignment score of alignments produced by MISHIMA and MLAGAN on Staphylococcus aureus datasets of 6, 10 and 14 sequences.

