BMC Bioinformatics. 2010 Mar 18:11:142.

MISHIMA--a new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data

Authors

Kirill Kryukov¹, Naruya Saitou

Affiliation

¹ Division of Population Genetics, National Institute of Genetics, 1111 Yata, Mishima, 411-8540, Japan.

PMID: 20298584
PMCID: PMC2848238
DOI: 10.1186/1471-2105-11-142

Abstract

Background: Large nucleotide sequence datasets are increasingly common objects of comparison, and complete bacterial genomes are reported almost every day. This creates a need for new multiple sequence alignment methods. Conventional multiple alignment methods are based on pairwise alignment and/or progressive alignment techniques. These approaches perform poorly when the number of sequences is large and when the sequences are of genome scale.

Results: We present a new method of multiple sequence alignment, called MISHIMA (Method for Inferring Sequence History In terms of Multiple Alignment), that does not depend on pairwise sequence comparison. A new algorithm is used to quickly find rare oligonucleotide sequences shared by all sequences. A divide-and-conquer approach is then applied to break the sequences into fragments that can be aligned independently by an external alignment program. These partial alignments are then assembled into a complete alignment of the original sequences.
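The anchor-and-split idea described in the Results can be illustrated with a minimal sketch. This is not the MISHIMA implementation: the function names, the fixed word length `k`, the "occurs exactly once in every sequence" definition of a rare oligonucleotide, the greedy collinearity filter, and the gap-padding stand-in for the external alignment program are all assumptions made for illustration.

```python
from collections import Counter


def unique_kmers(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {seq[i:i + k]: i
            for i in range(len(seq) - k + 1)
            if counts[seq[i:i + k]] == 1}


def shared_anchors(seqs, k):
    """Return position tuples of k-mers unique in every sequence.

    Anchors are kept only if they appear in the same order, without
    overlap, in all sequences. The filter is greedy, so it does not
    necessarily find the longest consistent chain.
    """
    tables = [unique_kmers(s, k) for s in seqs]
    common = set(tables[0]).intersection(*tables[1:])
    candidates = sorted(tuple(t[kmer] for t in tables) for kmer in common)
    chain, last = [], tuple(-k for _ in seqs)
    for pos in candidates:
        if all(p >= q + k for p, q in zip(pos, last)):
            chain.append(pos)
            last = pos
    return chain


def pad_align(fragments):
    """Hypothetical stand-in for an external aligner: right-pad with gaps."""
    width = max(len(f) for f in fragments)
    return [f.ljust(width, "-") for f in fragments]


def anchored_alignment(seqs, k, align=pad_align):
    """Split at shared anchors, align each piece independently, concatenate."""
    rows = ["" for _ in seqs]
    starts = [0] * len(seqs)
    for pos in shared_anchors(seqs, k):
        # Fragments between the previous anchor and this one.
        pieces = align([s[a:p] for s, a, p in zip(seqs, starts, pos)])
        anchor = seqs[0][pos[0]:pos[0] + k]  # identical in every sequence
        rows = [r + piece + anchor for r, piece in zip(rows, pieces)]
        starts = [p + k for p in pos]
    # Remaining tails after the last anchor.
    tails = align([s[a:] for s, a in zip(seqs, starts)])
    return [r + t for r, t in zip(rows, tails)]
```

For the two toy sequences `"AAACGTAAA"` and `"TTACGTTT"` with `k = 4`, the only shared unique word is `ACGT` at position 2 in both, so `anchored_alignment` returns `["AAACGTAAA", "TTACGTTT-"]`. In a real setting the `align` argument would wrap a call to an actual alignment program, and the word length and rarity criterion would need tuning to the data.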

Conclusions: MISHIMA provides improved performance compared to commonly used multiple alignment methods. As an example, six complete genome sequences of the bacterial species Helicobacter pylori (about 1.7 Mb each) were successfully aligned in about 6 hours on a single PC.

Figures

**Figure 1**
**Procedure of MISHIMA alignment and complexity of each step**.

**Figure 2**
**Computation time of MISHIMA, CLUSTAL W, MUSCLE and MAFFT, on human mtDNA datasets of 50, 100, 200 and 400 sequences, complete and partial**. MUSCLE results are shown for all cases where it could complete the alignment.

**Figure 3**
**Alignment score of alignments produced by MISHIMA, CLUSTAL W, MUSCLE and MAFFT, on human mtDNA datasets of 50, 100, 200 and 400 sequences, complete and partial**. MUSCLE results are shown for all cases where it could complete the alignment.

**Figure 4**
**Computation time of MISHIMA, CLUSTAL W, MUSCLE and MAFFT, on mammalian mtDNA datasets of 50, 100 and 200 sequences, complete and partial**. MUSCLE results are shown for all cases where it could complete the alignment.

**Figure 5**
**Alignment score of alignments produced by MISHIMA, CLUSTAL W, MUSCLE and MAFFT, on mammalian mtDNA datasets of 50, 100 and 200 sequences, complete and partial**. MUSCLE results are shown for all cases where it could complete the alignment.

**Figure 6**
**Computation time of MISHIMA, CLUSTAL W, MAFFT and MLAGAN on *Helicobacter pylori* datasets of 6 sequences, complete and partial**.

**Figure 7**
**Alignment score of alignments produced by MISHIMA, CLUSTAL W, MAFFT and MLAGAN on *Helicobacter pylori* datasets of 6 sequences, complete and partial**.

**Figure 8**
**Computation time of MISHIMA and MLAGAN on *Staphylococcus aureus* datasets of 6, 10 and 14 sequences**.

**Figure 9**
**Alignment score of alignments produced by MISHIMA and MLAGAN on *Staphylococcus aureus* datasets of 6, 10 and 14 sequences**.
