2019 Oct 26;36(6):1928-1929.
doi: 10.1093/bioinformatics/btz795. Online ahead of print.

Kalign 3: multiple sequence alignment of large data sets

Timo Lassmann. Bioinformatics.

Abstract

Motivation: Kalign is an efficient multiple sequence alignment (MSA) program capable of aligning thousands of protein or nucleotide sequences. However, current alignment problems involving large numbers of sequences exceed Kalign's original design specifications. Here we present a completely rewritten and updated version to meet current and future alignment challenges.

Results: Kalign now uses a SIMD-accelerated version of Gene Myers' bit-parallel algorithm to estimate pairwise distances, and adopts a sequence embedding strategy together with the bisecting K-means algorithm to rapidly construct guide trees for thousands of sequences. The new version maintains high alignment accuracy on both protein and nucleotide alignments and scales better than other MSA tools.
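
As an illustration of the bit-parallel idea mentioned in the Results, the following is a minimal sketch of Myers' bit-vector recurrence for global edit distance, in the variant usually attributed to Hyyrö. It is not Kalign's implementation: there is no SIMD, Python's arbitrary-precision integers stand in for machine words, and the function names `myers_edit_distance` and `levenshtein_dp` are hypothetical. The plain dynamic-programming function is included only as a reference to check the bit-parallel version against.

```python
def myers_edit_distance(pattern: str, text: str) -> int:
    """Global edit (Levenshtein) distance using Myers' bit-vector recurrence."""
    m = len(pattern)
    if m == 0:
        return len(text)

    # peq[c] has bit i set when pattern[i] == c.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    mask = (1 << m) - 1   # keep only the m bits of one DP column
    high = 1 << (m - 1)   # bit for the last pattern row
    pv, mv = mask, 0      # vertical deltas of column 0: +1 in every row
    score = m             # distance between the pattern and an empty text

    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = mv | (~(xh | pv) & mask)   # rows whose horizontal delta is +1
        mh = pv & xh                    # rows whose horizontal delta is -1
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shifting in a 1 gives the top boundary row a cost of +1 per text
        # character, which is what makes the distance global.
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = mh | (~(xv | ph) & mask)
        mv = ph & xv
    return score


def levenshtein_dp(a: str, b: str) -> int:
    """Reference O(len(a) * len(b)) dynamic program, used only for checking."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    pairs = [("ACGTACGT", "ACGTTCGT"), ("GATTACA", "GCATGCU"), ("", "ACGT")]
    for a, b in pairs:
        assert myers_edit_distance(a, b) == levenshtein_dp(a, b)
```

Each text character updates a whole column of the dynamic-programming matrix with a fixed number of word operations, which is why the approach suits estimating many pairwise distances quickly. A production version would split long patterns across fixed-width machine words or SIMD lanes rather than rely on big integers.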

Availability: The source code of Kalign and code to reproduce the results are found here: https://github.com/timolassmann/kalign.


Figures

Fig. 1.
Benchmark results. (a) Sum-of-pairs (SP) scores of all tested alignment programs on BAliBASE protein alignment datasets. (b) SP scores on BRAliBase RNA alignments. (c) Computational performance assessed on the HomFam dataset.
