Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2012 Jul 15;28(14):1823-9.
doi: 10.1093/bioinformatics/bts252. Epub 2012 May 3.

SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes

Affiliations

SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes

Elmar Pruesse et al. Bioinformatics. .

Abstract

Motivation: In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA) where already millions of sequences are publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements.

Results: In this study, we present the SILVA Incremental Aligner (SINA) used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high throughput performance demands. SINA was evaluated in comparison with the commonly used high throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3, 97.6 and 96.1 accuracy. A larger benchmark MSA comprising 38 772 sequences could be reproduced with 98.9 and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences. SINA was able to achieve higher accuracy than PyNAST and mothur in all performed benchmarks.

Availability: Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, user manual and tutorial. SINA is made available under a personal use license.

PubMed Disclaimer

Figures

Fig. 1.
Fig. 1.
The alignment of the selected reference sequences is converted from RC-MSA representation (top) to PO-MSA representation (bottom)
Fig. 2.
Fig. 2.
SINA alignment accuracy decreases almost linearly with the shared fractional identity of candidate and reference when using one reference sequence (red line). Using larger numbers of reference sequences markedly increases accuracy
Fig. 3.
Fig. 3.
An alternative implementation which used simple column-profiles built from the selected reference sequences showed overall lower accuracy. Increasing the number of reference sequences quickly led to a degradation in accuracy

References

    1. Altschul S.F., et al. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410. - PubMed
    1. Amaral-Zettler L., et al. Proceedings of the international workshop on Ribosomal RNA technology, April 7-9, 2008, Bremen, Germany. Syst. Appl. Microbio. 2008;31:258–268. - PubMed
    1. Caporaso J.G., et al. PyNAST: a flexible tool for aligning sequences to a template alignment. Bioinformatics. 2010;26:266–267. - PMC - PubMed
    1. Cole J.R., et al. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res. 2009;37:D141–D145. - PMC - PubMed
    1. DeSantis T.Z., et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 2006a;72:5069–5072. - PMC - PubMed

Publication types