NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes

T Z DeSantis Jr¹, P Hugenholtz, K Keller, E L Brodie, N Larsen, Y M Piceno, R Phan, G L Andersen

PMID: 16845035
PMCID: PMC1538769
DOI: 10.1093/nar/gkl244

Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W394-9.

Affiliation

¹ Lawrence Berkeley National Laboratory, Center for Environmental Biotechnology, Berkeley, CA, USA.

Abstract

Microbiologists conducting surveys of bacterial and archaeal diversity often require comparative alignments of thousands of 16S rRNA genes collected from a sample. The computational resources and bioinformatics expertise required to construct such an alignment have inhibited high-throughput analysis. It was hypothesized that an online tool could be developed to efficiently align thousands of 16S rRNA genes via the NAST (Nearest Alignment Space Termination) algorithm for creating multiple sequence alignments (MSA). The tool was implemented with a web interface at http://greengenes.lbl.gov/NAST. Each user-submitted sequence is compared with Greengenes' 'Core Set', comprising approximately 10,000 aligned non-chimeric sequences representative of the currently recognized diversity among bacteria and archaea. User sequences are oriented and paired with their closest match in the Core Set, which serves as a template for inserting gap characters. Non-16S data (sequence from vector or surrounding genomic regions) are removed from the returned alignment. From the resulting MSA, distance matrices can be calculated for diversity estimates and organisms can be classified by taxonomy. The ability to align and categorize large sequence sets using a simple interface has enabled researchers with various experience levels to obtain bacterial and archaeal community profiles.

Figures

**Figure 1**
Locating a NAST alignment template for a user-supplied candidate sequence. The candidate sequence (green) is matched to a near-neighbor aligned template in Greengenes' Core Set (grey) by tallying the 7mers they have in common. The alignment ‘template’ is then BLAST-aligned to the candidate with parameter q = −1 (which favors long matches). The candidate is trimmed of flanking sequence data such as tRNA, intergenic spacer regions, vector sequence, 23S rDNA and any sequence outside the high-scoring pair (HSP) boundaries. If the HSP pairs opposite strands, the candidate is reverse complemented.
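
The template search in this caption can be illustrated with a minimal Python sketch. It is not the Greengenes implementation: the function names `kmers`, `reverse_complement` and `pick_template` are hypothetical, the Core Set is assumed to be a plain dictionary of aligned strings, and orientation is guessed here from the 7mer tally on both strands, whereas the caption describes orientation being decided from the BLAST HSP.

```python
def kmers(seq, k=7):
    """Return the set of distinct k-mers in a sequence, ignoring gap characters."""
    seq = seq.replace("-", "").upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq):
    """Reverse complement a DNA/RNA string (U is read as T)."""
    table = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")
    return seq.translate(table)[::-1]


def pick_template(candidate, core_set, k=7):
    """Return (template_name, shared_kmer_count, strand) for the best match.

    core_set maps a name to an aligned (gapped) sequence. Both strands of
    the candidate are tried, which is a simplification made by this sketch.
    """
    # Pre-compute template k-mers once so each strand reuses them.
    template_kmers = {name: kmers(seq, k) for name, seq in core_set.items()}
    best = None
    for strand, query in (("+", candidate), ("-", reverse_complement(candidate))):
        query_kmers = kmers(query, k)
        for name, tk in template_kmers.items():
            shared = len(query_kmers & tk)
            if best is None or shared > best[1]:
                best = (name, shared, strand)
    return best
```

Calling `pick_template(candidate, core_set)` returns the name of the template sharing the most 7mers, the size of that overlap, and the strand on which it was found. A strand of `"-"` suggests the candidate would need to be reverse complemented before alignment.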

**Figure 2**
Example of NAST compression of a BLAST pairwise alignment using a 38 character aligned template. Template and candidate are extended to 40 characters after (A) BLAST gap insertion and (B) retention of original template spacing. (C) Nucleotide insertions in the candidate relative to the template, which force additional characters to be added to the template, are identified at positions α and β. (D) A bi-directional search for the nearest alignment space (hyphen) relative to each insertion terminates at the positions indicated by the black arrows. The leftward search from the α position was shorter than the rightward search, thus the space to the left of ‘GT’ was removed. (E) The search from the β position encountered the alignment edge on the right, thus the space to the left of ‘AC’ was removed. (F) Lastly, the two template-extending spaces are deleted from the template. Notice that sequence data are not added to or overwritten in the candidate. The NAST removal of two characters from both sequences allowed local misalignments (underlined) while preserving the 38 character format of the global MSA.
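
The compression steps (C) to (F) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the published code: `nearest_gap` and `nast_compress` are hypothetical names, the template-extending columns are assumed to be supplied as a list of column indices, insertions are processed left to right, and a tie between the leftward and rightward search is resolved to the left (the caption does not say how ties are handled).

```python
def nearest_gap(chars, pos):
    """Index of the hyphen closest to pos, searching both directions.

    Returns None when chars holds no hyphen. Ties go to the left,
    which is an assumption of this sketch.
    """
    for distance in range(len(chars)):
        for i in (pos - distance, pos + distance):
            if 0 <= i < len(chars) and chars[i] == "-":
                return i
    return None


def nast_compress(template, candidate, inserted_cols):
    """Shrink a pairwise alignment back to the original template width.

    template, candidate: equal-length aligned strings after pairwise gap
    insertion. inserted_cols: columns where the pairwise alignment added
    a gap to the template. For each such column, the nearest hyphen is
    removed from the candidate and the extra column from the template.
    """
    if len(template) != len(candidate):
        raise ValueError("aligned strings must have equal length")
    t, c = list(template), list(candidate)
    cols = set(inserted_cols)
    extra = [i in cols for i in range(len(t))]
    while True in extra:
        p = extra.index(True)      # leftmost remaining template-extending column
        g = nearest_gap(c, p)      # closest alignment space in the candidate
        if g is None:
            raise ValueError("candidate has no space left to absorb the insertion")
        del c[g]                   # only a hyphen is dropped, never a base
        del t[p]
        del extra[p]
    return "".join(t), "".join(c)
```

For example, with template `"ACGT---ACGTAC"`, candidate `"ACGTGGGACG-AC"` and `inserted_cols=[6]`, the sketch removes the candidate's only hyphen and returns two 12 character strings. The candidate's bases are unchanged, and the columns between the insertion and the removed space end up locally misaligned, as the caption describes.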

**Figure 3**
Greengenes pre-processing and post-processing tools for use with the NAST aligner. ‘Trim’ can be used to remove poor quality DNA data before alignment. ‘Classify’ and ‘Distance’ receive NAST MSAs as input. ‘Export’ and ‘Download’ allow advanced users to append their MSA with select sequences from the public repositories.
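
As an illustration of what a distance calculation on a fixed-width MSA involves, the following Python sketch computes an uncorrected pairwise distance (the fraction of mismatched columns). The source does not state which distance method the ‘Distance’ tool uses, so the method, the function names `p_distance` and `distance_matrix`, and the choice to skip any column containing a gap are all assumptions of this sketch.

```python
def p_distance(a, b):
    """Fraction of mismatches over columns where both sequences have a base."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-." or y in "-.":
            continue               # skip columns with a gap in either row
        compared += 1
        mismatches += x != y
    return mismatches / compared if compared else float("nan")


def distance_matrix(msa):
    """Return {(name_a, name_b): distance} for every ordered pair in msa."""
    names = list(msa)
    return {(m, n): p_distance(msa[m], msa[n]) for m in names for n in names}
```

Because every NAST-aligned sequence shares the same column layout, the rows can be compared position by position without re-aligning each pair.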
