A poor man's BLASTX--high-throughput metagenomic protein database search using PAUDA

Daniel H Huson¹, Chao Xie

Affiliation

¹ Singapore Centre on Environmental Life Sciences Engineering, School of Biological Sciences, Nanyang Technological University, Singapore 637551, Center for Bioinformatics, University of Tübingen, 72076 Tübingen, Germany and Life Sciences Institute, National University of Singapore, Singapore 117456.

PMID: 23658416
PMCID: PMC3866550
DOI: 10.1093/bioinformatics/btt254

Daniel H Huson et al. Bioinformatics. 2014 Jan 1;30(1):38-9. doi: 10.1093/bioinformatics/btt254. Epub 2013 May 7.

Abstract

Summary: In the context of metagenomics, we introduce a new approach to protein database search called PAUDA, which runs ~10,000 times faster than BLASTX, while achieving about one-third of the assignment rate of reads to KEGG orthology groups, and producing gene and taxon abundance profiles that are highly correlated to those obtained with BLASTX. PAUDA requires <80 CPU hours to analyze a dataset of 246 million Illumina DNA reads from permafrost soil for which a previous BLASTX analysis (on a subset of 176 million reads) reportedly required 800,000 CPU hours, leading to the same clustering of samples by functional profiles.
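As a rough sanity check on the speedup claim, the CPU-hour figures quoted above can be normalized by read count, since the two analyses covered different numbers of reads. The sketch below is only illustrative arithmetic: the function and variable names are hypothetical, the PAUDA figure of 80 CPU hours is an upper bound (the abstract says "<80"), and it assumes CPU hours are comparable across the two analyses, which the abstract does not state.

```python
# Figures quoted in the abstract. PAUDA's "<80 CPU hours" is treated as an
# upper bound, so the speedup computed here is a lower bound.
BLASTX_CPU_HOURS = 800_000
BLASTX_READS = 176_000_000
PAUDA_CPU_HOURS = 80
PAUDA_READS = 246_000_000


def cpu_hours_per_million_reads(cpu_hours: float, reads: int) -> float:
    """Normalize a CPU-hour total to a cost per one million reads."""
    return cpu_hours / (reads / 1_000_000)


blastx_rate = cpu_hours_per_million_reads(BLASTX_CPU_HOURS, BLASTX_READS)
pauda_rate = cpu_hours_per_million_reads(PAUDA_CPU_HOURS, PAUDA_READS)

# Per-read speedup of PAUDA over BLASTX under the assumptions stated above.
speedup = blastx_rate / pauda_rate

print(f"BLASTX: {blastx_rate:.1f} CPU hours per million reads")
print(f"PAUDA:  {pauda_rate:.3f} CPU hours per million reads (upper bound)")
print(f"Per-read speedup: at least {speedup:.0f}x")
```

Under these assumptions the per-read speedup comes out at roughly 14,000-fold or more, the same order of magnitude as the ~10,000-fold figure stated in the summary.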

Availability: PAUDA is freely available from http://ab.inf.uni-tuebingen.de/software/pauda. Supplementary method details are also available from this website.

Figures

**Fig. 1.**
An overview of the PAUDA approach

**Fig. 2.**
KEGG comparison of PAUDA and BLASTX. Left: Each true KO group is represented by a dot with coordinates that correspond to the number of reads assigned to the KO group by BLASTX (on the x-axis) and PAUDA (on the y-axis). Right: To show the low abundance KO groups more clearly, here, we plot the same data on a logarithmic scale
