Search and clustering orders of magnitude faster than BLAST
- PMID: 20709691
- DOI: 10.1093/bioinformatics/btq461
Abstract
Motivation: Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification.
Results: UBLAST and USEARCH are new algorithms enabling sensitive local and global search of large sequence databases at exceptionally high speeds. They are often orders of magnitude faster than BLAST in practical applications, though sensitivity to distant protein relationships is lower. UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters. UCLUST offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets.
Availability: Binaries are available at no charge for non-commercial use at http://www.drive5.com/usearch.
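The abstract says only that UCLUST uses USEARCH to assign sequences to clusters. As a rough illustration of what centroid-based assignment can look like, here is a minimal sketch of a generic greedy scheme: each sequence joins the first existing centroid it matches at or above an identity threshold, and otherwise becomes a new centroid. It is not the UCLUST implementation. The function names `identity` and `greedy_cluster` are hypothetical, and the position-by-position identity measure is an assumption that stands in for the alignment-based search a real tool would use.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions over the longer length.

    Assumption: ungapped, position-by-position comparison. A real tool
    would use an alignment-based identity instead.
    """
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(seqs: list[str], threshold: float) -> dict[str, list[str]]:
    """Assign each sequence to the first centroid it matches, in input order.

    Returns a dict mapping each centroid to its members (centroid included).
    """
    clusters: dict[str, list[str]] = {}
    for s in seqs:
        for centroid in clusters:
            if identity(s, centroid) >= threshold:
                clusters[centroid].append(s)
                break
        else:
            # No existing centroid is similar enough: start a new cluster.
            clusters[s] = [s]
    return clusters


example = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC"]
result = greedy_cluster(example, threshold=0.8)
```

In this sketch the result depends on input order, because each centroid is simply the first sequence that failed to match any earlier centroid. The linear scan over centroids is also the slow part; a fast database search would replace it in practice.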
Similar articles
- Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006 Jul 1;22(13):1658-9. doi: 10.1093/bioinformatics/btl158. Epub 2006 May 26. PMID: 16731699
- kClust: fast and sensitive clustering of large protein sequence databases. BMC Bioinformatics. 2013 Aug 15;14:248. doi: 10.1186/1471-2105-14-248. PMID: 23945046 Free PMC article.
- MMseqs software suite for fast and deep clustering and searching of large protein sequence sets. Bioinformatics. 2016 May 1;32(9):1323-30. doi: 10.1093/bioinformatics/btw006. Epub 2016 Jan 6. PMID: 26743509
- Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space. Bioinformatics. 2008 Jul 1;24(13):i41-9. doi: 10.1093/bioinformatics/btn174. PMID: 18586742 Free PMC article.
- Clustered sequence representation for fast homology search. J Comput Biol. 2007 Jun;14(5):594-614. doi: 10.1089/cmb.2007.R005. PMID: 17683263 Review.
Cited by
- Chronic Exposure to Environmentally Relevant Concentrations of Tetracycline Perturbs Gut Homeostasis in Zebrafish. Environ Health (Wash). 2023 Sep 7;1(4):258-269. doi: 10.1021/envhealth.3c00072. eCollection 2023 Oct 20. PMID: 39474494 Free PMC article.
- Dissecting the molecular diversity and commonality of bovine and human treponemes identifies key survival and adhesion mechanisms. PLoS Pathog. 2021 Mar 29;17(3):e1009464. doi: 10.1371/journal.ppat.1009464. eCollection 2021 Mar. PMID: 33780514 Free PMC article.
- The temperature sensitivity of soil: microbial biodiversity, growth, and carbon mineralization. ISME J. 2021 Sep;15(9):2738-2747. doi: 10.1038/s41396-021-00959-1. Epub 2021 Mar 29. PMID: 33782569 Free PMC article.
- Lactic Acid Bacteria in Durum Wheat Flour Are Endophytic Components of the Plant during Its Entire Life Cycle. Appl Environ Microbiol. 2015 Oct;81(19):6736-48. doi: 10.1128/AEM.01852-15. Epub 2015 Jul 17. PMID: 26187970 Free PMC article.
- Comparison of Bacterial Community Composition of Primary and Persistent Endodontic Infections Using Pyrosequencing. J Endod. 2015 Aug;41(8):1226-33. doi: 10.1016/j.joen.2015.03.010. Epub 2015 Apr 21. PMID: 25906920 Free PMC article.