DBH: A de Bruijn graph-based heuristic method for clustering large-scale 16S rRNA sequences into OTUs
- PMID: 28454900
- DOI: 10.1016/j.jtbi.2017.04.019
Abstract
The recent sequencing revolution driven by high-throughput technologies has led to a rapid accumulation of 16S rRNA sequences from microbial communities. Clustering short sequences into operational taxonomic units (OTUs) is a crucial initial step in analyzing metagenomic data. Although many heuristic methods with low computational complexity have been proposed for OTU inference, they select only one sequence as the seed for each cluster, so their results are sensitive to the sequences chosen to represent the clusters. To address this issue, we present DBH, a de Bruijn graph-based heuristic method for clustering massive numbers of 16S rRNA sequences into OTUs, which introduces a novel seed selection strategy and a greedy clustering approach. Compared with existing widely used methods on several simulated and real-life metagenomic datasets, DBH shows higher clustering performance and low memory usage, alleviating the overestimation of the number of OTUs. DBH is also more effective at handling large-scale metagenomic datasets. The DBH software can be freely downloaded for academic use from https://github.com/nwpu134/DBH.git.
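The abstract does not specify how DBH builds its graph or ranks candidate seeds, so the Python sketch below only illustrates the general idea of graph-informed seed selection followed by greedy clustering. It is not DBH's algorithm. It assumes, hypothetically, that graph nodes are k-mers weighted by how many sequences contain them, that a sequence's seed score is the mean weight of its k-mers, and that a sequence joins a cluster when the Jaccard similarity between its k-mer set and the seed's is at least a threshold. The names `greedy_cluster`, `k`, and `threshold` are illustrative and are not part of the DBH software.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of k-mers in seq (the nodes a k-mer de Bruijn graph would use)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def node_weights(kmer_sets):
    """Hypothetical node weight: number of sequences containing each k-mer."""
    counts = Counter()
    for ks in kmer_sets:
        counts.update(ks)
    return counts


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def greedy_cluster(seqs, k=4, threshold=0.5):
    """Greedy clustering with graph-weighted seed ranking (illustrative only).

    Returns a list of (seed_index, member_indices) tuples.
    """
    kmer_sets = [kmers(s, k) for s in seqs]
    weights = node_weights(kmer_sets)

    def seed_score(i):
        # Assumed criterion: sequences built from abundant k-mers make better seeds.
        ks = kmer_sets[i]
        return sum(weights[x] for x in ks) / len(ks) if ks else 0.0

    # Rank all sequences by score (ties broken by input order).
    order = sorted(range(len(seqs)), key=lambda i: (-seed_score(i), i))
    unassigned = set(range(len(seqs)))
    clusters = []
    for seed in order:
        if seed not in unassigned:
            continue  # already absorbed by an earlier, higher-scoring seed
        members = [
            i for i in order
            if i in unassigned and jaccard(kmer_sets[seed], kmer_sets[i]) >= threshold
        ]
        unassigned.difference_update(members)
        clusters.append((seed, members))
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGTAA", "ACGTACGTAC", "TTTTGGGGCC", "TTTTGGGGCA"]
    for seed, members in greedy_cluster(reads, k=4, threshold=0.5):
        print(seed, members)
```

On the four toy reads above, the two ACGT-repeat reads and the two poly-T/poly-G reads share no k-mers across groups, so the sketch should form two clusters. Ranking seeds by a score computed over all sequences, rather than taking them in input order, is one way to make the choice of cluster representatives less arbitrary, which is the weakness of single-seed heuristics that the abstract identifies.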
Keywords: 16S rRNA; Clustering; Metagenomic; Operational taxonomic units; de Bruijn graph.
Copyright © 2017 Elsevier Ltd. All rights reserved.
Similar articles
- DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs. Front Microbiol. 2019 Mar 12;10:428. doi: 10.3389/fmicb.2019.00428. eCollection 2019. PMID: 30915052. Free PMC article.
- MtHc: a motif-based hierarchical method for clustering massive 16S rRNA sequences into OTUs. Mol Biosyst. 2015 Jul;11(7):1907-13. doi: 10.1039/c5mb00089k. PMID: 25912934.
- DMclust, a Density-based Modularity Method for Accurate OTU Picking of 16S rRNA Sequences. Mol Inform. 2017 Dec;36(12). doi: 10.1002/minf.201600059. Epub 2017 Jun 6. PMID: 28586119.
- Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering. Microbiome. 2015 Oct 5;3:43. doi: 10.1186/s40168-015-0105-6. PMID: 26434730. Free PMC article.
- Computational methods for the analysis of tag sequences in metagenomics studies. Front Biosci (Schol Ed). 2012 Jun 1;4(4):1333-43. doi: 10.2741/s335. PMID: 22652875. Review.
Cited by
- smsMap: mapping single molecule sequencing reads by locating the alignment starting positions. BMC Bioinformatics. 2020 Aug 4;21(1):341. doi: 10.1186/s12859-020-03698-w. PMID: 32753028. Free PMC article.
- pathMap: a path-based mapping tool for long noisy reads with high sensitivity. Brief Bioinform. 2024 Jan 22;25(2):bbae107. doi: 10.1093/bib/bbae107. PMID: 38517696. Free PMC article.
- DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs. Front Microbiol. 2019 Mar 12;10:428. doi: 10.3389/fmicb.2019.00428. eCollection 2019. PMID: 30915052. Free PMC article.
- Comparison of Methods for Picking the Operational Taxonomic Units From Amplicon Sequences. Front Microbiol. 2021 Mar 24;12:644012. doi: 10.3389/fmicb.2021.644012. eCollection 2021. PMID: 33841367. Free PMC article.
- Metagenomic data of bacterial community from different land uses at the river basin, Kelantan. Data Brief. 2020 Sep 28;33:106351. doi: 10.1016/j.dib.2020.106351. eCollection 2020 Dec. PMID: 33072827. Free PMC article.