MtHc: a motif-based hierarchical method for clustering massive 16S rRNA sequences into OTUs
- PMID: 25912934
- DOI: 10.1039/c5mb00089k
Abstract
The recent sequencing revolution driven by high-throughput technologies has led to a rapid accumulation of 16S rRNA sequences from microbial communities. Clustering these short sequences into operational taxonomic units (OTUs) is a crucial initial step in analyzing metagenomic data. Although many methods have been proposed for OTU inference, a major challenge remains balancing inference accuracy against computational efficiency. To address this challenge, we present a novel motif-based hierarchical method (MtHc) for clustering massive sets of 16S rRNA sequences into OTUs with high clustering accuracy and low memory usage. Suppose that all the 16S rRNA sequences are used to construct a complete weighted network in which sequences are nodes, each pair of sequences is connected by an implicit edge, and the distance between the two sequences is the weight of that edge. MtHc consists of three main phases. First, it heuristically searches for motifs, defined as n-node subgraphs (n = 3, 4, 5 in the present study) in which the distance between any two nodes is below a threshold. Second, it uses each motif as a seed to form a candidate cluster by computing the distances between the remaining sequences and the motif. Finally, it hierarchically merges the candidate clusters into OTUs, computing only the distances between the motifs of two clusters. Compared with existing methods on several simulated and real metagenomic datasets, MtHc achieves higher clustering performance, lower memory usage and robustness to parameter settings, and it handles large-scale metagenomic datasets more effectively. The MtHc software can be freely downloaded from for academic users.
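The three phases described in the abstract can be illustrated with a small Python sketch. This is a hypothetical illustration, not the authors' MtHc implementation, and several details are assumptions: reads are taken to be pre-aligned to equal length and compared by p-distance (fraction of mismatched sites); the motif search is a greedy scan in input order; a sequence joins a candidate cluster only if it lies within the threshold `t` of every motif node; sequences that fit no motif become singleton clusters; and merging uses the average distance between the motif nodes of two clusters. The function names `distance`, `find_motif` and `mthc_sketch` are illustrative only.

```python
from itertools import combinations


def distance(a: str, b: str) -> float:
    """Assumed metric: p-distance (fraction of mismatched sites) between pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes pre-aligned, equal-length reads")
    return sum(x != y for x, y in zip(a, b)) / max(len(a), 1)


def find_motif(seed, pool, seqs, n, t):
    """Phase 1 (greedy assumption): grow an n-node motif from `seed` in which every
    pair of members is closer than t; return None if no such motif is found."""
    motif = [seed]
    for cand in pool:
        if cand != seed and all(distance(seqs[cand], seqs[m]) < t for m in motif):
            motif.append(cand)
            if len(motif) == n:
                return motif
    return None


def mthc_sketch(seqs, n=3, t=0.03):
    """Hypothetical three-phase sketch; returns OTUs as sorted lists of read indices."""
    unassigned = list(range(len(seqs)))
    clusters = []  # each entry: {"motif": [...], "members": [...]}

    for seed in list(unassigned):
        if seed not in unassigned:
            continue
        motif = find_motif(seed, unassigned, seqs, n, t)
        if motif is None:
            continue
        # Phase 2: the motif seeds a candidate cluster. Assumption: a read joins
        # only if it is within t of every motif node.
        members = list(motif)
        for s in unassigned:
            if s not in motif and all(distance(seqs[s], seqs[m]) < t for m in motif):
                members.append(s)
        unassigned = [s for s in unassigned if s not in members]
        clusters.append({"motif": motif, "members": members})

    # Assumption: reads that fit no motif become singleton clusters.
    clusters += [{"motif": [s], "members": [s]} for s in unassigned]

    def motif_dist(c1, c2):
        # Only motif nodes are compared, never the full cluster membership.
        pairs = [(a, b) for a in c1["motif"] for b in c2["motif"]]
        return sum(distance(seqs[a], seqs[b]) for a, b in pairs) / len(pairs)

    # Phase 3: repeatedly merge the closest pair of clusters while it is below t.
    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), motif_dist(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair_and_dist: pair_and_dist[1],
        )
        if d >= t:
            break
        merged = {"motif": clusters[i]["motif"] + clusters[j]["motif"],
                  "members": clusters[i]["members"] + clusters[j]["members"]}
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    return [sorted(c["members"]) for c in clusters]


if __name__ == "__main__":
    reads = ["AAAAAAAAAA", "AAAAAAAAAT", "TAAAAAAAAA", "AAAAATAAAA",
             "CCCCCCCCCC", "CCCCCCCCCG", "GCCCCCCCCC", "GGGGTTTTGG"]
    print(mthc_sketch(reads, n=3, t=0.25))  # → [[0, 1, 2, 3], [4, 5, 6], [7]]

    # Two motifs whose reads are not absorbed individually but whose
    # average motif-to-motif distance (0.2) is below t, so phase 3 merges them.
    close = ["AAAAAAAAAA", "TAAAAAAAAA", "ATAAAAAAAA",
             "TATAAAAAAA", "TAATAAAAAA", "TAAATAAAAA"]
    print(mthc_sketch(close, n=3, t=0.25))  # → [[0, 1, 2, 3, 4, 5]]
```

The default `t=0.03` mirrors the commonly used 97% identity OTU cutoff; the abstract does not state MtHc's threshold, so this is also an assumption. Because phase 3 compares only motif nodes, the cost of evaluating a candidate merge depends on motif size rather than on how many reads each cluster holds, which is the efficiency argument the abstract makes.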
Similar articles
- DBH: A de Bruijn graph-based heuristic method for clustering large-scale 16S rRNA sequences into OTUs. J Theor Biol. 2017 Jul 21;425:80-87. doi: 10.1016/j.jtbi.2017.04.019. Epub 2017 Apr 26. PMID: 28454900
- DMclust, a Density-based Modularity Method for Accurate OTU Picking of 16S rRNA Sequences. Mol Inform. 2017 Dec;36(12). doi: 10.1002/minf.201600059. Epub 2017 Jun 6. PMID: 28586119
- M-pick, a modularity-based method for OTU picking of 16S rRNA sequences. BMC Bioinformatics. 2013 Feb 7;14:43. doi: 10.1186/1471-2105-14-43. PMID: 23387433. Free PMC article.
- MSClust: A Multi-Seeds based Clustering algorithm for microbiome profiling using 16S rRNA sequence. J Microbiol Methods. 2013 Sep;94(3):347-55. doi: 10.1016/j.mimet.2013.07.004. Epub 2013 Jul 28. PMID: 23899776. Free PMC article.
- Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering. Microbiome. 2015 Oct 5;3:43. doi: 10.1186/s40168-015-0105-6. PMID: 26434730. Free PMC article.
Cited by
- invMap: a sensitive mapping tool for long noisy reads with inversion structural variants. Bioinformatics. 2023 Dec 1;39(12):btad726. doi: 10.1093/bioinformatics/btad726. PMID: 38058196. Free PMC article.
- Comparison of Methods for Picking the Operational Taxonomic Units From Amplicon Sequences. Front Microbiol. 2021 Mar 24;12:644012. doi: 10.3389/fmicb.2021.644012. eCollection 2021. PMID: 33841367. Free PMC article.
- smsMap: mapping single molecule sequencing reads by locating the alignment starting positions. BMC Bioinformatics. 2020 Aug 4;21(1):341. doi: 10.1186/s12859-020-03698-w. PMID: 32753028. Free PMC article.
- Factoring the intestinal microbiome into the pathogenesis of autoimmune hepatitis. World J Gastroenterol. 2016 Nov 14;22(42):9257-9278. doi: 10.3748/wjg.v22.i42.9257. PMID: 27895415. Free PMC article. Review.
- A toolbox of machine learning software to support microbiome analysis. Front Microbiol. 2023 Nov 22;14:1250806. doi: 10.3389/fmicb.2023.1250806. eCollection 2023. PMID: 38075858. Free PMC article. Review.