DMclust, a Density-based Modularity Method for Accurate OTU Picking of 16S rRNA Sequences
- PMID: 28586119
- DOI: 10.1002/minf.201600059
Abstract
Clustering 16S rRNA sequences into operational taxonomic units (OTUs) is a crucial step in analyzing metagenomic data. Although many methods have been developed, achieving an appropriate balance between clustering accuracy and computational efficiency remains a major challenge. This paper proposes a novel density-based modularity clustering method, called DMclust, that bins 16S rRNA sequences into OTUs with high clustering accuracy. The DMclust algorithm consists of four main phases. First, it searches for dense groups of sequences, defined as n-sequence communities, in which the distance between any two sequences is less than a threshold. Second, these dense groups are used to construct a weighted network in which the dense groups are the nodes, each pair of dense groups is connected by an edge, and the distance between the two groups is the weight of that edge. Third, a modularity-based community detection method is applied to generate preclusters. Finally, the remaining sequences are assigned to their nearest preclusters to form OTUs. Experimental results on several metagenomic datasets show that DMclust achieves higher clustering accuracy than existing widely used methods, with acceptable memory usage.
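The four phases can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the distance measure, the search strategy, or the community detection algorithm. The sketch assumes a mismatch-fraction distance on equal-length reads, a greedy seed-based search for dense groups, edge weights converted to similarities (1 minus the average inter-group distance), and a naive greedy modularity merge. All function names, the toy sequences, and the parameter values are hypothetical.

```python
from itertools import combinations

def dist(a, b):
    """Fraction of mismatching positions (assumption: equal-length, pre-aligned reads)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def group_dist(seqs, g, h):
    """Average pairwise distance between two groups of sequence indices."""
    return sum(dist(seqs[i], seqs[j]) for i in g for j in h) / (len(g) * len(h))

def dense_groups(seqs, threshold, n):
    """Phase 1 (greedy assumption): groups of >= n sequences, all pairwise distances < threshold."""
    unused, groups = list(range(len(seqs))), []
    while unused:
        seed, rest = unused[0], unused[1:]
        group = [seed]
        for i in rest:
            if all(dist(seqs[i], seqs[j]) < threshold for j in group):
                group.append(i)
        if len(group) >= n:
            groups.append(group)
            unused = [i for i in unused if i not in group]
        else:
            unused = rest  # seed is not in a dense region; it is handled in phase 4
    return groups

def build_network(seqs, groups):
    """Phase 2: complete graph over dense groups; weight = 1 - distance (assumed similarity)."""
    return {(a, b): 1.0 - group_dist(seqs, groups[a], groups[b])
            for a, b in combinations(range(len(groups)), 2)}

def modularity(weights, communities):
    """Weighted Newman modularity of a partition of the group network."""
    total = sum(weights.values())
    q = 0.0
    for com in communities:
        inside = sum(w for (a, b), w in weights.items() if a in com and b in com)
        degree = sum(w * ((a in com) + (b in com)) for (a, b), w in weights.items())
        q += inside / total - (degree / (2 * total)) ** 2
    return q

def greedy_modularity(weights, n_nodes):
    """Phase 3 (assumed algorithm): merge the best pair while modularity increases."""
    communities = [{i} for i in range(n_nodes)]
    if sum(weights.values()) == 0:
        return communities
    while len(communities) > 1:
        base = modularity(weights, communities)
        best_gain, best = 0.0, None
        for a, b in combinations(range(len(communities)), 2):
            others = [c for k, c in enumerate(communities) if k not in (a, b)]
            candidate = others + [communities[a] | communities[b]]
            gain = modularity(weights, candidate) - base
            if gain > best_gain:
                best_gain, best = gain, candidate
        if best is None:
            break
        communities = best
    return communities

def cluster(seqs, threshold=0.15, n=2):
    """Run all four phases and return OTUs as sorted lists of sequence indices."""
    groups = dense_groups(seqs, threshold, n)
    if not groups:
        return [[i] for i in range(len(seqs))]  # no dense region: singleton OTUs
    weights = build_network(seqs, groups)
    preclusters = [sorted(i for g in com for i in groups[g])
                   for com in greedy_modularity(weights, len(groups))]
    otus = [list(p) for p in preclusters]
    grouped = {i for p in preclusters for i in p}
    for i in range(len(seqs)):
        if i not in grouped:
            # Phase 4: attach each remaining sequence to its nearest precluster
            k = min(range(len(preclusters)),
                    key=lambda k: group_dist(seqs, [i], preclusters[k]))
            otus[k].append(i)
    return sorted(sorted(otu) for otu in otus)

# Toy data: two A-rich and two C-rich dense pairs, plus two outlying reads
SEQS = ["AAAAAAAAAA", "AAAAAAAAAC", "AAAAAAAAGG", "AAAAAAAAGT",
        "CCCCCCCCCC", "CCCCCCCCCA", "CCCCCCCCTT", "CCCCCCCCTG",
        "AAAAACCAAA", "CCCCCGGCCC"]

if __name__ == "__main__":
    print(cluster(SEQS))
```

On this toy input, phase 1 yields four dense pairs, phase 3 merges them into two preclusters (A-rich and C-rich), and phase 4 attaches the two outlying reads to the nearer precluster. Recomputing modularity from scratch for every candidate merge keeps the sketch short but scales poorly; a practical tool would update the modularity gain incrementally.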
Keywords: 16S rRNA; OTUs; clustering; metagenomic; modularity.
© 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.