Review

Comput Struct Biotechnol J. 2021 Apr 28;19:2742-2749.

doi: 10.1016/j.csbj.2021.04.054. eCollection 2021.

Towards multi-label classification: Next step of machine learning for microbiome research

Shunyao Wu¹, Yuzhu Chen¹, Zhiruo Li², Jian Li¹, Fengyang Zhao¹, Xiaoquan Su¹

Affiliations

¹ College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China.
² School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong 266071, China.

PMID: 34093989
PMCID: PMC8131981
DOI: 10.1016/j.csbj.2021.04.054

Shunyao Wu et al. Comput Struct Biotechnol J. 2021.

. 2021 Apr 28:19:2742-2749.

doi: 10.1016/j.csbj.2021.04.054. eCollection 2021.

Authors

Shunyao Wu¹, Yuzhu Chen¹, Zhiruo Li², Jian Li¹, Fengyang Zhao¹, Xiaoquan Su¹

Affiliations

¹ College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China.
² School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong 266071, China.

PMID: 34093989
PMCID: PMC8131981
DOI: 10.1016/j.csbj.2021.04.054

Abstract

Machine learning (ML) has been widely used in microbiome research for biomarker selection and disease prediction. By training on microbial profiles of samples from patients and healthy controls, ML classifiers construct data models from community features that are highly correlated with the target diseases, and use these models to determine the status of new samples. To clearly understand the host-microbe interactions of specific diseases, previous studies have generally focused on well-designed cohorts in which each sample is labeled with exactly one status type. In practice, however, an individual may be associated with multiple diseases simultaneously, which introduces additional variation in microbial patterns that interferes with status detection. More importantly, comorbidities or complications can be missed by regular ML models, limiting the practical application of microbiome techniques. In this review, we summarize the typical ML approaches to single-label classification in microbiome research and demonstrate their limitations in multi-label disease detection using a real dataset. We then look ahead to a further step for ML, multi-label classification, which could potentially solve this problem, and discuss a series of promising strategies and key technical issues in applying multi-label classification to microbiome-based studies.

Keywords: Machine learning; Microbiome; Multi-label classification; Single-label classification.
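
As a rough illustration of the idea in the abstract, the sketch below trains one yes/no classifier per disease label, so a single sample can receive zero, one, or several labels. This is a minimal sketch of "binary relevance", one common multi-label strategy, and is not claimed to be the approach used in the review. The nearest-centroid base classifier, the three-taxon abundance profiles, and the label names `disease_A` and `disease_B` are all hypothetical and chosen only to keep the example small.

```python
from typing import Dict, List, Sequence, Set, Tuple

Profile = Sequence[float]  # relative abundances of taxa for one sample


def centroid(rows: List[Profile]) -> List[float]:
    """Mean abundance vector of a non-empty group of samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a: Profile, b: Profile) -> float:
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class BinaryRelevance:
    """One independent nearest-centroid yes/no classifier per label."""

    def fit(self, X: List[Profile], Y: List[Set[str]]) -> "BinaryRelevance":
        self.labels = sorted(set().union(*Y))
        self.centroids: Dict[str, Tuple[List[float], List[float]]] = {}
        for label in self.labels:
            # Assumes every label has at least one positive and one negative sample.
            pos = [x for x, y in zip(X, Y) if label in y]
            neg = [x for x, y in zip(X, Y) if label not in y]
            self.centroids[label] = (centroid(pos), centroid(neg))
        return self

    def predict(self, x: Profile) -> Set[str]:
        """Return every label whose positive centroid is closer than its negative one."""
        return {
            label
            for label, (pos_c, neg_c) in self.centroids.items()
            if sq_dist(x, pos_c) < sq_dist(x, neg_c)
        }


# Hypothetical toy cohort: an empty label set stands for a healthy control.
X = [
    [0.80, 0.10, 0.10], [0.70, 0.20, 0.10],  # healthy
    [0.20, 0.70, 0.10], [0.30, 0.60, 0.10],  # disease_A only
    [0.20, 0.10, 0.70], [0.30, 0.10, 0.60],  # disease_B only
    [0.10, 0.45, 0.45], [0.20, 0.40, 0.40],  # both diseases
]
Y = [
    set(), set(),
    {"disease_A"}, {"disease_A"},
    {"disease_B"}, {"disease_B"},
    {"disease_A", "disease_B"}, {"disease_A", "disease_B"},
]

model = BinaryRelevance().fit(X, Y)
print(sorted(model.predict([0.15, 0.40, 0.45])))  # ['disease_A', 'disease_B']
print(sorted(model.predict([0.75, 0.15, 0.10])))  # []
```

A single-label classifier would have to assign the first query sample to exactly one class, which is how a comorbidity can go undetected.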


Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

**Fig. 1**
Comparison of single-label classification and multi-label classification. a. Single-label classification requires that each sample has exactly one label (status). b. Multi-label classification can detect more than one status for each sample.

**Fig. 2**
Microbial biomarkers of autoimmune disease selected from SD and MD by a distribution-free independence test.

**Fig. 3**
The decision tree of the GBDT binary classifier constructed from SD (A) was less complicated than that constructed from MD (B). In each tree, internal nodes represent genus-level taxa, leaf nodes represent labels, and branch weights represent decision criteria.

**Fig. 4**
Three key technical issues in multi-label classification. a. Too many labels in the training data lead to unexpectedly high computational cost. b. Missed labels reduce detection sensitivity. c. Ambiguous labels introduce false-positive results.
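
The three issues in Fig. 4 can be made concrete with a small, hedged sketch. It counts how many distinct label combinations are possible for a given number of labels (relevant only if each combination is treated as its own class, which is an assumption here and not necessarily what the review describes), and it computes micro-averaged sensitivity and false-positive counts for predicted label sets. The function names and the toy label sets are hypothetical.

```python
from typing import List, Set, Tuple


def possible_label_sets(n_labels: int) -> int:
    """Number of distinct label combinations, including the empty (healthy) set."""
    return 2 ** n_labels


def micro_counts(truth: List[Set[str]], pred: List[Set[str]]) -> Tuple[int, int, int]:
    """Pooled true positives, false positives and false negatives over all samples."""
    tp = fp = fn = 0
    for t, p in zip(truth, pred):
        tp += len(t & p)   # labels correctly detected
        fp += len(p - t)   # labels predicted but not present
        fn += len(t - p)   # labels present but missed
    return tp, fp, fn


def sensitivity(truth: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged sensitivity (recall); 0.0 if there are no true labels."""
    tp, _, fn = micro_counts(truth, pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


truth = [{"A", "B"}, {"A"}, set(), {"B"}]

# One true label is missed in the first sample: sensitivity drops.
pred_missed = [{"A"}, {"A"}, set(), {"B"}]
print(sensitivity(truth, pred_missed))      # 0.75

# Extra labels are predicted for two samples: false positives appear.
pred_extra = [{"A", "B"}, {"A", "B"}, {"A"}, {"B"}]
print(micro_counts(truth, pred_extra))      # (4, 2, 0)

print(possible_label_sets(10))              # 1024
```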



