A review of machine learning methods for cancer characterization from microbiome data
- PMID: 38816569
- PMCID: PMC11139966
- DOI: 10.1038/s41698-024-00617-7
Abstract
Recent studies have shown that the microbiome can impact cancer development, progression, and response to therapies, suggesting that microbiome-based approaches can be used for cancer characterization. Because cancer-related signatures are complex and implicate many taxa, their discovery often requires Machine Learning (ML) approaches. This review discusses ML methods for cancer characterization from microbiome data. It focuses on the implications of choices made during sample collection, feature selection, and pre-processing. It also discusses ML model selection, offering guidance on how to choose an ML model, and model validation. Finally, it enumerates current limitations and how these may be overcome. Proposed methods, often based on Random Forests, show promising results, but these remain insufficient for widespread clinical use. Studies often report conflicting results, mainly because of ML models with poor generalizability. We expect that evaluating models with expanded, hold-out datasets, removing technical artifacts, exploring representations of the microbiome other than taxonomic profiles, leveraging advances in deep learning, and developing ML models better adapted to the characteristics of microbiome data will improve the performance and generalizability of models and enable their use in the clinic.
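The abstract's call to evaluate models on hold-out datasets can be illustrated with a study-wise split, in which every sample from one cohort is withheld for testing. The following is a minimal sketch under stated assumptions, not the review's own procedure: the function name `leave_one_study_out` and the cohort labels are hypothetical, and splitting by study is only one possible reading of "hold-out datasets".

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_study_out(
    study_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_study, train_indices, test_indices), one split per study.

    Hypothetical helper: each split withholds every sample of one study, so a
    model is always tested on a cohort it never saw during training.
    """
    # Group sample indices by the study they came from.
    by_study: Dict[str, List[int]] = defaultdict(list)
    for index, study in enumerate(study_ids):
        by_study[study].append(index)
    if len(by_study) < 2:
        raise ValueError("need samples from at least two studies")

    for held_out in sorted(by_study):
        test_idx = by_study[held_out]
        # Training set: all samples from every other study.
        train_idx = sorted(
            i for study, indices in by_study.items() if study != held_out
            for i in indices
        )
        yield held_out, train_idx, test_idx


# Hypothetical cohort labels, one per sample (not taken from the review).
studies = ["cohort_A", "cohort_A", "cohort_B", "cohort_C", "cohort_B"]
splits = list(leave_one_study_out(studies))
for held_out, train_idx, test_idx in splits:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

A classifier such as a Random Forest would be fitted on the training indices of each split and scored on the held-out cohort; a large gap between within-study and held-out performance is one sign of the poor generalizability the abstract describes. scikit-learn's `LeaveOneGroupOut` provides comparable splitting for those who already use that library.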
© 2024. The Author(s).
Conflict of interest statement
R.M.F. and C.F. own patent WO/2018/169423 on microbiome markers for gastric cancer. The remaining authors declare no competing interests.