CCPred: Global and population-specific colorectal cancer prediction and metagenomic biomarker identification at different molecular levels using machine learning techniques
- PMID: 39293338
- DOI: 10.1016/j.compbiomed.2024.109098
Abstract
Colorectal cancer (CRC) ranks as the third most common cancer globally and the second leading cause of cancer-related deaths. Recent research highlights the pivotal role of the gut microbiota in CRC development and progression. Understanding the complex interplay between disease development and metagenomic data is essential for CRC diagnosis and treatment. Current computational models employ machine learning to identify metagenomic biomarkers associated with CRC, yet their accuracy could be improved by taking a holistic view of the underlying biological knowledge. This study evaluates CRC-associated metagenomic data at the species, enzyme, and pathway levels through global and population-specific analyses. These analyses use relative abundance values from human gut microbiome sequencing data to build robust classification models for disease prediction and biomarker identification. For global CRC prediction and biomarker identification, the features identified by the SelectKBest (SKB), Information Gain (IG), and Extreme Gradient Boosting (XGBoost) methods are combined. Population-based analysis includes within-population, leave-one-dataset-out (LODO), and cross-population approaches. Four classification algorithms are employed for CRC classification. Globally, Random Forest achieved an AUC of 0.83 for species data, 0.78 for enzyme data, and 0.76 for pathway data. On the global scale, potential taxonomic biomarkers include Ruthenibacterium lactatiformans; enzyme biomarkers include RNA 2',3'-cyclic 3'-phosphodiesterase; and pathway biomarkers include the pyruvate fermentation to acetone pathway. This study underscores the potential of machine learning models trained on metagenomic data for improved disease prediction and biomarker discovery. The proposed model and associated files are available at https://github.com/TemizMus/CCPRED.
Keywords: Biomarkers; Colorectal cancer; Enzyme; Machine learning; Metagenomic; Microbiome; Pathway; Species.
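The evaluation scheme named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names (`combine_selected_features`, `lodo_splits`, `roc_auc`) are hypothetical, and reading "combined" as a set union of the features picked by each selector is an assumption, since the abstract does not say how the three feature lists are merged.

```python
from collections import defaultdict


def combine_selected_features(*selections):
    """Merge feature names chosen by several selectors, keeping first-seen order.

    Assumption: "combined" is read as a set union; the abstract does not say
    whether a union, an intersection, or a vote was used.
    """
    seen, combined = set(), []
    for selection in selections:
        for name in selection:
            if name not in seen:
                seen.add(name)
                combined.append(name)
    return combined


def lodo_splits(dataset_labels):
    """Yield (held_out, train_idx, test_idx) for leave-one-dataset-out.

    Each distinct dataset label is held out once as the test set while all
    samples from the remaining datasets form the training set.
    """
    by_dataset = defaultdict(list)
    for idx, name in enumerate(dataset_labels):
        by_dataset[name].append(idx)
    for held_out in sorted(by_dataset):
        test_idx = by_dataset[held_out]
        train_idx = [i for i, name in enumerate(dataset_labels) if name != held_out]
        yield held_out, train_idx, test_idx


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical cohort labels, one per sample.
    cohorts = ["A", "A", "B", "B", "C"]
    for held_out, train_idx, test_idx in lodo_splits(cohorts):
        print(held_out, train_idx, test_idx)
```

In a real pipeline, a classifier would be fitted on the `train_idx` samples (restricted to the combined features) and scored on the `test_idx` samples for each held-out cohort, with `roc_auc` applied to the predicted CRC probabilities.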
Copyright © 2024 Elsevier Ltd. All rights reserved.
Conflict of interest statement
Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.