Gut Pathog. 2025 Oct 8;17(1):80.
doi: 10.1186/s13099-025-00750-z.

Gut microbiome-based machine learning model for early colorectal cancer and adenoma screening

Yi-Jian Tsai et al. Gut Pathog.

Abstract

Colorectal cancer (CRC) is a major cause of cancer-related deaths, but early detection at the adenoma stage markedly improves outcomes. Existing tools such as colonoscopy and fecal immunochemical testing (FIT) are invasive or insensitive to early lesions. To develop a non-invasive screening strategy, we analyzed five publicly available 16S rRNA sequencing datasets from North America and East Asia. Using Analysis of Compositions of Microbiomes with Bias Correction (ANCOM-BC) and chi-square testing, we identified 109 discriminatory microbial taxa and trained random forest (RF) classification models to distinguish healthy controls, adenomas, and CRC. The models performed well in internal validation (AUC = 0.90, 95% CI: 0.869-0.931) and external validation (AUC = 0.82), indicating cross-population generalizability. We further developed a microbial risk score (MRS), inspired by polygenic risk score (PRS) methodology, which was significantly elevated in CRC across cohorts. Enrichment of CRC-associated pathogens such as Fusobacterium nucleatum and Porphyromonas gingivalis supports the biological relevance of the findings. These results demonstrate the potential of gut microbiome signatures combined with machine learning as a scalable, non-invasive approach for early detection of CRC and adenomas.

Keywords: Adenoma; Colorectal cancer; Gut microbiome; Machine learning; Microbial risk score; Non-invasive screening; Random forest.
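The abstract does not give the MRS formula, so the following is only a minimal sketch of what a PRS-style score and its AUC evaluation could look like. It assumes a weighted sum of log-transformed relative abundances; the function names, the pseudocount, the taxon weights, and the toy samples are all hypothetical and are not taken from the study.

```python
import math


def microbial_risk_score(abundances, weights, pseudocount=1e-6):
    """PRS-style score: weighted sum of log relative abundances.

    `abundances` maps taxon name -> relative abundance for one sample.
    `weights` maps taxon name -> effect weight (hypothetical values here).
    Taxa missing from a sample are treated as abundance 0 plus the pseudocount.
    """
    return sum(
        weight * math.log(abundances.get(taxon, 0.0) + pseudocount)
        for taxon, weight in weights.items()
    )


def auc(scores_pos, scores_neg):
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical weights for two taxa named in the abstract (illustrative only).
weights = {"Fusobacterium nucleatum": 1.0, "Porphyromonas gingivalis": 0.5}

# Toy relative-abundance profiles, invented for illustration.
crc_samples = [
    {"Fusobacterium nucleatum": 0.02, "Porphyromonas gingivalis": 0.01},
    {"Fusobacterium nucleatum": 0.005},
]
control_samples = [
    {"Fusobacterium nucleatum": 0.0001},
    {},
]

crc_scores = [microbial_risk_score(s, weights) for s in crc_samples]
control_scores = [microbial_risk_score(s, weights) for s in control_samples]
toy_auc = auc(crc_scores, control_scores)
```

In a real analysis the weights would presumably come from the differential-abundance step (for example, ANCOM-BC effect sizes), and the AUC would be computed on held-out cohorts rather than on the data used to choose the weights.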

Conflict of interest statement

Declarations. Ethics approval and consent to participate: This study re-analyzed publicly available, de-identified 16S rRNA sequencing datasets (accession numbers SRP062005, PRJNA534511, SRP133809, etc.) with associated clinical diagnoses. All original studies had institutional ethics approval and obtained informed consent from participants; no new human or animal experiments were performed. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.
