This is a preprint.
Assessing Genotype-Phenotype Correlations with Deep Learning in Colorectal Cancer: A Multi-Centric Study
- PMID: 39973981
- PMCID: PMC11838662
- DOI: 10.1101/2025.02.04.25321660
Update in: Assessing genotype-phenotype correlations in colorectal cancer with deep learning: a multicentre cohort study. Lancet Digit Health. 2025 Aug;7(8):100891. doi: 10.1016/j.landig.2025.100891. Epub 2025 Aug 19. PMID: 40829965
Abstract
Background: Deep Learning (DL) has emerged as a powerful tool to predict genetic biomarkers directly from digitized Hematoxylin and Eosin (H&E) slides in colorectal cancer (CRC). However, few studies have systematically investigated the predictability of biomarkers beyond routinely available alterations such as microsatellite instability (MSI), and BRAF and KRAS mutations.
Methods: Our primary dataset comprised H&E slides of CRC tumors from five cohorts totaling 1,376 patients who underwent comprehensive panel sequencing, with an additional 536 patients from two public datasets for validation. We developed a single multi-target transformer model to predict multiple genetic alterations directly from the slides. The model's performance was compared against conventional single-target models, and potential confounders were analyzed.
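The abstract does not say how the multi-target model is trained. A common approach for predicting several alterations with one network is to give it one output per alteration and sum a binary cross-entropy term per target, skipping targets for which a patient has no label (for example, a gene not covered by that cohort's panel). The sketch below shows only that loss; the function name `masked_multitarget_bce` and the use of `None` for missing labels are hypothetical choices, not the authors' implementation.

```python
import math


def masked_multitarget_bce(logits, labels):
    """Mean binary cross-entropy over all (patient, target) pairs that have a label.

    Hypothetical sketch. logits: one list of raw scores per patient, one score per
    target alteration. labels: same shape, with 1 = altered, 0 = wild type, and
    None = status unknown (the pair is excluded from the loss).
    """
    total, count = 0.0, 0
    for patient_logits, patient_labels in zip(logits, labels):
        for z, y in zip(patient_logits, patient_labels):
            if y is None:
                continue  # unknown status: contributes nothing to the loss
            # Numerically stable BCE with logits: max(z, 0) - z*y + log(1 + exp(-|z|))
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count if count else 0.0
```

Because every target shares the same slide representation, one model can learn all alterations jointly, whereas single-target models repeat the training once per alteration.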
Findings: The multi-target model was able to predict numerous biomarkers from pathology slides, matching and partly exceeding single-target transformers. The Area Under the Receiver Operating Characteristic curve (AUROC, mean ± std) on the primary external validation cohorts was: BRAF (0·78 ± 0·01), hypermutation (0·88 ± 0·01), MSI (0·93 ± 0·01), RNF43 (0·86 ± 0·01); this biomarker predictability was mirrored across metrics and co-occurrence analyses. However, biomarkers with high AUROCs largely correlated with MSI, with model predictions depending considerably on MSI-associated morphology upon pathological examination.
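AUROC can be read as the probability that a randomly chosen altered case receives a higher model score than a randomly chosen wild-type case. The minimal sketch below computes it that way, counting ties as one half, and formats a "mean ± std" summary. The abstract does not state what the standard deviation is taken over; this example assumes one AUROC per independently trained model run and uses the sample standard deviation. Both function names are illustrative.

```python
import statistics


def auroc(scores, labels):
    """AUROC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(per_run_aurocs):
    """Format AUROCs from several runs as 'mean ± std' (sample standard deviation)."""
    m = statistics.mean(per_run_aurocs)
    s = statistics.stdev(per_run_aurocs)
    return f"{m:.2f} ± {s:.2f}"
```

The pairwise loop is quadratic in cohort size. For large cohorts, a rank-based formula gives the same value faster.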
Interpretation: Our study demonstrates that multi-target transformers can predict the biomarker status for numerous genetic alterations in CRC directly from H&E slides. However, their predictability is mainly associated with MSI phenotype, despite indications of slight biomarker-inherent contributions to a phenotype. Our findings underscore the need to analyze confounders in AI-based oncology biomarkers. To enable this, we developed a validated model applicable to other cancers and larger, diverse datasets.
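One simple way to probe whether a biomarker's predictability is driven by MSI is to recompute its AUROC separately within MSI-high and microsatellite-stable (MSS) cases. If a high overall AUROC collapses toward 0·5 within each stratum, the model is mostly detecting MSI rather than the biomarker itself. This is a generic confounder check sketched under that assumption, not necessarily the analysis used in the study; the function names are hypothetical.

```python
def auroc(scores, labels):
    """Pairwise AUROC (ties count 1/2); None if only one class is present."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return None  # undefined when a stratum lacks positives or negatives
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def msi_stratified_auroc(scores, labels, msi_high):
    """Overall AUROC plus AUROC within MSI-high and MSS subsets."""
    result = {"overall": auroc(scores, labels)}
    for name, flag in (("MSI", True), ("MSS", False)):
        idx = [i for i, m in enumerate(msi_high) if m == flag]
        result[name] = auroc([scores[i] for i in idx], [labels[i] for i in idx])
    return result


# Toy data: scores depend only on MSI status, and the biomarker is enriched in MSI-high cases.
msi = [True] * 4 + [False] * 4
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9 if m else 0.1 for m in msi]
print(msi_stratified_auroc(scores, labels, msi))
```

In this toy example the overall AUROC is above chance only because the biomarker co-occurs with MSI. Within each stratum the scores carry no information, so both stratified AUROCs equal 0·5.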
Funding: The German Federal Ministry of Health, the Max-Eder-Programme of German Cancer Aid, the German Federal Ministry of Education and Research, the German Academic Exchange Service, and the EU.
Conflict of interest statement
Declaration of interest: JNK declares consulting services for Bioptimus, France; Owkin, France; DoMore Diagnostics, Norway; Panakeia, UK; AstraZeneca, UK; Mindpeak, Germany; and MultiplexDx, Slovakia. Furthermore, he holds shares in StratifAI GmbH, Germany, and Synagen GmbH, Germany; has received a research grant from GSK; and has received honoraria from AstraZeneca, Bayer, Daiichi Sankyo, Eisai, Janssen, Merck, MSD, BMS, Roche, Pfizer, and Fresenius. MG has received honoraria for lectures sponsored by Techniker Krankenkasse (TK) and AstraZeneca. SF has received honoraria for lectures from BMS and MSD. UP declares consulting services for AbbVie, and her husband holds individual stocks in the following companies: BioNTech SE – ADR, Amazon, CureVac BV, NanoString Technologies, Google/Alphabet Inc Class C, NVIDIA Corp, and Microsoft Corp. No other potential conflicts of interest are reported by any of the authors.
Grants and funding
- 75N92021D00002/HL/NHLBI NIH HHS/United States
- P30 CA015704/CA/NCI NIH HHS/United States
- R01 CA107333/CA/NCI NIH HHS/United States
- S10 OD028685/OD/NIH HHS/United States
- U01 CA137088/CA/NCI NIH HHS/United States
- P20 CA252733/CA/NCI NIH HHS/United States
- 75N92021D00001/HL/NHLBI NIH HHS/United States
- 75N92021D00003/WH/WHI NIH HHS/United States
- 75N92021D00004/WH/WHI NIH HHS/United States
- HHSN268201700006C/HL/NHLBI NIH HHS/United States
- 75N92021D00005/WH/WHI NIH HHS/United States
- HHSN261201000032C/CA/NCI NIH HHS/United States