This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2025 Feb 8:2025.02.04.25321660.

doi: 10.1101/2025.02.04.25321660.

Assessing Genotype-Phenotype Correlations with Deep Learning in Colorectal Cancer: A Multi-Centric Study

Marco Gustav¹, Marko van Treeck¹, Nic G Reitsam², Zunamys I Carrero¹, Chiara M L Loeffler^{1,3}, Asier Rabasco Meneghetti¹, Bruno Märkl², Lisa A Boardman⁴, Amy J French⁵, Ellen L Goode⁶, Andrea Gsur⁷, Stefanie Brezina⁷, Marc J Gunter^{8,9}, Neil Murphy⁸, Pia Hönscheid^{10,11,12}, Christian Sperling¹⁰, Sebastian Foersch¹³, Robert Steinfelder¹⁴, Tabitha Harrison^{14,15}, Ulrike Peters^{14,15}, Amanda Phipps^{14,15}, Jakob Nikolas Kather^{1,3,16,17}

Affiliations

¹ Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany.
² Pathology, Faculty of Medicine, University of Augsburg, Augsburg, Germany.
³ Department of Medicine I, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, 01307 Dresden, Germany.
⁴ Division of Gastroenterology and Hepatology, Mayo Clinic, Rochester, Minnesota, USA.
⁵ Division of Laboratory Genetics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota, USA.
⁶ Department of Quantitative Health Sciences, Division of Epidemiology, Mayo Clinic, Rochester, Minnesota, USA.
⁷ Center for Cancer Research, Medical University of Vienna, Vienna, Austria.
⁸ Nutrition and Metabolism Branch, International Agency for Research on Cancer, World Health Organization, Lyon, France.
⁹ Cancer Epidemiology and Prevention Research Unit, School of Public Health, Imperial College London, London, United Kingdom.
¹⁰ Institute of Pathology, University Hospital Carl Gustav Carus (UKD), Technical University Dresden (TUD), Dresden, Germany.
¹¹ National Center for Tumor Diseases (NCT), Partner Site Dresden, German Cancer Research Center Heidelberg, Dresden, Germany.
¹² German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany.
¹³ Institute of Pathology, University Medical Center Mainz, Mainz, Germany.
¹⁴ Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA.
¹⁵ Department of Epidemiology, University of Washington, Seattle, WA, USA.
¹⁶ Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany.
¹⁷ Pathology & Data Analytics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, United Kingdom.

PMID: 39973981
PMCID: PMC11838662
DOI: 10.1101/2025.02.04.25321660

Marco Gustav et al. medRxiv. 2025.

Update in

Assessing genotype-phenotype correlations in colorectal cancer with deep learning: a multicentre cohort study.
Gustav M, van Treeck M, Reitsam NG, Carrero ZI, Loeffler CML, Rabasco Meneghetti A, Märkl B, Boardman LA, French AJ, Goode EL, Gsur A, Brezina S, Gunter MJ, Murphy N, Hönscheid P, Sperling C, Foersch S, Steinfelder R, Harrison T, Peters U, Phipps A, Kather JN. Lancet Digit Health. 2025 Aug;7(8):100891. doi: 10.1016/j.landig.2025.100891. Epub 2025 Aug 19. PMID: 40829965

Abstract

Background: Deep learning (DL) has emerged as a powerful tool to predict genetic biomarkers directly from digitized Hematoxylin and Eosin (H&E) slides in colorectal cancer (CRC). However, few studies have systematically investigated the predictability of biomarkers beyond routinely available alterations such as microsatellite instability (MSI) and BRAF and KRAS mutations.

Methods: Our primary dataset comprised H&E slides of CRC tumors from 1,376 patients across five cohorts who underwent comprehensive panel sequencing, with an additional 536 patients from two public datasets for validation. We developed a DL model based on a single multi-target transformer that predicts multiple genetic alterations directly from the slides. The model's performance was compared against conventional single-target models, and potential confounders were analyzed.
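The central idea of a multi-target model, one slide-level representation shared by a separate output head per biomarker, can be illustrated without a deep learning framework. The sketch below is a deliberate simplification and not the authors' architecture: a single scaled dot-product attention-pooling step stands in for the transformer's encoder-decoder, and all weights, dimensions, target names, and function names (`attention_pool`, `multi_target_predict`) are hypothetical.

```python
import math


def softmax(values):
    # Numerically stable softmax over a list of floats.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(tile_features, query):
    # Scaled dot-product score of each tile against a shared query vector,
    # followed by an attention-weighted average of the tile features.
    scale = math.sqrt(len(query))
    logits = [sum(f * q for f, q in zip(feat, query)) / scale for feat in tile_features]
    weights = softmax(logits)
    dim = len(tile_features[0])
    pooled = [sum(w * feat[k] for w, feat in zip(weights, tile_features)) for k in range(dim)]
    return pooled, weights


def multi_target_predict(tile_features, query, heads):
    # heads maps a target name to a linear layer given as one weight row per
    # class (here [WT, MUT]); every head reads the same pooled slide embedding.
    pooled, weights = attention_pool(tile_features, query)
    scores = {}
    for target, rows in heads.items():
        logits = [sum(w * x for w, x in zip(row, pooled)) for row in rows]
        scores[target] = softmax(logits)
    return scores, weights


# Hypothetical toy input: three tiles with 2-dimensional features.
tiles = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]
heads = {
    "MSI": [[0.0, 0.0], [1.0, -1.0]],
    "BRAF": [[0.0, 0.0], [-1.0, 1.0]],
}
scores, attention = multi_target_predict(tiles, query, heads)
for target, probs in scores.items():
    print(target, "P(MUT) =", round(probs[1], 3))
```

Sharing the pooled embedding is what lets one model serve many targets; in this simplified form it also means that all heads see the same attention weights over tiles.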

Findings: The multi-target model predicted numerous biomarkers from pathology slides, matching and in part exceeding single-target transformers. The Area Under the Receiver Operating Characteristic curve (AUROC, mean ± std) on the primary external validation cohorts was: BRAF (0·78 ± 0·01), hypermutation (0·88 ± 0·01), MSI (0·93 ± 0·01), and RNF43 (0·86 ± 0·01); this biomarker predictability was mirrored across metrics and co-occurrence analyses. However, biomarkers with high AUROCs were largely correlated with MSI, and pathological examination showed that model predictions depended considerably on MSI-associated morphology.
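The AUROC values above are reported as mean ± standard deviation over cross-validation folds. A minimal sketch of that summary, assuming per-fold binary labels (1 = MUT or MSI, 0 = WT or MSS) and model scores: the AUROC uses its pairwise (Mann-Whitney) definition, and the sample standard deviation (`statistics.stdev`) is an assumption, since the source does not state which estimator was used. The fold data are made up for illustration.

```python
import statistics


def auroc(labels, scores):
    # Pairwise definition: the probability that a randomly chosen positive case
    # (label 1) scores higher than a randomly chosen negative case (label 0);
    # ties count as one half. Equals the Mann-Whitney U statistic divided by
    # the number of positive-negative pairs.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_results):
    # fold_results: one (labels, scores) pair per cross-validation fold.
    # Returns the mean and sample standard deviation of the per-fold AUROCs.
    values = [auroc(labels, scores) for labels, scores in fold_results]
    return statistics.mean(values), statistics.stdev(values)


# Made-up example with two folds (a real run would have one entry per fold).
folds = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    ([1, 0, 1, 0], [0.8, 0.2, 0.7, 0.3]),
]
mean_auc, std_auc = summarize_folds(folds)
print(f"AUROC {mean_auc:.2f} ± {std_auc:.2f}")
```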

Interpretation: Our study demonstrates that multi-target transformers can predict the biomarker status of numerous genetic alterations in CRC directly from H&E slides. However, their predictability is mainly associated with the MSI phenotype, despite indications of slight biomarker-inherent contributions to the phenotype. Our findings underscore the need to analyze confounders in AI-based oncology biomarkers. To enable such analyses, we developed a validated model that is applicable to other cancers and to larger, more diverse datasets.

Funding: The German Federal Ministry of Health, the Max-Eder-Programme of German Cancer Aid, the German Federal Ministry of Education and Research, the German Academic Exchange Service, and the EU.


Conflict of interest statement

Declaration of interest: JNK declares consulting services for Bioptimus, France; Owkin, France; DoMore Diagnostics, Norway; Panakeia, UK; AstraZeneca, UK; Mindpeak, Germany; and MultiplexDx, Slovakia. Furthermore, he holds shares in StratifAI GmbH, Germany, and Synagen GmbH, Germany; has received a research grant from GSK; and has received honoraria from AstraZeneca, Bayer, Daiichi Sankyo, Eisai, Janssen, Merck, MSD, BMS, Roche, Pfizer, and Fresenius. MG has received honoraria for lectures sponsored by Techniker Krankenkasse (TK) and AstraZeneca. SF has received honoraria for lectures from BMS and MSD. UP declares consulting services for AbbVie, and her husband holds individual stocks in the following companies: BioNTech SE – ADR, Amazon, CureVac BV, NanoString Technologies, Google/Alphabet Inc Class C, NVIDIA Corp, and Microsoft Corp. No other potential conflicts of interest are reported by any of the authors.

Figures

**Fig. 1: Experimental design, cohort characterization, and schematic for predictive analysis.**
A. Tissue samples from colorectal cancer (CRC) patients across five independent cohorts were obtained via surgical resection, and associated demographic, clinical, and sequencing data were collected. After Hematoxylin and Eosin (H&E) staining, tumor tissues were digitized into Whole Slide Images (WSIs) for profiling genetic alterations. The WSIs were then used to train and test a deep learning (DL) algorithm for biomarker detection that simultaneously predicts multiple mutational statuses and provides heatmap explanations. B. The DL pipeline tessellates the WSIs into smaller tiles while rejecting background and blurry areas, extracting n feature vectors from n tiles. The feature vectors are compressed and processed by a multi-target transformer, which employs an attention mechanism in an encoder-decoder structure for class token learning. The transformer generates an individual score for each class of each target. The code supports an optional positional tile embedding (dashed lines), which did not improve performance and was therefore excluded from our study. C. Overview of the five GECCO and two public cohorts, including patient numbers, slides, extracted features, and MSI case proportions. The cohorts are divided into training and test datasets. D. Schematic for interpreting result plots and statistics, delineating dataset partitioning based on microsatellite status (MSS: microsatellite stability, MSI: microsatellite instability) and gene mutational status (MUT: mutated, WT: wild type). The diagram illustrates distinct groups by color, with the left side representing MSI prediction scores and the right side representing prediction target scores. The ground-truth labels of the samples determine the group organization, and model-generated scores are depicted in the corresponding colors.
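The tessellation step in panel B can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's preprocessing code: the slide is a small grayscale image given as nested lists (values 0 to 255), background is detected as tiles with a high mean intensity, and blur is detected with the variance of a 4-neighbour Laplacian. The thresholds (`max_mean`, `min_sharpness`) and the function names are hypothetical.

```python
def laplacian_variance(tile):
    # Variance of a 4-neighbour Laplacian over the tile's interior pixels.
    # Sharp, textured tissue gives large values; blurry or blank tiles give small ones.
    values = []
    for r in range(1, len(tile) - 1):
        for c in range(1, len(tile[0]) - 1):
            lap = 4 * tile[r][c] - tile[r - 1][c] - tile[r + 1][c] - tile[r][c - 1] - tile[r][c + 1]
            values.append(lap)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def tessellate(image, tile_size, max_mean=220.0, min_sharpness=50.0):
    # image: grayscale slide as a list of rows with values 0-255 (white = background).
    # Returns the (top, left) corners of non-overlapping tiles that pass both filters.
    if tile_size < 3:
        raise ValueError("tile_size must be at least 3 for the Laplacian filter")
    kept = []
    for top in range(0, len(image) - tile_size + 1, tile_size):
        for left in range(0, len(image[0]) - tile_size + 1, tile_size):
            tile = [row[left:left + tile_size] for row in image[top:top + tile_size]]
            mean = sum(map(sum, tile)) / (tile_size * tile_size)
            if mean > max_mean:
                continue  # mostly white: treated as background
            if laplacian_variance(tile) < min_sharpness:
                continue  # too little edge content: treated as blurry or blank
            kept.append((top, left))
    return kept


# Toy 8 x 12 "slide": textured tissue, a flat (blurry) region, and white background.
slide = [
    [(200 if (r + c) % 2 == 0 else 0) if c < 4 else (120 if c < 8 else 255) for c in range(12)]
    for r in range(8)
]
print(tessellate(slide, tile_size=4))  # → [(0, 0), (4, 0)]
```

In a real pipeline the kept coordinates would be read from the WSI at a fixed magnification and passed to a feature extractor, yielding one feature vector per tile.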

**Fig. 2: Analysis of genetic alterations co-occurrence in CRC for GECCO cohorts.**
Hierarchical clustering was performed on the ground truth of genetic alterations with fully available mutational information. Each row corresponds to a genetic alteration, and each column represents a patient from the dataset. The top row indicates the distribution of patients from the various cohorts within the genetic clusters. Distances were calculated with the Euclidean metric, and clustering used Ward's method. Three distinct genetic clusters were identified and marked. The patient clustering shows a diverse distribution of samples across all five cohorts and genetic clusters (top row).
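The clustering in this figure uses Euclidean distances with Ward's method. In Python this is commonly done with SciPy (`scipy.cluster.hierarchy.linkage(X, method="ward")` followed by `fcluster(Z, t=3, criterion="maxclust")`). The dependency-free sketch below only spells out what Ward's criterion does: it greedily merges the pair of clusters whose union increases the within-cluster sum of squares the least. The toy binary profiles (for instance, one alteration's 0/1 status across patients) and the function name `ward_cluster` are hypothetical; this naive version is only practical for small inputs and may break ties differently from SciPy.

```python
def ward_cluster(vectors, n_clusters):
    # Naive agglomerative clustering with Ward's criterion (Euclidean distance).
    # Starts from singletons and repeatedly merges the pair of clusters whose
    # union increases the within-cluster sum of squares the least.
    dim = len(vectors[0])
    clusters = [[i] for i in range(len(vectors))]

    def centroid(members):
        return [sum(vectors[i][k] for i in members) / len(members) for k in range(dim)]

    def merge_cost(a, b):
        # Ward's increase in error sum of squares: |a||b| / (|a| + |b|) * ||c_a - c_b||^2
        ca, cb = centroid(a), centroid(b)
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq_dist

    while len(clusters) > n_clusters:
        best_cost, best_i, best_j = None, None, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(clusters[i], clusters[j])
                if best_cost is None or cost < best_cost:
                    best_cost, best_i, best_j = cost, i, j
        clusters[best_i].extend(clusters[best_j])
        del clusters[best_j]  # best_j > best_i, so best_i stays valid

    # Label clusters in order of their smallest member index.
    labels = [0] * len(vectors)
    for label, members in enumerate(sorted(clusters, key=min)):
        for idx in members:
            labels[idx] = label
    return labels


# Hypothetical binary profiles with three obvious groups.
profiles = [
    [1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0], [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 1, 1],
]
print(ward_cluster(profiles, n_clusters=3))  # → [0, 0, 0, 1, 1, 2, 2]
```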

**Fig. 3: Evaluation of the performance of the Multi-Target Transformer on selected prediction targets for the external cohorts from GECCO.**
A. Comparison of Single-Target versus Multi-Target Transformers, showing the Area Under the Receiver Operating Characteristic curve (AUROC) for each of the 7 folds of external cross-validation, with the median highlighted by a horizontal line in each box. The figure includes selected representative potential biomarkers of genetic alterations associated with MSS (Fig. 2, genetic Cluster 1) and MSI (Fig. 2, genetic Cluster 2). The test set cohorts consist of CRA and WHI (Fig. 1C). The horizontal line at an AUROC of 0·50 represents a random guess of the model. Significance was determined with a two-sided DeLong test at a significance threshold of p < 0·05. B. Performance metrics of Multi-Target Transformers for external validation. The mean (center of dot) and standard deviation (diameter of dot) for selected relevant prediction targets are displayed for the whole external set as well as for the MSI and MSS subgroups, based on the 7 folds of cross-validation. The threshold for binary classification is pre-defined as 0·50. The evaluation metrics include the AUROC and the Area Under the Precision-Recall Curve (AUPRC), along with the corresponding mutation rates in the external cohorts. The Mutation Rate is the fraction of instances with a specific mutation in the subgroup. The MSI & Genetic Alteration Co-Occurrence Ratio is the fraction of cases harboring MSI among all cases with a particular genetic mutation. The data are sorted by AUROC and listed in Tab. S14 and Tab. S17–S18. An extended version of this panel with more metrics is shown in Fig. S3B. C. Distribution of AUROCs (mean ± standard deviation) for selected prediction targets and their co-occurrence with MSI; the corresponding values and further metrics are given in Tab. S14. An extended version of this panel with MSS/MSI-subgroup-specific AUROCs is shown in Fig. S3C.
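Panel A compares single-target and multi-target AUROCs with a two-sided DeLong test. For readers who want to reproduce this kind of comparison, here is a standard-library sketch of DeLong's method for two correlated AUROCs measured on the same cases. It is not the authors' code; the function names and the toy scores are hypothetical, and it assumes at least two positive and two negative cases.

```python
import math


def _components(labels, scores):
    # DeLong structural components: for each positive case, the fraction of
    # negatives it outranks (V10); for each negative case, the fraction of
    # positives that outrank it (V01). Ties count as one half.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative cases")

    def psi(x, y):
        return 1.0 if x > y else (0.5 if x == y else 0.0)

    v10 = [sum(psi(p, n) for n in neg) / len(neg) for p in pos]
    v01 = [sum(psi(p, n) for p in pos) / len(pos) for n in neg]
    return v10, v01


def _cov(a, b):
    # Sample covariance of two equally long lists.
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels, scores_a, scores_b):
    # Two-sided DeLong test; returns (auroc_a, auroc_b, p_value).
    v10_a, v01_a = _components(labels, scores_a)
    v10_b, v01_b = _components(labels, scores_b)
    m, n = len(v10_a), len(v01_a)
    if m < 2 or n < 2:
        raise ValueError("need at least two positive and two negative cases")
    auc_a, auc_b = sum(v10_a) / m, sum(v10_b) / m
    # Variance of the AUROC difference from the positive- and negative-case terms.
    var = (_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
    var += (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n
    if var <= 0:
        if auc_a == auc_b:
            return auc_a, auc_b, 1.0
        raise ValueError("zero variance with unequal AUROCs; z is undefined")
    z = (auc_a - auc_b) / math.sqrt(var)
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical scores of two models on the same six cases.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
model_b = [0.6, 0.4, 0.35, 0.5, 0.3, 0.2]
auc_a, auc_b, p = delong_test(labels, model_a, model_b)
print(f"AUROC A = {auc_a:.3f}, AUROC B = {auc_b:.3f}, p = {p:.3f}")
```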

**Fig. 4: Evaluation of prediction scores based on the multi-target transformer in external validation on the GECCO test set subgrouped by the co-occurrence of the prediction targets with MSI.**
**A.-B.** Violin plots representing individual patient scores from the test set cohorts for MSI and representative genetic alterations in four subgroups based on microsatellite and alteration mutational status. The left y-axis represents the MSI score scale (left violin halves) and the right y-axis corresponds to the prediction target scores (right violin halves). The legend displays gray horizontal lines in the concept violins that represent the optimal position of the prediction scores based on the ground truth. The selection of prediction targets includes *TP53*, *APC*, and *KRAS* from genetic Cluster 1 (A.), and *BMPR2*, *ZNRF3*, Hypermutation (HM), *RNF43*, and *BRAF* from genetic Cluster 2 (B.) (Fig. 2). The data encompass both external cohorts, CRA and WHI (Fig. 1C). Each dot represents the mean of an individual patient's prediction scores across the 7 folds, and the horizontal line on each side of the violin indicates the median of all patient mean scores. A horizontal line at 0·50 denotes the line of model uncertainty. The sample count for each subgroup is indicated below the violins. Statistical significance is denoted as follows: * for p < 0·05, ** for p < 0·01, *** for p < 0·001, with more details provided in Fig. 1D. After testing for normal distribution (Tab. S20), the Mann-Whitney U test was used for within-group comparisons, and the Wilcoxon test was used for between-group comparisons. Abbreviations: HM: Hypermutation; MSI: Microsatellite instability; MSS: Microsatellite stability; MUT: Mutated; WT: Wild type.
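The violin plots rest on two simple operations: averaging each patient's score over the folds and splitting patients into the four MSS/MSI by WT/MUT subgroups of Fig. 1D. The sketch below shows both, together with the MSI co-occurrence ratio defined for Fig. 3 and a Mann-Whitney U test using the normal approximation without tie or continuity correction. The data layout, toy scores, and function names are hypothetical; a production analysis would typically rely on `scipy.stats.mannwhitneyu`, which handles ties and offers exact p-values.

```python
import math
from statistics import mean


def patient_means(fold_scores):
    # fold_scores: patient_id -> list of scores, one per cross-validation fold.
    return {pid: mean(scores) for pid, scores in fold_scores.items()}


def split_subgroups(status, scores):
    # status: patient_id -> (msi_status, target_status), e.g. ("MSI", "MUT").
    # Returns (msi_status, target_status) -> list of patient-level scores.
    groups = {}
    for pid, key in status.items():
        groups.setdefault(key, []).append(scores[pid])
    return groups


def msi_cooccurrence_ratio(status):
    # Fraction of MSI cases among all cases that carry the target mutation.
    mutated = [msi for msi, target in status.values() if target == "MUT"]
    return mutated.count("MSI") / len(mutated)


def mann_whitney_u(x, y):
    # U statistic for x versus y (ties count one half) with a two-sided
    # normal-approximation p-value; no tie or continuity correction.
    u = sum(1.0 if a > b else (0.5 if a == b else 0.0) for a in x for b in y)
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical example: target scores for five patients over two folds.
status = {
    "p1": ("MSI", "MUT"), "p2": ("MSI", "WT"), "p3": ("MSS", "MUT"),
    "p4": ("MSS", "WT"), "p5": ("MSI", "MUT"),
}
fold_scores = {"p1": [0.8, 0.9], "p2": [0.7, 0.6], "p3": [0.3, 0.4],
               "p4": [0.1, 0.2], "p5": [0.9, 0.7]}
groups = split_subgroups(status, patient_means(fold_scores))
print(msi_cooccurrence_ratio(status))  # 2 of 3 mutated cases are MSI
print(mann_whitney_u(groups[("MSI", "MUT")], groups[("MSS", "MUT")]))
```

Dividing U by the product of the two group sizes gives the AUROC for the same comparison, which links this test to the performance metrics in Fig. 3.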

**Fig. 5: Heatmaps of representative samples for prediction of MSI, *KRAS*, *BRAF* and Hypermutation (HM) from the external GECCO validation dataset.**
The heatmaps are derived from the model with the median AUROC for MSI detection and for the majority of prediction targets evaluated by sevenfold cross-validation. The cohort, sample ID, ground truth and prediction scores for MSI, the individual mutational status of the target, a brief pathological evaluation, and magnified views of specific areas are provided for in-depth analysis. The heatmaps indicate relevant areas for the various predictions. Red areas are of high importance and indicate a mutant type (MUT), while blue areas are of low importance and indicate a wild type (WT). The color intensity reflects the model's attention to that area. A. The tumor exhibits both gland-forming and more solid components and extremely high numbers of tumor-infiltrating lymphocytes (TILs) with dense lymphoid aggregates. The pathological examination confirms the plausibility of a high MSI score indicating MSI, which is also the ground truth. A low *KRAS* score indicates *KRAS* WT, but the ground truth is *KRAS* MUT. The heatmaps highlight similar tumor areas but with diverging scores: where the MSI map is red, indicating a high score, the *KRAS* map is blue, indicating a low score. B. The presence of mucinous differentiation in an MSI, *BRAF* WT case results in high MSI and *BRAF* scores. The MSI score is pathologically plausible, whereas the *BRAF* score contradicts the ground truth. For both predictions, the model focuses on similar tumor areas with similar scores indicating MSI/*BRAF* MUT. C. Partly mucinous morphology indicates the possibility of MSI, with a high score predicting MSI. HM is also predicted as MUT with a high score, even though HM is WT for this sample. Both heatmaps primarily label the tumor and the same region with comparable significance. D. Villous adenoma with high-grade dysplasia is a common precursor lesion associated with a high frequency of *KRAS* mutations. The heatmaps highlight similar large-scale tumor areas but with diverging scores: where the MSI map is red, indicating a high score, the *KRAS* map is blue, indicating a low score. E. The tumor area appears to be mainly MSS, and the heatmap shows a correspondingly low score. Although the sample is hypermutated according to the ground truth, it is predicted as non-hypermutated; this is a rare MSS case with HM. Both heatmaps predominantly mark the tumor area and the same region with comparable relevance. Abbreviations: HM: Hypermutation; MSI: Microsatellite instability; MSS: Microsatellite stability; MUT: Mutated; WT: Wild type; w/: with
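Conceptually, a heatmap of this kind places each tile's prediction back at the tile's position on the slide. A minimal sketch, assuming each kept tile carries its top-left pixel coordinates, a MUT score, and an attention weight: the cell color follows the score (red for MUT at or above the 0·50 threshold, blue otherwise) and the intensity is the attention weight scaled to the slide maximum. The function name `build_heatmap` and the tuple layout are hypothetical; this is not the authors' visualization code.

```python
def build_heatmap(tiles, grid_rows, grid_cols, tile_size, threshold=0.5):
    # tiles: list of ((top, left), mut_score, attention) for every kept tile.
    # Each grid cell gets (color, intensity): red if the tile's score is at or
    # above the threshold (MUT), blue otherwise (WT); intensity is the attention
    # weight relative to the slide maximum. Cells without a tile stay None.
    grid = [[None] * grid_cols for _ in range(grid_rows)]
    max_attention = max((a for _, _, a in tiles), default=0.0) or 1.0
    for (top, left), score, attention in tiles:
        color = "red" if score >= threshold else "blue"
        grid[top // tile_size][left // tile_size] = (color, attention / max_attention)
    return grid


# Hypothetical tiles on a 2 x 2 grid of 4-pixel tiles; cell (1, 0) has no tile.
tiles = [((0, 0), 0.9, 0.2), ((0, 4), 0.1, 0.4), ((4, 4), 0.6, 0.1)]
for row in build_heatmap(tiles, grid_rows=2, grid_cols=2, tile_size=4):
    print(row)
```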

**Fig. 6: Top tiles for prediction of genetic alterations (left column) and MSI (right column) for two selected slides from the GECCO test set.**
**A.-B.** WHI, 1031792: Medullary carcinoma with sheets of tumor cells, low stroma content, and a high number of tumor-infiltrating lymphocytes in a *KRAS* WT, MSI case, leading to high MSI prediction scores (B.) and low *KRAS* MUT prediction scores (A.). *KRAS* mutations are less frequent in MSI CRCs, and medullary carcinoma is a key morphological feature of MSI CRCs. **C.-D.** WHI, 1031557: Top tiles for *BRAF* MUT and MSI prediction, both with high prediction scores and both displaying a mixed morphology with partly medullary, partly mucinous, and partly gland-forming histology and a high number of tumor-infiltrating/associated lymphocytes. A medullary growth pattern with lymphocytic infiltration and mucinous differentiation are typical features of an MSI-like morphology. Accordingly, the case was correctly predicted as MSI with a very high prediction score (D.). As *BRAF* MUT and MSI often co-occur and overlap morphologically, the case was misclassified with regard to *BRAF* status, resulting in high prediction scores for *BRAF* MUT (C.), even though the ground truth was *BRAF* WT. Abbreviations: CRC: Colorectal cancer; MSI: Microsatellite instability; MSS: Microsatellite stability; MUT: Mutated; WT: Wild type.
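Top tiles like those shown here can be obtained by ranking a slide's tiles by their predicted score. A minimal sketch, under an assumption the source does not state: the tiles shown are those that most strongly support the slide-level call, i.e. the highest-scoring tiles when the slide is called MUT (score at or above 0·50) and the lowest-scoring tiles otherwise. The function name and data layout are hypothetical.

```python
import heapq
from operator import itemgetter


def top_tiles(tile_scores, k, slide_score, threshold=0.5):
    # tile_scores: list of (tile_id, mut_score) pairs for one slide and one target.
    # Assumption: return the tiles that most strongly support the slide-level call,
    # i.e. the highest MUT scores if the slide is called MUT, otherwise the lowest.
    by_score = itemgetter(1)
    if slide_score >= threshold:
        return heapq.nlargest(k, tile_scores, key=by_score)
    return heapq.nsmallest(k, tile_scores, key=by_score)


# Hypothetical tile scores for one slide.
tiles = [("t1", 0.9), ("t2", 0.1), ("t3", 0.7), ("t4", 0.3)]
print(top_tiles(tiles, k=2, slide_score=0.8))  # → [('t1', 0.9), ('t3', 0.7)]
```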


