[Preprint]. 2025 Feb 8:2025.02.04.25321660.
doi: 10.1101/2025.02.04.25321660.

Assessing Genotype-Phenotype Correlations with Deep Learning in Colorectal Cancer: A Multi-Centric Study

Marco Gustav et al. medRxiv.

Abstract

Background: Deep Learning (DL) has emerged as a powerful tool to predict genetic biomarkers directly from digitized Hematoxylin and Eosin (H&E) slides in colorectal cancer (CRC). However, few studies have systematically investigated the predictability of biomarkers beyond routinely available alterations such as microsatellite instability (MSI), and BRAF and KRAS mutations.

Methods: Our primary dataset comprised H&E slides of CRC tumors across five cohorts totaling 1,376 patients who underwent comprehensive panel sequencing, with an additional 536 patients from two public datasets for validation. We developed a DL model using a single transformer model to predict multiple genetic alterations directly from the slides. The model's performance was compared against conventional single-target models, and potential confounders were analyzed.

Findings: The multi-target model was able to predict numerous biomarkers from pathology slides, matching and partly exceeding single-target transformers. The Area Under the Receiver Operating Characteristic curve (AUROC, mean ± std) on the primary external validation cohorts was: BRAF (0·78 ± 0·01), hypermutation (0·88 ± 0·01), MSI (0·93 ± 0·01), RNF43 (0·86 ± 0·01); this biomarker predictability was mirrored across metrics and co-occurrence analyses. However, biomarkers with high AUROCs largely correlated with MSI, with model predictions depending considerably on MSI-associated morphology upon pathological examination.
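The "AUROC, mean ± std" summary reported above can be illustrated with a minimal sketch. It assumes one AUROC per cross-validation fold on a fixed test set and uses the sample standard deviation; whether the study used the sample or population form is not stated here, and the function names `auroc` and `summarize_folds` are hypothetical.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """AUROC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_folds(labels, fold_scores):
    """Return (mean, sample std) of the per-fold AUROCs.

    fold_scores holds one score list per fold, each aligned with labels.
    Assumption: every fold is evaluated on the same test patients.
    """
    aucs = [auroc(labels, scores) for scores in fold_scores]
    return mean(aucs), stdev(aucs)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]  # toy ground truth, not study data
    folds = [[0.1, 0.4, 0.35, 0.8], [0.1, 0.2, 0.7, 0.9]]
    print(summarize_folds(labels, folds))
```

The pairwise formulation is quadratic in the number of cases, which is fine for illustration; a rank-based implementation would be preferable for large cohorts.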

Interpretation: Our study demonstrates that multi-target transformers can predict the biomarker status for numerous genetic alterations in CRC directly from H&E slides. However, their predictability is mainly associated with MSI phenotype, despite indications of slight biomarker-inherent contributions to a phenotype. Our findings underscore the need to analyze confounders in AI-based oncology biomarkers. To enable this, we developed a validated model applicable to other cancers and larger, diverse datasets.

Funding: The German Federal Ministry of Health, the Max-Eder-Programme of German Cancer Aid, the German Federal Ministry of Education and Research, the German Academic Exchange Service, and the EU.

Conflict of interest statement

Declaration of interest: JNK declares consulting services for Bioptimus, France; Owkin, France; DoMore Diagnostics, Norway; Panakeia, UK; AstraZeneca, UK; Mindpeak, Germany; and MultiplexDx, Slovakia. Furthermore, he holds shares in StratifAI GmbH, Germany, and Synagen GmbH, Germany; has received a research grant from GSK; and has received honoraria from AstraZeneca, Bayer, Daiichi Sankyo, Eisai, Janssen, Merck, MSD, BMS, Roche, Pfizer, and Fresenius. MG has received honoraria for lectures sponsored by Techniker Krankenkasse (TK) and AstraZeneca. SF has received honoraria for lectures from BMS and MSD. UP declares consulting services for AbbVie, and her husband holds individual stocks in the following companies: BioNTech SE – ADR, Amazon, CureVac BV, NanoString Technologies, Google/Alphabet Inc Class C, NVIDIA Corp, Microsoft Corp. No other potential conflicts of interest are reported by any of the authors.

Figures

Fig. 1: Experimental design, cohort characterization, and schematic for predictive analysis.
A. Tissue samples from colorectal cancer (CRC) patients across five independent cohorts were obtained via surgical resection, with associated demographic, clinical, and sequencing data collected. Upon Hematoxylin and Eosin (H&E) staining, tumor tissues are digitized into Whole Slide Images (WSIs) for profiling genetic alterations. The WSIs are then used to train and test a deep learning (DL) algorithm for biomarker detection, to simultaneously predict multiple mutational statuses and provide heatmap explanations. B. The DL pipeline tessellates the WSIs into smaller tiles while rejecting background and blurry areas, extracting n feature vectors from n tiles. Feature vectors are compressed and processed in a multi-target transformer, employing an attention mechanism in an encoder-decoder structure for class token learning. The transformer generates individual scores for the respective number of classes per target. The code can also include positional tile embeddings (dashed lines); these did not improve performance and were therefore excluded from our study. C. Overview of the five GECCO and two public cohorts, including patient numbers, slides, extracted features, and MSI case proportions. The cohorts are divided into train datasets and test datasets. D. Schematic for interpreting result plots and statistics, delineating dataset partitioning based on microsatellite (MSS: microsatellite stability, MSI: microsatellite instability) and gene mutational status (MUT: mutated, WT: wild type). The diagram illustrates distinct groups by color, with the left side representing MSI prediction scores and the right side representing prediction target scores. Ground truth labels of samples guide the group organization, with model-generated scores depicted in corresponding colors.
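The tessellation step in panel B can be pictured with a toy sketch. It assumes a grayscale image given as a list of pixel rows, non-overlapping square tiles, and a simple mean-brightness cutoff for background. The function name `tessellate`, the tile size, and the threshold are illustrative assumptions, not the study's settings, and blur rejection is not modeled.

```python
def tessellate(image, tile_size, max_mean_brightness=220):
    """Split a grayscale image (list of rows, values 0-255) into
    non-overlapping square tiles and keep only tiles that are not mostly
    bright background.

    Returns the (top, left) pixel coordinates of the kept tiles.
    The brightness cutoff is a hypothetical stand-in for the pipeline's
    real background filter.
    """
    height, width = len(image), len(image[0])
    kept = []
    for top in range(0, height - tile_size + 1, tile_size):
        for left in range(0, width - tile_size + 1, tile_size):
            pixels = [
                image[r][c]
                for r in range(top, top + tile_size)
                for c in range(left, left + tile_size)
            ]
            # Bright (near-white) tiles are treated as slide background.
            if sum(pixels) / len(pixels) <= max_mean_brightness:
                kept.append((top, left))
    return kept


if __name__ == "__main__":
    # 4x4 toy "slide": dark tissue in the top-left quadrant, white elsewhere.
    slide = [
        [50, 50, 255, 255],
        [50, 50, 255, 255],
        [255, 255, 255, 255],
        [255, 255, 255, 255],
    ]
    print(tessellate(slide, tile_size=2))
```

In the pipeline described above, each kept tile would then be passed to a feature extractor, giving n feature vectors for n tiles.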
Fig. 2: Analysis of genetic alterations co-occurrence in CRC for GECCO cohorts.
Hierarchical clustering analysis was conducted on the ground truth of genetic alterations with fully available mutational information. Each row corresponds to a genetic alteration, and each column represents a patient from the dataset. The top row indicates the distribution of patients from various cohorts within genetic clusters. Distances were calculated using the ‘Euclidean’ metric, and the ‘Ward’ method was used for clustering. Three unique genetic clusters were created and marked. The patient clustering shows a diverse distribution of samples across all five cohorts and genetic clusters (top row).
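As a rough illustration of Ward clustering on Euclidean distances, the sketch below merges, at each step, the two clusters whose union adds the least within-cluster variance. It is a naive pure-Python version written for readability, not the software used in the study. The function names and the toy binary matrix (rows as alterations, columns as patients) are assumptions.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge:
    |a||b| / (|a| + |b|) * squared Euclidean distance between centroids."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(rows, n_clusters):
    """Agglomerative clustering with Ward's criterion.

    rows: list of vectors (e.g. one binary mutation vector per alteration).
    Returns a list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(
                    [rows[k] for k in clusters[i]],
                    [rows[k] for k in clusters[j]],
                )
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    # Toy matrix: 4 alterations (rows) x 4 patients (columns), not study data.
    matrix = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    print(ward_clusters(matrix, n_clusters=2))
```

For real data, an established library routine would be the sensible choice; this version exists only to make the merge criterion explicit.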
Fig. 3: Evaluation of the performance of the Multi-Target Transformer on selected prediction targets for the external cohorts from GECCO.
A. The comparison of Single-Target Transformer versus Multi-Target Transformers shows the Area Under the Receiver Operating Characteristic curve (AUROC) from each of the 7 folds of external cross-validation, with the median value highlighted with a horizontal line in each box. The figure includes selected representative potential biomarkers of genetic alterations associated with MSS (Fig. 2, genetic Cluster 1) and MSI (Fig. 2, genetic Cluster 2). The test set cohorts consist of CRA and WHI (Fig. 1C). The horizontal line positioned at an AUROC of 0·50 represents a random guess of the model. Significance was determined through a two-sided DeLong test with a p-value threshold of less than 0·05. B. Performance metrics of Multi-Target Transformers for external validation. The mean (center of dot) and standard deviation (diameter of dot) for relevant selected prediction targets for the whole external set, as well as the MSI and MSS subgroups, are displayed based on the 7 folds of cross-validation. The threshold for binary classification is pre-defined as 0·50. The evaluation metrics include the Area Under the Receiver Operating Characteristic Curve (AUROC) and the Area Under the Precision-Recall Curve (AUPRC), along with the corresponding mutation rates in external cohorts. The Mutation Rate refers to the fraction of instances with a specific mutation in the subgroup. The MSI & Genetic Alteration Co-Occurrence Ratio is the fraction of cases harboring MSI among all cases with a particular genetic mutation. The data is sorted by AUROC and shown in Tab. S14 and Tab. S17–S18. An extended version of this panel with more metrics is shown in Fig. S3B. C. Distribution of Areas under the Receiver Operating Characteristic Curve (AUROCs, mean ± standard deviation) for selected prediction targets and their co-occurrence with MSI, with corresponding values and further metrics shown in Tab. S14. An extended version of this panel with MSS/MSI-subgroup specific AUROCs is shown in Fig. S3C.
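The two ratios defined in panel B reduce to simple fractions. The sketch below follows the caption's wording; the function names and the toy status lists are hypothetical and do not reproduce any values from the study.

```python
def mutation_rate(mutated):
    """Fraction of cases in a (sub)group that carry the mutation.

    mutated: list of 0/1 flags, one per case in the subgroup.
    """
    if not mutated:
        raise ValueError("subgroup is empty")
    return sum(mutated) / len(mutated)


def msi_cooccurrence_ratio(mutated, msi):
    """Fraction of cases harboring MSI among all cases with the mutation.

    mutated, msi: aligned lists of 0/1 flags, one entry per case.
    """
    msi_among_mutated = [m for mut, m in zip(mutated, msi) if mut]
    if not msi_among_mutated:
        raise ValueError("no mutated cases in this group")
    return sum(msi_among_mutated) / len(msi_among_mutated)


if __name__ == "__main__":
    mutated = [1, 1, 0, 0, 1]  # toy mutation status
    msi = [1, 0, 0, 1, 1]      # toy MSI status
    print(mutation_rate(mutated))                # 3 of 5 cases are mutated
    print(msi_cooccurrence_ratio(mutated, msi))  # 2 of 3 mutated cases are MSI
```

A ratio near 1 means the alteration occurs almost only in MSI cases, which is the situation in which a high AUROC for that alteration may simply reflect MSI-associated morphology.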
Fig. 4: Evaluation of prediction scores based on the multi-target transformer in external validation on the GECCO test set subgrouped by the co-occurrence of the prediction targets with MSI.
A.-B. Violin plots representing individual patient scores from the test set cohorts for MSI and representative genetic alterations in four subgroups based on microsatellite and alteration mutational status. The left y-axis represents the MSI score scale (left violin halves) and the right y-axis corresponds to the prediction target scores (right violin halves). The legend displays gray horizontal lines in the concept violins that represent the optimal position of the prediction scores based on ground truth. The selection of prediction targets includes TP53, APC, and KRAS from genetic Cluster 1 (A.), and BMPR2, ZNRF3, Hypermutation (HM), RNF43, and BRAF from genetic Cluster 2 (B.) (Fig. 2). The data encompasses both external cohorts CRA and WHI (Fig. 1C). Each dot represents the mean value of individual patient prediction scores calculated from 7 folds, with the horizontal line on each side of the violin indicating the median of all individual mean patient scores. A horizontal line at 0·50 denotes the line of model uncertainty. The sample count for each subgroup is indicated below the violins. Statistical significance is denoted in the figures as follows: * for p < 0·05, ** for p < 0·01, *** for p < 0·001, with more details provided in Fig. 1D. After testing for normal distribution (Tab. S20), the Mann-Whitney U test was used for within-group comparisons, and the Wilcoxon test was used for between-group comparisons. Abbreviations: HM: Hypermutation; MSI: Microsatellite instability; MSS: Microsatellite stability; MUT: Mutated; WT: Wild type.
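The aggregation behind each violin (a per-patient mean over folds, then a median per subgroup) could look like the following sketch. It assumes every fold scores the same test patients in the same order; the function name `subgroup_medians` and the toy inputs are illustrative, and the significance tests are not reproduced.

```python
from statistics import mean, median


def subgroup_medians(fold_scores, msi_status, target_status):
    """Median of per-patient mean scores in each of the four subgroups.

    fold_scores: one list of patient scores per fold, aligned across folds.
    msi_status, target_status: 0/1 ground-truth flags per patient
    (1 = MSI, 1 = target mutated).
    Returns a dict keyed by ("MSI" | "MSS", "MUT" | "WT").
    """
    # One dot per patient: the mean of that patient's scores over all folds.
    patient_means = [mean(scores) for scores in zip(*fold_scores)]

    groups = {}
    for score, msi, mut in zip(patient_means, msi_status, target_status):
        key = ("MSI" if msi else "MSS", "MUT" if mut else "WT")
        groups.setdefault(key, []).append(score)

    # Horizontal line in each violin: the median of the patient means.
    return {key: median(values) for key, values in groups.items()}


if __name__ == "__main__":
    folds = [[0.2, 0.8, 0.6, 0.1], [0.4, 0.6, 0.8, 0.3]]  # toy scores, 2 folds
    print(subgroup_medians(folds, msi_status=[0, 1, 1, 0], target_status=[0, 1, 1, 1]))
```

Subgroups with no patients are absent from the returned dictionary, so callers should not assume that all four keys exist.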
Fig. 5: Heatmaps of representative samples for prediction of MSI, KRAS, BRAF and Hypermutation (HM) from the external GECCO validation dataset.
The heatmaps are derived from the model with the median AUROC for MSI detection and the majority of prediction targets evaluated by sevenfold cross-validation. The cohort, Sample-ID, ground truth and prediction scores for MSI, along with the individual mutational status of the target, a brief pathological evaluation and magnified views of specific areas are provided for in-depth analysis. The heatmaps indicate relevant areas for the various predictions. The red areas are of high importance and indicate a mutant type (MUT), while the blue areas are of low importance and indicate a wild type (WT). The color intensity showcases the model’s attention to that distinct area. A. The tumor exhibits both gland-forming and more solid components and extremely high numbers of tumor-infiltrating lymphocytes (TILs) with dense lymphoid aggregates. The pathological examination confirms the plausibility of a high MSI score indicating MSI, which is also the ground truth. A low KRAS score indicates KRAS WT, but the ground truth is KRAS MUT. The heatmap highlights similar tumor areas but with diverging scores: where the MSI map is red, indicating a high score, the KRAS map is blue, indicating a low score. B. The presence of mucinous differentiation in an MSI, BRAF WT case results in high MSI and BRAF scores. The MSI score is pathologically plausible, whereas the BRAF score indicates a prediction contrary to the ground truth. For both predictions, the model focuses on similar tumor areas with similar scores indicating MSI/BRAF MUT. C. Partly mucinous morphology indicates the possibility of MSI, with a high score predicting MSI. HM is also predicted MUT with a high score, even though HM is WT for this sample. Both heatmaps primarily label the tumor and the same region with comparable significance. D. Villous adenoma with high grade dysplasia is a common precursor lesion associated with high frequency of KRAS mutations. The heatmaps highlight similar large scale tumor areas but with converging scores: where the MSI map is red, indicating a high score, the KRAS map is blue, indicating a low score. E. The tumor area appears to be mainly MSS, and the heatmap predicts a low score, indicating this. Although the sample is hypermutated in the ground truth, it is still predicted as non-hypermutated. This is a rare MSS case with HM. Both heatmaps predominantly mark the tumor area and the same region with comparable relevance. Abbreviations: HM: Hypermutation; MSI: Microsatellite instability; MSS: Microsatellite stability; MUT: Mutated; WT: Wild type; w/: with
Fig. 6: Top tiles for prediction of genetic alterations (left column) and MSI (right column) for two selected slides from the GECCO test set.
A.-B. WHI, 1031792: Medullary carcinoma with sheets of tumor cells, low stroma content, and a high number of tumor-infiltrating lymphocytes in a KRAS WT and MSI case, leading to high MSI prediction scores (B.) and low KRAS MUT prediction scores (A.). KRAS mutations show a lower frequency in MSI CRCs. Medullary carcinoma is a key morphological feature of MSI CRCs. C.-D. WHI 1031557: Top tiles for BRAF MUT as well as MSI prediction, both with high prediction scores, both displaying a mixed morphology with partly medullary, partly mucinous, partly gland-forming histology, and a high number of tumor-infiltrating/associated lymphocytes. Medullary growth pattern with lymphocytic infiltration and mucinous differentiation are typical features of an MSI-like morphology. Accordingly, the case was correctly predicted as MSI with a very high prediction score (D.). As BRAF MUT and MSI often co-occur and share morphologic overlap, the case was misclassified with regard to BRAF status, resulting in high prediction scores for BRAF MUT (C.), even though the ground truth was BRAF WT. Abbreviations: CRC: Colorectal cancer; MSI: Microsatellite instability; MSS: Microsatellite stability; MUT: Mutated; WT: Wild type.
