Nat Commun. 2025 Mar 8;16(1):2328.
doi: 10.1038/s41467-025-57541-y.

Self-supervised learning reveals clinically relevant histomorphological patterns for therapeutic strategies in colon cancer


Bojing Liu et al. Nat Commun. 2025.

Abstract

Self-supervised learning (SSL) automates the extraction and interpretation of histopathology features on unannotated hematoxylin-eosin-stained whole slide images (WSIs). We train an SSL Barlow Twins encoder on 435 colon adenocarcinoma WSIs from The Cancer Genome Atlas to extract features from small image patches (tiles). Leiden community detection groups tiles into histomorphological phenotype clusters (HPCs). HPC reproducibility and predictive ability for overall survival are confirmed in an independent clinical trial (N = 1213 WSIs). This unbiased atlas results in 47 HPCs displaying unique and shared clinically significant histomorphological traits, highlighting tissue type, quantity, and architecture, especially in the context of tumor stroma. Through in-depth analyses of these HPCs, including immune landscape and gene set enrichment analyses, and associations with clinical outcomes, we shed light on the factors influencing survival and responses to standard adjuvant chemotherapy and experimental therapies. Further exploration of HPCs may unveil additional insights and aid decision-making and personalized treatments for colon cancer patients.

Conflict of interest statement

Competing interests: AT is a co-founder of Imagenomix. The remaining authors declare no competing interests.

Figures

Fig. 1. Overview of the Model Architecture: Training Barlow Twins and deriving Histomorphological Phenotype Clusters.
a Training Barlow Twins with TCGA. WSIs from TCGA were processed to extract image tiles and normalize stain colors. The Barlow Twins network was employed to learn 128-dimensional z vectors from a random sample (N = 250,000 image tiles) of these image tiles. b Deriving HPCs. The tiles from TCGA were projected into z vector representations obtained from the trained Barlow Twins network. HPCs were defined by applying Leiden community detection to the nearest neighbor graph of z tile vector representations. Each WSI was represented by a compositional vector of the derived HPCs, indicating the percentage of each HPC with respect to the total tissue area. The Barlow Twins model and HPCs were then projected and integrated into the external AVANT trial. c Whole Slide Image Representation. The compositional HPC data represented the WSIs in the study. AVANT, Bevacizumab-Avastin® adjuVANT trial. HPC, histomorphological phenotype cluster. TCGA, The Cancer Genome Atlas. WSI, whole slide image. Source data are provided as a Source Data file.
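The compositional WSI representation described in panel b can be illustrated with a minimal sketch. It assumes every tile covers the same tissue area, so the share of tiles stands in for the share of tissue area; the function name `hpc_composition` and the example labels are hypothetical and not taken from the paper.

```python
from collections import Counter


def hpc_composition(tile_labels, n_hpcs):
    """Return the percentage of a slide's tiles assigned to each HPC.

    tile_labels: one integer cluster ID (0 .. n_hpcs - 1) per tile.
    Assumption: tiles are equal in size, so tile share approximates
    the share of total tissue area.
    """
    if not tile_labels:
        raise ValueError("a slide needs at least one tile")
    counts = Counter(tile_labels)
    total = len(tile_labels)
    return [100.0 * counts.get(hpc, 0) / total for hpc in range(n_hpcs)]


# Hypothetical slide with 8 tiles and 4 possible clusters (IDs 0..3).
example_labels = [0, 0, 2, 2, 2, 3, 0, 2]
composition = hpc_composition(example_labels, n_hpcs=4)
print(composition)  # [37.5, 0.0, 50.0, 12.5]
```

Each slide then becomes one fixed-length vector (47 entries in the study), regardless of how many tiles it contains.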
Fig. 2. Identification of HPCs in TCGA and subsequent classification into superclusters.
a UMAP showing 47 HPCs identified from the TCGA dataset, each point representing an image tile. b PAGA plot of HPCs. Each node represented an HPC with edges representing connections between HPCs based on their vector representation similarity. The pie chart of each node represented the tissue composition for each HPC. c Grouping of HPCs into super-clusters according to histopathology tissue similarities. Representative tiles for each HPC were labeled with ID and a brief description. The image tiles (224-by-224 pixels), at a magnification level of 10x (pixel size approximately 1.0 µm), correspond to 224 µm in size (see scale bar in c). HPC, histomorphological phenotype cluster. PAGA, partition-based graph abstraction. TCGA, The Cancer Genome Atlas. UMAP, uniform manifold approximation and projection plot. Source data are provided as a Source Data file.
Fig. 3. Verification of HPCs in the TCGA training set and the external clinical AVANT trial.
a–i Example tiles from TCGA (upper row) and AVANT (lower row) showcase the eight super-clusters with a zoomed-in representative tile. The image tiles (224-by-224 pixels), at a magnification level of 10x (pixel size approximately 1.0 µm), correspond to 224 µm in size (see scale bar in a–i). The muscle tissue super-cluster is further divided into longitudinal and axial subgroups. j, k Stacked bar plots illustrate instances of misclassification for each HPC in the TCGA training set and the AVANT external test set. Green bars represent the percentage of correctly identified odd clusters, yellow bars indicate misclassifications within the tested HPC’s super-cluster, and orange bars show misclassifications outside the super-cluster. l Box plots display similar distributions of test success rates (corresponding to the green bars in panels j and k) for HPCs in the TCGA and AVANT cohorts (two-sided Wilcoxon signed-rank test, p = 0.534). Each blue point within each box plot represents the test success rate for a single HPC, calculated based on 50 tests per HPC. The central orange line within each box represents the median, while the bounds of the box indicate the 25th and 75th percentiles (interquartile range). Whiskers extend to the minima and maxima within 1.5 times the interquartile range. Points outside the whiskers represent outliers. HPC, histomorphological phenotype cluster. TCGA, The Cancer Genome Atlas. AVANT, Bevacizumab-Avastin® adjuVANT trial. Source data are provided as a Source Data file.
Fig. 4. HPC-based classifier was associated with OS in patients treated with standard-of-care and AVANT-experimental treatment.
a Ordinary Cox regression for OS, incorporating the HPC-based risk classifier, along with sex, age categories, tumor-stroma ratio, and AJCC TNM staging, was conducted within the external test set of the AVANT control group (N = 379 patients after excluding those with missing clinical information). Each point represents the point estimate of HR, and the horizontal whiskers depict the 95% CI. The HPC model-based classifier stands as an independent prognostic factor (HR = 2.50, 95% CI = 1.18–5.31) for OS. b Ordinary Cox regression for OS, incorporating the HPC-based risk classifier, along with sex, age categories, tumor-stroma ratio, and AJCC TNM staging, was conducted within the AVANT experimental group (N = 751 patients after excluding those with missing clinical information). Each point represents the point estimate of HR, and the horizontal whiskers depict the 95% CI. The HPC model-based classifier stands as an independent prognostic factor (HR = 1.82, 95% CI = 1.11–2.99) for OS. c and d The SHAP summary plots depict the relationship between the center-log-transformed compositional value of an HPC and its impact on death hazard prediction. Statistics estimated from AVANT control (N = 405 after excluding those with missing survival data) (c) and experimental groups (N = 780 patients after excluding those with missing survival data) (d). The color bar indicates the relative compositional value of an HPC, with red indicating higher and blue indicating lower composition. Higher compositions of the top 10 HPCs were associated with worse OS, while higher compositions of the bottom 10 HPCs were linked to improved OS. AJCC TNM, American Joint Committee on Cancer tumor-node-metastasis classification. AVANT, Bevacizumab-Avastin® adjuVANT trial. HPC, histomorphological phenotype cluster. OS, overall survival. SHAP, SHapley Additive exPlanations. TCGA, The Cancer Genome Atlas. Source data are provided as a Source Data file.
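The "center-log-transformed compositional value" in panels c and d presumably refers to the centered log-ratio (CLR) transform that is commonly applied to compositional data. The sketch below shows that transform under this assumption; the function name `clr`, the pseudocount used to handle zero entries, and the example values are all hypothetical, and the paper may treat zeros differently.

```python
import math


def clr(composition, pseudocount=1e-6):
    """Centered log-ratio transform of one compositional vector.

    Each entry becomes log(x_i) minus the mean of log(x) over all parts,
    which equals log(x_i / geometric_mean(x)).
    Assumption: a small pseudocount is added so that zero entries
    do not produce log(0).
    """
    shifted = [value + pseudocount for value in composition]
    logs = [math.log(value) for value in shifted]
    mean_log = sum(logs) / len(logs)
    return [log_value - mean_log for log_value in logs]


# Hypothetical 4-part composition (percent of tissue area per HPC).
clr_values = clr([37.5, 0.5, 50.0, 12.0])
print([round(value, 3) for value in clr_values])
```

The transformed values always sum to zero, so a positive value means an HPC is more abundant than the slide's geometric-mean HPC and a negative value means it is less abundant.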
Fig. 5. PAGA plots highlighted with important HPCs related to OS in the standard-of-care and experimental treated group.
a Standard treated group: HPCs colored in red are linked to worse survival and HPCs colored in blue are linked to better survival. b AVANT-experimental treated group: HPCs colored in red are linked to worse survival and HPCs colored in blue are linked to better survival. The image tiles (224-by-224 pixels), at a magnification level of 10x (pixel size approximately 1.0 µm), correspond to 224 µm in size (see scale bar in a and b). AVANT, Bevacizumab-Avastin® adjuVANT trial. HPC, histomorphological phenotype cluster. PAGA, partition-based graph abstraction. Source data are provided as a Source Data file.
Fig. 6. Survival-associated HPCs in relation to immune and genetic profile.
a Standard-of-care group: Spearman’s correlations between top 20 OS-related HPCs and immune landscape features. HPCs (columns of the matrix) were colored according to the beta coefficients estimated from the optimized regularized Cox regression, with red indicating HPCs related to worse survival and green indicating HPCs related to better survival. The color bar at the upper left corner indicates the value of the correlation coefficients, with red denoting positive and blue denoting negative correlations. b AVANT-experimental treated group: Spearman’s correlations between top 20 OS-related HPCs and immune landscape features. c Standard-of-care group: GSEA between the top OS-related HPCs and major cancer hallmark pathways. HPCs (columns of the matrix) were colored according to the beta coefficients estimated from the optimized regularized Cox regression, with red indicating HPCs related to worse survival and green indicating HPCs related to better survival. The color bar at the upper left corner indicates the value of the correlation coefficients, with red denoting enrichment and blue denoting underrepresentation in a gene pathway. d AVANT-experimental treated group: GSEA for the top 20 OS-related HPCs. The immune landscape analysis (N = 355 patients) and GSEA (N = 265 patients) were performed using data available from TCGA. AVANT, Bevacizumab-Avastin® adjuVANT trial. GSEA, gene set enrichment analysis. HPC, histomorphological phenotype cluster. OS, overall survival. Source data are provided as a Source Data file.
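For readers unfamiliar with the statistic in panels a and b, Spearman's correlation is the Pearson correlation computed on ranks, with tied values sharing their average rank. The following is a generic, self-contained sketch of that definition, not the authors' analysis code; the variable names and the five example values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five patients: one HPC's tissue share (percent)
# and one immune landscape feature score.
hpc_share = [5.0, 12.0, 20.0, 33.0, 41.0]
immune_score = [0.9, 0.7, 0.8, 0.3, 0.1]
rho = spearman(hpc_share, immune_score)
print(round(rho, 3))  # -0.9
```

A negative value, as in this toy example, would appear as a blue cell in the correlation matrix described above.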
Fig. 7. Clinical application of AI-derived HPCs in prediction of patient outcomes.
The clinical algorithm consists of three key stages: data preparation, cancer patient characterization, and AI-supported multidisciplinary treatment meetings. Data preparation involves collecting histopathology WSIs and segmenting them into small image tiles. Patient characterization encompasses SSL model training, yielding HPCs via clustering. HPCs are easily interpretable by pathologists and linkable to omic data. Most importantly, HPCs are valuable for predicting diagnosis, patient outcomes, and treatment responses. In treatment-related outcomes, AI-predicted high/low risk groups aid multidisciplinary meetings, enabling personalized treatment plans by oncologists, pathologists, and other physicians. AI, artificial intelligence. HPC, histomorphological phenotype cluster. SSL, self-supervised learning. WSI, whole slide image.
