Multicenter Study
J Pathol. 2025 Feb;265(2):184-197.
doi: 10.1002/path.6376. Epub 2024 Dec 22.

Tumour purity assessment with deep learning in colorectal cancer and impact on molecular analysis

Lydia A Schoenpflug et al. J Pathol. 2025 Feb.

Abstract

Tumour content plays a pivotal role in directing the bioinformatic analysis of molecular profiles such as copy number variation (CNV). In clinical application, tumour purity estimation (TPE) is achieved either through visual pathological review [conventional pathology (CP)] or the deconvolution of molecular data. While CP provides a direct measurement, it demonstrates modest reproducibility and lacks standardisation. Conversely, deconvolution methods offer an indirect assessment with uncertain accuracy, underscoring the necessity for innovative approaches. SoftCTM is an open-source, multiorgan deep-learning (DL) model for the detection of tumour and non-tumour cells in H&E-stained slides, developed within the Overlapped Cell on Tissue Dataset for Histopathology (OCELOT) Challenge 2023. Here, using three large multicentre colorectal cancer (CRC) cohorts (N = 1,097 patients) with digital pathology and multi-omic data, we compare the utility and accuracy of TPE with SoftCTM versus CP and bioinformatic deconvolution methods (RNA expression, DNA methylation) for downstream molecular analysis, including CNV profiling. SoftCTM showed technical repeatability when applied twice on the same slide (r = 1.0) and excellent correlations in paired H&E slides (r > 0.9). TPEs profiled by SoftCTM correlated highly with RNA expression (r = 0.59) and DNA methylation (r = 0.40), while TPEs by CP showed a lower correlation with RNA expression (r = 0.41) and DNA methylation (r = 0.29). We show that CP and deconvolution methods respectively underestimate and overestimate tumour content compared to SoftCTM, resulting in 6-13% differing CNV calls. In summary, TPE with SoftCTM enables reproducibility, automation, and standardisation at single-cell resolution. 
SoftCTM estimates (M = 58.9%, SD ±16.3%) reconcile the overestimation by molecular data extrapolation (RNA expression: M = 79.2%, SD ±10.5%; DNA methylation: M = 62.7%, SD ±11.8%) and the underestimation by CP (M = 35.9%, SD ±13.1%), providing a more reliable middle ground. A fully integrated computational pathology solution could therefore be used to improve downstream molecular analyses for research and clinics. © 2024 The Author(s). The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.
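To illustrate why the choice of purity estimate can change CNV calls, the following is a minimal sketch of a standard two-component mixture model, not the authors' CNV pipeline. It assumes a diploid normal compartment and a diploid tumour baseline; the function names and the observed log2 ratio of 0.30 are hypothetical. Only the mean purities are taken from the abstract.

```python
import math


def expected_log2_ratio(tumour_cn, purity, normal_cn=2.0, tumour_ploidy=2.0):
    """Expected log2 copy ratio of a segment in a tumour/normal mixture.

    Assumption: normal cells carry `normal_cn` copies everywhere and the
    tumour genome has average ploidy `tumour_ploidy`.
    """
    mixed_cn = purity * tumour_cn + (1.0 - purity) * normal_cn
    mixed_ploidy = purity * tumour_ploidy + (1.0 - purity) * normal_cn
    return math.log2(mixed_cn / mixed_ploidy)


def purity_adjusted_cn(log2_ratio, purity, normal_cn=2.0, tumour_ploidy=2.0):
    """Invert the mixture model to recover the tumour copy number."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    mixed_ploidy = purity * tumour_ploidy + (1.0 - purity) * normal_cn
    mixed_cn = (2.0 ** log2_ratio) * mixed_ploidy
    return (mixed_cn - (1.0 - purity) * normal_cn) / purity


# Mean purities reported in the abstract, expressed as fractions.
mean_purity = {
    "CP": 0.359,
    "SoftCTM": 0.589,
    "DNA methylation": 0.627,
    "RNA expression": 0.792,
}

observed = 0.30  # hypothetical observed log2 ratio for one segment
for method, purity in mean_purity.items():
    cn = purity_adjusted_cn(observed, purity)
    print(f"{method:16s} purity={purity:.3f} -> tumour copy number ~ {cn:.2f}")
```

Under these assumptions, a lower purity estimate attributes the same observed signal to a larger copy-number change in the tumour cells, so an underestimate inflates gains and an overestimate dampens them.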

Keywords: artificial intelligence; colorectal cancer; diagnostic molecular pathology; pathology; personalised medicine.

Figures

Figure 1
Experimental study design. (A) Specifications and data summary of the three independent datasets (FOCUS, TCGA, and GRAMPIAN) used in this study. (B) Available data types and tumour estimation methods applied on each data type. (C) Sample collection and profiling strategy in FOCUS and GRAMPIAN cohorts. Created with canva.com.
Figure 2
Inference workflow for tumour and background cell nuclei (TC, BC) detection by SoftCTM on an H&E‐stained WSI. (A) WSI‐level inference: Pathologist‐marked ROIs of an H&E‐stained WSI are tiled into patches (1,024 × 1,024 pixels at 0.5 MPP), on which SoftCTM is applied. The predictions are then recombined into a spatially resolved WSI‐level prediction of detected TC and BC. (B) Patch‐level inference: The SoftCTM algorithm consists of two stages: tissue segmentation and cell detection. Tissue segmentation is performed at 0.8 MPP, cell detection at 0.5 MPP. For cell detection, the tissue segmentation prediction is used as input along with the input patch. The output is a probability map for each cell class, from which detected cells are extracted through a postprocessing step. MPP, microns per pixel; WSI, whole slide image.
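The tiling step in panel A can be sketched as follows. This is an illustration only, not the SoftCTM code: the function names are hypothetical, the grid is assumed to be non-overlapping with edge patches padded, and tumour purity is assumed to be the fraction of detected tumour cells among all detected cells. Only the patch size (1,024 pixels) and the detection resolution (0.5 MPP) come from the caption.

```python
import math

PATCH_SIZE = 1024    # pixels per patch side, from the caption
DETECTION_MPP = 0.5  # microns per pixel used for cell detection, from the caption


def patch_origins(roi_width_px, roi_height_px, native_mpp,
                  target_mpp=DETECTION_MPP, patch_size=PATCH_SIZE):
    """Return (scale, origins) for a non-overlapping patch grid over an ROI.

    `scale` converts native pixel coordinates to the target resolution, and
    `origins` lists the top-left corner of each patch in target-resolution
    pixels. Edge patches may extend past the ROI and would need padding.
    """
    scale = native_mpp / target_mpp  # e.g. 0.25 / 0.5 = 0.5
    width = math.ceil(roi_width_px * scale)
    height = math.ceil(roi_height_px * scale)
    cols = math.ceil(width / patch_size)
    rows = math.ceil(height / patch_size)
    origins = [(col * patch_size, row * patch_size)
               for row in range(rows) for col in range(cols)]
    return scale, origins


def tumour_purity(patch_counts):
    """Aggregate per-patch (tumour_cells, background_cells) counts.

    Assumption: purity = TC / (TC + BC) over all patches of the slide.
    """
    tumour = sum(tc for tc, _ in patch_counts)
    background = sum(bc for _, bc in patch_counts)
    if tumour + background == 0:
        raise ValueError("no cells detected")
    return tumour / (tumour + background)


# Hypothetical ROI of 8192 x 4096 pixels scanned at 0.25 MPP.
scale, origins = patch_origins(8192, 4096, native_mpp=0.25)
print(scale, len(origins))

# Hypothetical per-patch detections (tumour cells, background cells).
print(tumour_purity([(30, 20), (70, 80)]))
```

Aggregating counts across patches before dividing weights each cell equally, rather than averaging per-patch ratios, which would give sparse patches the same influence as dense ones.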
Figure 3
Boxplot and histogram comparing the distribution of TP estimated by different methods for the combined test cohorts. We only consider samples with TPE available for all methods. The box represents the IQR, encompassing 50% of the data points. The red line indicates the median, and whiskers extend to ±1.5 IQR from the IQR edges. TP, tumour purity; IQR, interquartile range.
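The boxplot quantities named in this caption can be computed as in the sketch below. It assumes linearly interpolated quartiles and the common plotting convention that each whisker ends at the most extreme data point inside the 1.5 × IQR fence; the function name and the sample values are hypothetical, not study data.

```python
import statistics


def boxplot_summary(values, whisker=1.5):
    """Return quartiles, IQR, fences, whisker ends, and outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values
                           if v < lower_fence or v > upper_fence),
    }


# Hypothetical tumour purity estimates in percent.
purity_percent = [42.0, 55.5, 58.0, 61.2, 63.7, 70.1, 98.0]
for key, value in boxplot_summary(purity_percent).items():
    print(key, value)
```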
Figure 4
Comparison of TPE method results for test cohorts. Below diagonal: scatter plots comparing respective TPE method results. Diagonal: histogram for each TPE method with mean (M) and standard deviation (SD) in top right. Above diagonal: Pearson correlation coefficient between respective TPE methods; ****p < 0.0001. CI, confidence interval; TPE, tumour purity estimation.
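For readers who want to reproduce the pairwise comparison above the diagonal, the sketch below computes a Pearson correlation coefficient between two sets of purity estimates. The function name and the paired values are hypothetical and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired purity estimates (percent) for the same samples.
method_a = [35.0, 52.0, 61.0, 48.0, 70.0, 66.0]
method_b = [60.0, 74.0, 83.0, 79.0, 88.0, 80.0]
print(round(pearson_r(method_a, method_b), 3))
```

Pearson's r measures linear association only, so two methods can correlate well while one is systematically higher than the other, which is why the figure also reports the mean and standard deviation of each method.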
Figure 5
Comparison of Whole Genome Instability Index (WGII) adjusted by (A) SoftCTM, (B) ESTIMATE, (C) InfiniumPurify, and (D) CP; all p < 0.001. CI, confidence interval.
