Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2024 Mar;56(3):458-472.
doi: 10.1038/s41588-024-01654-5. Epub 2024 Feb 13.

Pathway level subtyping identifies a slow-cycling biological phenotype associated with poor clinical outcomes in colorectal cancer

Affiliations

Pathway level subtyping identifies a slow-cycling biological phenotype associated with poor clinical outcomes in colorectal cancer

Sudhir B Malla et al. Nat Genet. 2024 Mar.

Erratum in

  • Author Correction: Pathway level subtyping identifies a slow-cycling biological phenotype associated with poor clinical outcomes in colorectal cancer.
    Malla SB, Byrne RM, Lafarge MW, Corry SM, Fisher NC, Tsantoulis PK, Mills ML, Ridgway RA, Lannagan TRM, Najumudeen AK, Gilroy KL, Amirkhah R, Maguire SL, Mulholland EJ, Belnoue-Davis HL, Grassi E, Viviani M, Rogan E, Redmond KL, Sakhnevych S, McCooey AJ, Bull C, Hoey E, Sinevici N, Hall H, Ahmaderaghi B, Domingo E, Blake A, Richman SD, Isella C, Miller C, Bertotti A, Trusolino L, Loughrey MB, Kerr EM, Tejpar S; S:CORT consortium; Maughan TS, Lawler M, Campbell AD, Leedham SJ, Koelzer VH, Sansom OJ, Dunne PD. Malla SB, et al. Nat Genet. 2024 Jun;56(6):1321. doi: 10.1038/s41588-024-01809-4. Nat Genet. 2024. PMID: 38831011 Free PMC article. No abstract available.

Abstract

Molecular stratification using gene-level transcriptional data has identified subtypes with distinctive genotypic and phenotypic traits, as exemplified by the consensus molecular subtypes (CMS) in colorectal cancer (CRC). Here, rather than gene-level data, we make use of gene ontology and biological activation state information for initial molecular class discovery. In doing so, we defined three pathway-derived subtypes (PDS) in CRC: PDS1 tumors, which are canonical/LGR5+ stem-rich, highly proliferative and display good prognosis; PDS2 tumors, which are regenerative/ANXA1+ stem-rich, with elevated stromal and immune tumor microenvironmental lineages; and PDS3 tumors, which represent a previously overlooked slow-cycling subset of tumors within CMS2 with reduced stem populations and increased differentiated lineages, particularly enterocytes and enteroendocrine cells, yet display the worst prognosis in locally advanced disease. These PDS3 phenotypic traits are evident across numerous bulk and single-cell datasets, and demark a series of subtle biological states that are currently under-represented in pre-clinical models and are not identified using existing subtyping classifiers.

PubMed Disclaimer

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. PDS of CRC highlights two subsets of CMS2.
a, Schematic of class discovery: gene expression to pathway matrix using gene signature databases (see Methods) on KRASmut CRC samples (n = 165) from the FOCUS cohort (used as the discovery set) defines three PDS following series of dimensionality reduction (t-SNE) and unsupervised k-means clustering. b, Bar chart highlighting the proportion of KRASmut variants across PDS. c, Oncoprint with key cancer driver genes across PDS in KRASmut CRC samples. d, Heatmap depicting the PDS-specific ssGSEA scores across the discovery set (n = 165) with PDS and CMS annotation. e, UMAP from the PDS-specific ssGSEA scores on the discovery set with PDS (left) and CMS (right) annotations, using all samples from the discovery set (top) and PDS1 and PDS3, and CMS2 and CMS3 samples only (bottom). f, Sankey plot focusing on CMS2 and CMS3 primarily subdivided into PDS1 and PDS2. g, Heatmap visualization of the ‘Hallmark’ gene sets ssGSEA scores across the discovery set, with annotated PDS discovery calls and CMS. h, Schematic of classifier development: the FOCUS discovery set was divided into training (n = 125) and test (n = 40) sets, and subsequently, the svmRBF classification algorithm was trained on the training set and tested on the test set to finalize performance of the classification model. The classification prediction threshold was determined and the PDSclassifier R package was developed. i, Ternary plot displaying PDS prediction probabilities using PDSclassifier on the FOCUS test subset (n = 40). The red line denotes the default PDS prediction threshold at 0.6 and the dashed black line represents the PDS prediction threshold at 0.5 and 0.7. j, Sankey plot highlighting the overall concordance of PDS calls between discovery and classifier calls, suggesting robustness of the classifier. UNK, unknown.
Fig. 2
Fig. 2. PDS tumors are genomically indifferent but transcriptionally distinct with prognostic value.
a, PDS classification on all FOCUS samples (n = 360) yields ~12.8% of samples with ‘mixed’ PDS biology that did not reach the threshold of 0.6. b, Heatmap depicting the PDS-specific ssGSEA across the FOCUS cohort, recapturing the same pattern regardless of KRAS mutational status. The top annotation bar indicates PDS, CMS and KRAS mutation. c, Sankey plot highlighting the FOCUS cohorts showing CMS–PDS alignment (top) and how CMS2 and CMS3 are subdivided into PDS1 and PDS3 (bottom). d, UMAP from the PDS-specific ssGSEA scores on the FOCUS cohort with PDS (left) and CMS (right) annotations using all samples from FOCUS cohort (top) and CMS2 and CMS3 and PDS1 and PDS3 samples only (bottom). e, Cancer driver mutations displayed in oncoprint with (red *) BRAF enrichment and low APC mutation in PDS2. f, Copy number alterations in heatmap indicating no significant difference across PDS. Statistics: Fisher’s exact test between PDS1 and PDS3. g, Significant ‘Hallmark’ ssGSEA heatmaps in the FOCUS and GSE39582 cohorts, with top annotation of PDS prediction probabilities and PDS calls. h, Radar plots highlight highly upregulated Hallmark gene sets prominent for each PDS in the FOCUS (top) and GSE39582 (bottom) cohorts. i, Transcriptional factor activity shown in the heatmaps displays transcriptional factors activated or repressed across PDS. j, Kaplan–Meier RFS plots in GSE39582 and PETACC-3 colon cancer cohorts (top). Univariate Cox proportional analysis outcome displayed in a tabular format (bottom), comparing PDS1 with PDS2 and PDS3 with a hazard ratio (HR), 95% confidence intervals (CI) and P value.
Fig. 3
Fig. 3. PDS intratumoral heterogeneity exits with PDS3 displaying no defined histological features.
a, Heatmap visualization of the ssGSEA score using PDS-specific gene in laser capture micro-dissected CRC dataset with annotated epithelium or stroma region per sample. b, Comparison of PDS-specific gene set ssGSEA score between epithelium (n = 8) and stroma (n = 10), using two-sided Wilcoxon rank-sum test. Boxplots depict the median and interquartile range, with whiskers extending to the minimum and maximum values (excluding outliers as dots). c, Workflow schematic on development of the image-based PDS classifier (imPDSclassifier). d, Confusion matrix displaying the PDS prediction based on the imPDSclassifier to the transcriptomic-based PDS calls. e,f, imPDS predictions on the digital whole slide H&E images with tile-level confidence probability on PDS1, PDS2 (e) and PDS3 (f) samples. Scale bars, 2 mm. g, Stromal and immune ESTIMATE score across PDS in the FOCUS (PDS1, n = 93; PDS2, n = 113; PDS3, n = 108) and SPINAL (PDS1, n = 80; PDS2, n = 54; PDS3, n = 82) cohorts. Boxplots inside the violin plots depict the median and interquartile range, with whiskers extending to the minimum and maximum values (excluding outliers as dots). P values: two-sided Wilcoxon rank-sum test. h, Ternary plots represent the utility of PDS prediction probability thresholds, whereby increasing the threshold from 0.6 (left) to stringent 0.8 (right) can lead to more homogeneous PDS tumor samples while i, j, still retaining its prognostic value (PDS1, n = 109; PDS2, n = 83; PDS3, n = 80). Error bars, 95% confidence intervals. k, PDS prediction probability stacked bars and l, density plots highlighting the level of PDS heterogeneity found in each sample, with dashed lines representing the default threshold of 0.6 (top); increasing threshold, for example, to 0.8 (bottom), makes it more transcriptionally homogeneous.
Fig. 4
Fig. 4. PDS3 tumors indicate no stem-like and precursor lesion associations.
a, Scatter gradient-density plot representing LGR5+ CBC and LGR5/ANXA1+ RSC ssGSEA scores with PDS sample annotations in the SPINAL cohort (top). Stem cell plasticity landscape schematic highlighting the PDS1 association with CBC and the PDS2 association with RSC gene signatures along the stem cell index proposed in a previous study (bottom). b, In situ hybridization multiplex staining with LGR5, ANXA1 and DAPI across PDS samples in the SPINAL (PDS1, n = 7; PDS2, n = 7; PDS3, n = 6) cohort. Scale bars, 1 mm and 100 μm. c, Schematic representing PDS classification of the colorectal polyp dataset with categorized polyp types (left); confusion matrix depicting the association between PDS calls and polyps (right). d, Proliferation index and replication stress measure across PDS in the FOCUS (PDS1, n = 93; PDS2, n = 113; PDS3, n = 108) and GSE39582 (PDS1, n = 186; PDS2, n = 140; PDS3, n = 122) cohorts. e, Representative immunohistochemistry images of Ki67+ staining. Scale bars, 50 μm. f, Violin plot showing the tumor area (mm2) determined from H&E digital histological scores using HALO (PDS1, n = 91; PDS2, n = 111; PDS3, n = 107) and the per cent of tumor Ki67+ per sample (represented in e) across PDS in the FOCUS (PDS1, n = 91; PDS2, n = 108; PDS3, n = 105) cohort. Boxplots inside the violin plots (in d and f) depict the interquartile range, median, minimum and maximum values (excluding outliers as dots). P values: two-sided Wilcoxon rank-sum test. g, ShinyApp platform (SubtypeExploreR), integrated with the FOCUS and GSE39582 cohorts allows interrogation into gene expressions and signatures across three different CRC subtypes, including PDS, CMS and iCMS. TVA, tubulovillous adenomas; SSL, sessile serrated lesions; TSA, traditional serrated adenomas.
Fig. 5
Fig. 5. MYC–PRC targets biological axis associated with stem–differentiation cellular dynamics.
a, GSEA enrichment and violin plot displaying ssGSEA scores compared across PDS in the FOCUS cohort, using PRC targets gene signature. b, as in a, but with MYC targets gene signature (PDS1, n = 93; PDS2, n = 113; PDS3, n = 108). Boxplots inside the violin plots depict the interquartile range, median, minimum and maximum value (excluding outliers as dots). P values (violin plots): two-sided Wilcoxon rank-sum test. GSEA: Benjamini–Hochberg adjusted P value. NES, normalized enrichment score. c, Scatterplot depicting the inverse correlation between MYC targets and PRC targets ssGSEA scores, with annotated PDS samples. P value: two-sided Pearson correlation coefficient. d, Schematic of mouse intestinal crypt epithelium transcriptional dataset (GSE143915 (ref. )). e, PRC targets and Myc targets ssGSEA scores across colon epithelial cells. f, as in c, but with colon epithelial cell types. g, Schematic of murine organoid-derived epithelial single-cell dataset used for RNA velocity and pseudotime trajectory analysis. h, PHATE visualizations with cells annotated as MYC targets-high and PRC targets-high (top) and epithelial cell types (bottom). Arrows: RNA velocity based on CellRank method. i, Pseudotime score comparisons across groups (top) and cell types (bottom). The red and green dash lines represent the median of MYC and PRC targets ssGSEA scores, respectively. j, Schematic of SMI, made available in the PDSclassifier R package for both bulk and single-cell data. k, PHATE visualization with SMI. l, Comparison of SMI across epithelial cell types in GSE143915 and PDS in the FOCUS cohort (n as in b). P values: two-sided Wilcoxon rank-sum test. m, Heatmap visualization of the differentiation, PRC targets and MYC targets gene expressions in the FOCUS cohort, with annotated PDS prediction probabilities, PDS and SMI. n, In addition to the stem phenotypic landscape described in a previous work, an overall tumor epithelial landscape describes the stem–differentiation cellular dynamics along the SMI, in which PDS3 tumors are highly enriched for differentiated-like traits that correspond to high PRC target expression. AbsPro, absorptive progenitor; SecPDG, secretory progenitor/deep crypt secretory cells/goblet, Ent, enterocytes; EEC, enteroendocrine cell.
Fig. 6
Fig. 6. PDS3 closely resembles WT normal differentiation patterns and lacks pre-clinical models.
a, Six genotypic GEMM primary tumors were sequenced and PDS were called. b, Heatmap displaying the ssGSEA scores for the Hallmark gene sets in GEMMs across PDS, including ‘mixed’ samples. Top annotation bar indicates PDS calls, mouse CMS (MmCMS) and genotypes. c, Heatmap displays TF activity from DoRothEA. d, Violin plots showing proliferative index (top) and replication stress (bottom) across PDS in GEMMs (PDS1, n = 11; PDS2, n = 15; PDS3, n = 12). Boxes inside the violin plots depict the interquartile range, median, minimum and maximum values. P values: two-sided Wilcoxon rank-sum test. e, Scatterplot highlighting the inverse correlation between Myc targets and PRC targets ssGSEA scores recapitulated in GEMMs, with PDS annotation. P value: two-sided Pearson correlation coefficient. f, PHATE visualization on murine organoid scRNA-seq data with cell density and cell type annotations for AK (left) and WT (right). g, Waddington landscape representing AK and WT models annotated with cell types (top) or MYC–PRC target-high groups (bottom), closely resembling PDS1 stem-like vs PDS3 differentiated-like traits, respectively. h, PHATE visualization for WT cultured in WENR (W, WNT3A; E, EGF; N, Noggin; R, R-spondin-1) in which cellular states are skewed towards stem cell enrichment with limited differentiation. i, Waddington landscape of WT + WENR annotated with cell types (top) and MYC–PRC target-high groups (bottom), closely resembling PDS1 stem enrichment. j, Representative Ki67+ ISH images of murine organoids (AK, AK + ENR, WT, WT + ENR; n = 2 replicates each per group). Scale bars: AK, 50 μm; AK + ENR, 100 μm; WT, 50 μm; WT + ENR, 50 μm. k, PHATE visualization for proliferation index scores across all populations.
Fig. 7
Fig. 7. Single-cell analysis of CRC confirms MYC–PRC targets axis.
a, Epithelial scRNA-seq dataset from 63 patients with CRC over five cohorts; UMAP displaying cell clustering based on patients. b, UMAP visualization indicating cells annotated with MYC targets ssGSEA score (left), PRC targets ssGSEA score (center) and a blend of the two (right). c, Scatterplot highlighting an inverse correlation between MYC targets and PRC targets at the single-cell level (color key as in b). d, UMAP visualization using PDS gene set scores with cells annotated with MYC targets (left), PRC targets (center) and a blend of the two (right). e, UMAP visualization annotated with cell cycle phases per cell (left) and stacked bar chart display of the proportion and number of cells at different G1, S or G2M cell cycle phases. f, UMAP visualization annotated with cells indicating proliferative index, replicative stress and KRAS signal DN. g, Boxplots displaying a comparison between MYC targets-high (n = 9,221) and PRC targets-high (n = 8,964) cells for proliferative index, replicative stress and KRAS signal DN. Boxplots depict the interquartile range, median, minimum and maximum values (excluding outliers as dots). P values: two-sided Wilcoxon rank-sum test. h, UMAP visualization annotated with cells indicating CBC and RSC stem gene signatures. i, Boxplots as in g for CBC and RSC.
Fig. 8
Fig. 8. PDS3 differentiation-like trait relates to tumor biology unexplained by single-cell-derived iCMS.
a, RNA velocity and pseudotime analysis using the CellRank method, applied to the CRC epithelial scRNA-seq data. b, UMAP visualization displays patient-based clusters (left) and scatterplot between MYC targets and PRC targets ssGSEA scores (right) with cells annotated with pseudotime. P value: two-sided Pearson correlation coefficient. c, UMAP visualization of PDS gene set scores with MYC and PRC targets-high cell annotations and arrows indicating RNA velocity using the CellRank method. d, UMAP visualizations (top) of data from two patients with CRC, with MYC and PRC targets-high cell annotations and RNA velocity displayed with arrows. Violin plots (bottom) display pseudotime analysis for the respective CRC patients. e, UMAP visualizations (top) of data from all patients, annotated with MYC and PRC targets-high (left) or iCMS calls (right), and violin plots (bottom) displaying pseudotime analysis for the MYC–PRC groups and iCMS. f, Violin plots display differentiation potency using correlation of connectome and transcriptome (CCAT) comparing MYC–PRC targets-high (left) and iCMS (right). g, Violin plots for CCAT measure per iCMS2 (left) or iCMS3 (right) patients. h, Scatterplot displaying the association between CCAT and SMI. i, UMAP visualization on PDS gene set scores, cell annotation with iCMS2 (top-left) and image-based CMS3 (top-right) ssGSEA scores, a blend of the two (bottom-right) and SMI (bottom-left). j, Summarized schematic describing the CRC bulk tumor classified into PDS, outlining the biological and clinical features of PDS tumors, in which MYC targets in association with PDS1 and PRC targets in association with PDS3 led to the identification of an SMI that enumerates stem–differentiation shifts in cellular state.
Extended Data Fig. 1
Extended Data Fig. 1. Classification model comparisons and development of PDS classification.
a, Elbow or within cluster sums of squares method on k = 1:10 indicated k = 3 (shown with vertical black dashed line) as optimal number of clusters. b, Silhouette width method on k = 1:10 further confirms k = 3 as the optimal number of clusters. c, t-SNE plot on ssGSEA gene set score matrix of the discovery set (n = 165) with PDS annotated. d, Sankey plots show PDS (discovery calls) and CMS per sample across the KRAS mutants (KRASmuts) discovery set with (top) or without (bottom) CMS ‘unknown’. e, Sensitivity, specificity, balanced accuracy, and the prediction probability per sample on the test set (n = 40) for three different classification models: support vector machine via radial basis function (svmRBF; top), nearest shrunken centroid (NSC; centre), and lasso and elastic-net regularised generalised linear model (glmnet; bottom). f, Comparison of the overall accuracy between the three classification models. g, Bar chart represents PDS prediction probabilities using svmRBF model on the test data with three different threshold levels at 0.5 (left), 0.6 (centre) and 0.7 (right). The dashed white line denotes the threshold where samples below the threshold are labelled as ‘Mixed’ samples, annotated at the top. h, Sensitivity, specificity, and balanced accuracy following application of the 0.6 threshold to the PDS prediction. i, Histogram highlights permutation test for the PDS classification on the test data.
Extended Data Fig. 2
Extended Data Fig. 2. Clinicopathological and genomic associations of PDS.
a, Proportions of samples examined for clinical/molecular features including sidedness, stage, MSI, and CIMP (left), and boxplot displaying age at diagnosis (right) across PDS in FOCUS (boxplot: PDS1, n = 93; PDS2, n = 113; PDS3, n = 108) and GSE39582 (boxplot: PDS1, n = 186; PDS2, n = 139; PDS3, n = 122) cohorts. P-values (boxplot): Wilcoxon rank-sum test. b, Oncoprint with key driver mutation genes from WNT, MAPK, PIK3CA, cell cycle and TGF-β pathways in the SPINAL cohort. c, Copy number gain/loss per gene from the SPINAL cohort visualised as heatmap. Two-sided Fisher’s exact test between PDS1 and PDS3 followed by post hoc analysis, Benjamini-Hochberg adjusted P-value annotated at the left-side bar of the heatmap; asterisk denotes Benjamini-Hochberg adjusted P-value < 0.05 (TP53, P-value = 0.0081). d, Copy number variation by chromosome arms in the FOCUS (left) and SPINAL (right), ordered based on PDS as rows. Statistics as in c (19p, P-value = 0.00204). e, Kaplan-Meier plots of the relapse-free survival (RFS) of colon cancer patients per CMS in the GSE39582 cohort. f, RFS Kaplan-Meier plots of colon cancer patients classified by PDS in the GSE39582 (top) and PETACC-3 (bottom) cohorts in KRAS mutants (KRASmut), KRAS wild-types (KRASwt) and microsatellite stable (MSS) only. g, RFS Kaplan-Meier plots on the PDS2/CMS1 vs PDS2/CMS4 and CMS2/PDS1 vs CMS2/PDS3 subsets in the GSE39582 cohort. h, Heatmap represents PDS ‘Hallmarks’ ssGSEA scores with PDS classifications across three TCGA cancer type cohorts: COREAD (left), LUAD (centre) and PAAD (right).
Extended Data Fig. 3
Extended Data Fig. 3. Transcriptional landscape of PDS validated across independent CRC cohorts.
a, Heatmaps represent the ‘Hallmark’ gene sets ssGSEA scores across the FOCUS (left), GSE39582 (centre) and SPINAL (right) cohorts, where the samples are ordered based on PDS with annotated PDS prediction probabilities, PDS, iCMS, CMS, CRIS, and mutational status of KRAS, BRAF and TP53. Upregulated PDS-specific Hallmarks for each independent cohorts are shown in coloured text on the side. b, Violin plots display ESTIMATE stromal and immune scores examined across PDS in the GSE39582 (PDS1, n = 186; PDS2, n = 140; PDS3, n = 122) cohort. Boxes within violin plots depict the interquartile range, median, minimum, and maximum value (excluding outliers as dots). P-values: two-sided Wilcoxon rank-sum test. c, as in b, examining ESTIMATE tumour purity measure across PDS in the FOCUS (PDS1, n = 93; PDS2, n = 113; PDS3, n = 108), GSE39582 (numbers as in b), and SPINAL (PDS1, n = 80; PDS2, n = 54; PDS3, n = 82) cohorts.
Extended Data Fig. 4
Extended Data Fig. 4. Stem and polyp gene signatures interrogation in PDS.
a, Scatter gradient-density plot represents LGR5+ crypt base columnar (CBC) and LGR5/ANXA1+ regenerative stem cells (RSC) ssGSEA scores with annotated PDS sample in the FOCUS cohort. P-value: two-sided Pearson correlation co-efficient. b, Heatmaps depict ssGSEA scores per sample for the stem and polyp signatures in the FOCUS and GSE39582 cohorts with annotated PDS calls and the Intestinal Stem Cell Index (ISC-index). c, Violin plots compare ssGSEA scores across PDS for CBC, RSC stem signatures and tubular, serrated precursor polyp signatures in the FOCUS (PDS1, n = 93; PDS2, n = 113; PDS3, n = 108), and d, GSE39582 (PDS1, n = 186; PDS2, n = 140; PDS3, n = 122) cohort. Boxes within violin plots depict the interquartile range, median, minimum, and maximum value (excluding outliers as dots). P-values: two-sided Wilcoxon rank-sum test.
Extended Data Fig. 5
Extended Data Fig. 5. PDS3 association to PRC Targets led to resemblance to differentiation-like traits.
a, Piechart denotes PDS-specific genes from PDS gene sets, where the PDS3 genes were analysed using Enrichr against the ChEA 2016 database resulting in significant association with SUZ12, REST and EZH2 (top), forming protein-protein interaction STRING network (bottom). Enrichr: Benjamini-Hochberg adjusted P-value. b, The genes/proteins show link to polycomb repressive protein complex (PRC) with the core protein involving SUZ12 and EZH2. A simplified schematic describes the role of PRC in marking with trimethylation (Me3) at lysine 27 in histone 3 (H3K27) that leads to repression of the PRC target genes while the PRC target genes are expressed upon the absence of this marker from PRC. c, Gene expression measure of EZH2, SUZ12, and REST across PDS in the FOCUS (PDS1, n = 93; PDS2, n = 113; PDS3, n = 108), and d, across mouse epithelium cell populations in the GSE143915 cohort. Boxes depict the interquartile range, median, minimum, and maximum value (excluding outliers as dots). P-values (in c): two-sided Wilcoxon rank-sum test. e, GSEA enrichment and violin plot displaying ssGSEA score across PDS for PRC targets, and f, MYC targets in the SPINAL (PDS1, n = 80; PDS2, n = 54; PDS3, n = 82) cohort. P-values (violin plots): two-sided Wilcoxon rank-sum test. GSEA: Benjamini-Hochberg adjusted P-value, NES = Normalised Enrichment Score. g, Correlation between MYC targets and PRC targets with annotated PDS calls in the SPINAL cohort. h, Boxplot highlights measure of proliferation index and replication stress across mouse epithelial cell populations (n= as in d). i, Violin plot represents SMI across PDS in the SPINAL cohort (n= and statistics as in f). j, PHATE visualisation of murine organoid-derived scRNA-seq with MYC targets High and PRC targets High cell annotations (left), and violin plot with SMI across annotated epithelial cell types (right). k, Heatmap displays the gene expression of differentiation markers, PRC targets and MYC targets signatures across PDS in the SPINAL cohort, annotated with the PDS prediction probabilities, PDS calls, and SMI. l, Heatmap represents gene expressions for the differentiation makers. AbsPro = absorptive progenitor, SecPDG = secretory progenitor/deep crypt secretory cells/goblet, Ent = enterocytes, EEC = enteroendocrine cell, CSC = colonic stem cell, proCSC = hyper-proliferative CSC, revCSC = revival CSC, TA = transit amplifying cell, DSC = deep crypt secretory cell.
Extended Data Fig. 6
Extended Data Fig. 6. Caveat of differentiated-like PDS3 in preclinical setting.
a, Heatmap representation of gene expressions for the differentiation makers in the GSE143915 cohort, where the top annotation indicates mouse epithelial cell populations. b, Heatmap depicts gene expression for the differentiation markers in the murine organoid-derived scRNA-seq data, where the top annotations display MYC targets High and PRC targets High, epithelial cell types, and SMI. c, PHATE visualisations of Lgr5 (left) and Anxa1 (right) expression in the mouse organoid-derived scRNA-seq dataset. d, PCA plots on genetically engineered mouse models-derived primary tumour dataset annotated with genotype (left) and PDS (right). e, Boxplots compares proliferative index (left) and replication stress (right) across the genotypic models with PDS annotations. f, PHATE visualisation on AK (left), WT (centre) and WT + WENR (right) models of murine-derived scRNA-seq data, annotated with SMI and single cell density. g, Right: Representative BRDU+ images of murine organoids cells, cultured with or without EGF, Noggin and R-spondin-1 growth media supplements (AK, AK + ENR, WT + ENR, n = 2 replicates each per group; WT, n = 1). Scale bars: AK, 50μm; AK + ENR, 100μm; WT, 50μm; WT + ENR, 50μm. Left: Barchart displays the positivity (%) of the BRDU+ and Ki67+ staining, enumerated via QuPath. Data are mean±s.d. h, PHATE visualisation annotated with Proliferation Index and single cell density. WENR or ENR: W = WNT3A, E = EGF, N = Noggin, R = R-spondin-1.
Extended Data Fig. 7
Extended Data Fig. 7. MYC targets and PRC targets examined at the single-cell level.
a, Proportion of cells with tertile-based stratification using MYC targets and PRC targets single-cell ssGSEA scores into High, Mid and Low. b, Scatterplot shows the correlation between MYC targets and PRC targets. The MYC targets-High and PRC targets-High (top tertiles) highlighted in red and green respectively. P-value: two-sided Pearson correlation co-efficient. c, UMAP with MYC target-High and PRC target-High cells only. d, UMAP with cell cycle phase annotated per cells. e, UMAP displays proliferative index, replication stress, KRAS SIGNAL DN and neuroactive ligand receptor interaction. f, UMAP visualisation with CBC and RSC ssGSEA scores.
Extended Data Fig. 8
Extended Data Fig. 8. MYC/PRC highlights nuanced biology previously unidentified and mutually exclusive from iCMS biology.
a, UMAP visualisation of epithelial scRNA-seq dataset from n = 63 CRC patients. b, UMAP visualisation with SMI annotations. c, Boxplot displaying all the patients assessing differentiation potency per cell with MYC targets-High/PRC targets-High. d, Scatterplot shows positive correlation between SMI and pseudotime score. P-value: two-sided Pearson correlation co-efficient. e, PHATE visualisation on MYC targets, PRC targets ssGSEA scores and SMI in contrast to iCMS2, iCMS3 ssGSEA scores and iCMS2-High/iCMS3-High (based on tertile split of ssGSEA scores). The vertical dashed black line roughly indicates the MYC/PRC high/low cells defining distinctive differences compared to that of iCMS2/iCMS3.

References

    1. Budinska E, et al. Gene expression patterns unveil a new level of molecular heterogeneity in colorectal cancer. J. Pathol. 2013;231:63–76. doi: 10.1002/path.4212. - DOI - PMC - PubMed
    1. Marisa L, et al. Gene expression classification of colon cancer into molecular subtypes: characterization, validation, and prognostic value. PLoS Med. 2013;10:e1001453. doi: 10.1371/journal.pmed.1001453. - DOI - PMC - PubMed
    1. Isella C, et al. Selective analysis of cancer-cell intrinsic transcriptional traits defines novel clinically relevant subtypes of colorectal cancer. Nat. Commun. 2017;8:15107. doi: 10.1038/ncomms15107. - DOI - PMC - PubMed
    1. Liberzon A, et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1:417–425. doi: 10.1016/j.cels.2015.12.004. - DOI - PMC - PubMed
    1. Guinney J, et al. The consensus molecular subtypes of colorectal cancer. Nat. Med. 2015;21:1350–1356. doi: 10.1038/nm.3967. - DOI - PMC - PubMed

Substances