Nat Commun. 2024 Nov 11;15(1):9139.
doi: 10.1038/s41467-024-53163-y.

Integrative ensemble modelling of cetuximab sensitivity in colorectal cancer patient-derived xenografts

Umberto Perron et al. Nat Commun. 2024.

Abstract

Patient-derived xenografts (PDXs) are tumour fragments engrafted into mice for preclinical studies. PDXs offer clear advantages over simpler in vitro cancer models - such as cancer cell lines (CCLs) and organoids - in terms of structural complexity, heterogeneity, and stromal interactions. Here, we characterise 231 colorectal cancer PDXs at the genomic, transcriptomic, and epigenetic levels, along with their response to cetuximab, an EGFR inhibitor used clinically for metastatic colorectal cancer. After evaluating the PDXs' quality, stability, and molecular concordance with publicly available patient cohorts, we present results from training, interpreting, and validating the integrative ensemble classifier CeSta. This model takes as input the PDXs' multi-omic characterisation and predicts their sensitivity to cetuximab treatment, achieving an area under the receiver operating characteristic curve > 0.88. Our study demonstrates that large PDX collections can be leveraged to train accurate, interpretable drug sensitivity models that: (1) better capture patient-derived therapeutic biomarkers compared to models trained on CCL data, (2) can be robustly validated across independent PDX cohorts, and (3) could contribute to the development of future therapeutic biomarkers.

Conflict of interest statement

Competing interests: F.I. receives funding from Open Targets, a public-private initiative involving academia and industry, and from Nerviano Medical Sciences, and performs consultancy for the joint Cancer Research Horizon-AstraZeneca Functional Genomics Centre and for Mosaic TX. L.T. has received research grants from Menarini, Merck KGaA, Merus, Pfizer, Servier and Symphogen. U.P. is a consultant for Omniscope Inc. H.K. and J.S. are employees of Charles River. U.M. is an employee and holder of company stock of AstraZeneca. All the other authors declare no competing interests.

Figures

Fig. 1. Multi-omic Overview of the Colorectal Cancer PDX Cohort and Cetuximab Response Modelling Approach.
a The left panel presents the IRCC patient-derived xenograft (PDX) collection, derived from 231 unique colorectal cancer (CRC) liver metastasis (LMX) resections. This collection was characterised at a multi-omic level and assessed for cetuximab response. A schematic of the omic-specific feature engineering is also provided. The right panel outlines the CeSta classifier pipeline. Input features selected from the training set (Methods) using univariate tests (Fisher’s exact test, Mann-Whitney U test) and multivariate linear models feed into three independent level 1 classifier pipelines: forward feature selection plus elastic net, ANOVA feature selection plus extra trees, and ANOVA feature selection plus support vector classifiers. A fourth classifier, a CatBoost model, is pre-trained on pan-cancer data from the Cell Model Passports repository and fine-tuned using IRCC-PDX data. The predictions from these level 1 classifiers are stacked and passed to a meta-classifier, which produces the final binary classification (cetuximab responder/non-responder) using argmax-based soft voting. b CeSta nested cross-validation approach: 50 train/test splits are generated via stratified sampling of the IRCC-PDX collection. CeSta is trained and tuned independently across these 50 splits. In each iteration, the training set is divided into three folds. In each of three rounds, two folds serve as the ‘training fold’ and the remaining fold serves as the ‘validation fold’. Predictions from the level 1 classifiers for the validation fold are stacked and passed to the meta-classifier. After validation, the level 1 classifiers are fitted to the entire training set, and CeSta’s performance is evaluated on the test set (pink rectangle, N = 81). CeSta is then trained on the entire IRCC-PDX dataset and tested on an independent CR-PDX dataset (grey rectangle, N = 50) for external validation. c Most frequently mutated genes in the IRCC-PDX cohort. d Selection of multi-omic and clinical features across the IRCC-PDX collection, including CRIS expression cluster labels, methylation NMF cluster labels, primary sample anatomical location, and treatment backbone. Source data are provided as a Source Data file. Fig. 1a, b was created in BioRender [Iorio, F. (2024) BioRender.com/q01w468] and released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license (https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en).
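The stratified train/test sampling and the argmax-based soft voting described in the Fig. 1 caption can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The class counts (140 non-responders, 91 responders), the helper names `stratified_split` and `soft_vote`, and the use of a plain unweighted average of class probabilities are all assumptions made for illustration.

```python
import random


def stratified_split(labels, n_test, seed):
    """Split sample indices into train/test while keeping class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        # Per-class test size is proportional to the class size (rounded).
        k = round(len(indices) * n_test / len(labels))
        test.extend(indices[:k])
        train.extend(indices[k:])
    return sorted(train), sorted(test)


def soft_vote(probabilities):
    """Average per-classifier class probabilities and return the argmax class."""
    n_classes = len(probabilities[0])
    mean = [sum(p[c] for p in probabilities) / len(probabilities)
            for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


# Hypothetical labels: 0 = non-responder, 1 = responder (counts are invented).
labels = [0] * 140 + [1] * 91
# 50 stratified splits, each holding out 81 of the 231 models for testing.
splits = [stratified_split(labels, n_test=81, seed=s) for s in range(50)]

# Three hypothetical level 1 classifiers voting on one PDX model.
predicted_class = soft_vote([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
```

Because per-class test sizes are rounded independently, the total test size can differ from `n_test` by one for other class counts; a production pipeline would more likely rely on an established library routine for stratified splitting.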
Fig. 2. Overview of cetuximab response and biomarker candidates.
a Mutation patterns of CRC driver genes and mutational signature features among those with the most significant impact on CeSta predictions (Fig. 4a). b Cetuximab non-responders (‘PD’, volume growth > 35%, in orange) and responders (‘SD-OR’, volume growth ≤ 35%, in blue). c Selection of continuous features that best differentiate between PD and SD-OR PDX models. Source data are provided as a Source Data file.
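The response labels in panel b follow a single threshold on tumour volume growth. A minimal sketch of that rule is given below; the function name `response_class` and the example growth values are hypothetical.

```python
def response_class(volume_growth_pct, threshold=35.0):
    """Label a PDX model from its tumour volume growth (in percent).

    Growth above the threshold is 'PD' (non-responder); growth at or
    below it is 'SD-OR' (responder), as stated in the caption.
    """
    return "PD" if volume_growth_pct > threshold else "SD-OR"


# Hypothetical volume growth values for four PDX models.
examples = [120.0, 35.0, 10.5, -60.0]
labels = [response_class(v) for v in examples]
```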
Fig. 3. CeSta outperforms the state-of-the-art baseline classifier on IRCC-PDX and CR-PDX.
a Classification performance quantified through F1 scores (harmonic mean of precision and recall) across 50 train/test IRCC-PDX split replicates (x-axis) for the stacked classifier (‘CeSta’, in blue), an elastic net penalised logistic model (‘elNet baseline’, in tan) which uses state-of-the-art clinical features for cetuximab sensitivity in CRC (KRAS, NRAS, BRAF mutational status, right colon tumour location), a rule-based classifier using the KRAS-BRAF-NRAS triple-negative clinical signature (tripleNegRule, in orange) as a binary predictor, and another rule-based classifier which uses both the aforementioned triple-negative signature and the ‘right colon’ feature (tripleNegRightRule, in green). b Area under the receiver operating characteristic curve (AUROC) values with error bars indicating 95% confidence intervals (DeLong’s method) across 50 IRCC-PDX train/test split replicates (n = 150 and 81, respectively; x-axis), for CeSta (in blue) and the elastic net penalised logistic model (‘elNet baseline’, in tan) described in (a). c AUROC (DeLong’s method) computed over the external validation CR-PDX dataset for CeSta (in blue) and the elNet baseline classifier (in tan) after a single instance of both models is trained and tuned over the entire IRCC-PDX dataset. The shaded area between the CeSta and elNet baseline ROC curves represents the improvement in AUROC. Decision point coordinates correspond to the false-positive and true-positive rates obtained from the corresponding classifier’s predictions. Here, the rule-based classifiers’ decision points overlap with the elNet baseline’s. d Confusion matrix comparing CeSta classifier outcomes (same validation setup as c) with the PDXs’ actual cetuximab response over the external validation CR-PDX dataset. Correct predictions, on the diagonal, are highlighted in blue; incorrect predictions, off the diagonal, are highlighted in purple. e CeSta correct prediction counts (same validation setup as c) over the CR-PDX external validation set grouped by PDX cetuximab sensitivity (x-axis) and PDX KRAS-NRAS-BRAF triple-negative status (y-axis). CeSta correctly predicts additional triple-negative non-responders (3) and triple-positive responders (1), which all baseline classifiers miss. Source data are provided as a Source Data file.
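The two metrics reported in Fig. 3 can be sketched in a few lines of pure Python. The F1 score is the harmonic mean of precision and recall. The AUROC is computed here through its rank interpretation: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. DeLong's confidence intervals are not reproduced, and the labels, predictions and scores below are invented for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true, scores, positive=1):
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = responder, 0 = non-responder.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

f1 = f1_score(y_true, y_pred)
auc = auroc(y_true, scores)
```

The pairwise AUROC is quadratic in the number of samples, which is acceptable for cohorts of this size; larger datasets would normally use a sort-based implementation.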
Fig. 4. CeSta leverages informative features and combines weaker classifiers.
a Feature importance as determined by CeSta, represented by the mean absolute SHAP value (x-axis) for the top significant features (y-axis). b Top significant features’ impact on CeSta output using SHAP values (x-axis) across all 50 PDXs in the CR-PDX validation set (scatter dots). The most important features in (a) have the greatest impact on model outcomes, with a clear separation between positive and negative effects. c Performance of CeSta’s top features on IRCC-PDXs and the external cohort. The relationship between a feature’s SHAP values and cetuximab sensitivity on the train set (full IRCC-PDX set, x-axis) and test set (CR-PDX set, y-axis), after removing other features’ effects (partial correlation, parSHAP). Dot size and colour indicate a feature’s mean absolute SHAP value on the training set. Dots closer to the diagonal indicate consistent performance across train and test sets. Key features like KRAS mutation and EREG expression align closely with the diagonal, indicating a good fit or slight underfitting. d Underperformance of CMP-trained features on the external cohort. The relationship between CatBoostCMP feature SHAP values and cetuximab sensitivity on the train (panCMP set) and test (CR-PDX) sets, after removing other features’ effects. Dot size and colour represent a feature’s impact on model prediction. Many top features of this model fall in the lower right quadrant, indicating overfitting. e AUROC confidence intervals (CI, 95%) for CeSta (blue), three level 1 classifiers (orange), the CatBoost model trained on the panCMP dataset (green), and the same CatBoost model retrained on the IRCC-PDX dataset. CeSta shows a slight performance improvement over the best level 1 classifier, with overlapping CIs. The cell-line-trained CatBoost classifier poorly predicts cetuximab sensitivity in PDXs, but retraining improves its performance. Source data are provided as a Source Data file.
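The feature ranking in panel a uses the mean absolute SHAP value per feature. A minimal sketch of that aggregation follows, assuming the per-sample SHAP values have already been computed by some explainer. The SHAP matrix below is invented, `feature_x` is a hypothetical placeholder feature, and the function name `mean_abs_shap` is illustrative.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP| across samples, most important first."""
    n_samples = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values: one row per PDX model, one column per feature.
feature_names = ["KRAS mutation", "EREG expression", "feature_x"]
shap_rows = [
    [0.4, -0.1, 0.0],
    [-0.6, 0.3, 0.1],
]

ranking = mean_abs_shap(shap_rows, feature_names)
```

Taking the absolute value before averaging means that a feature pushing predictions strongly in both directions still ranks as important, even though its signed mean may be close to zero.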
