Nat Cancer. 2024 Sep;5(9):1305-1317.
doi: 10.1038/s43018-024-00793-2. Epub 2024 Jul 3.

A deep-learning framework to predict cancer treatment response from histopathology images through imputed transcriptomics

Danh-Tai Hoang et al. Nat Cancer. 2024 Sep.

Abstract

Advances in artificial intelligence have paved the way for leveraging hematoxylin and eosin-stained tumor slides for precision oncology. We present ENLIGHT-DeepPT, an indirect two-step approach consisting of (1) DeepPT, a deep-learning framework that predicts genome-wide tumor mRNA expression from slides, and (2) ENLIGHT, which predicts response to targeted and immune therapies from the inferred expression values. We show that DeepPT successfully predicts transcriptomics in all 16 The Cancer Genome Atlas cohorts tested and generalizes well to two independent datasets. ENLIGHT-DeepPT successfully predicts true responders in five independent patient cohorts involving four different treatments spanning six cancer types, with an overall odds ratio of 2.28 and a 39.5% increased response rate among predicted responders versus the baseline rate. Notably, its prediction accuracy, obtained without any training on the treatment data, is comparable to that achieved by directly predicting the response from the images, which requires specific training on the treatment evaluation cohorts.
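The two summary statistics quoted in the abstract (an odds ratio, and an increase in response rate among predicted responders relative to baseline) can both be derived from a 2x2 table of predicted versus observed response. The sketch below is only an illustration: the counts are hypothetical and not taken from the study, the function names are invented here, and it assumes the reported increase is relative to the baseline response rate.

```python
def odds_ratio(tp, fp, fn, tn):
    """Odds ratio of a 2x2 table.

    tp: predicted responders who responded
    fp: predicted responders who did not respond
    fn: predicted non-responders who responded
    tn: predicted non-responders who did not respond
    """
    return (tp * tn) / (fp * fn)


def relative_response_gain(tp, fp, fn, tn):
    """Relative increase of the response rate among predicted responders
    over the baseline (overall) response rate."""
    baseline = (tp + fn) / (tp + fp + fn + tn)
    among_predicted = tp / (tp + fp)
    return among_predicted / baseline - 1.0


# Hypothetical counts for illustration only (not data from the paper).
tp, fp, fn, tn = 30, 20, 30, 40
example_or = odds_ratio(tp, fp, fn, tn)
example_gain = relative_response_gain(tp, fp, fn, tn)
print(f"OR = {example_or:.2f}, relative gain = {example_gain:.1%}")
```

An odds ratio above 1 means predicted responders have higher odds of responding than predicted non-responders; the gain compares precision among predicted responders with the overall response rate.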

Conflict of interest statement

Competing interests

D.-T.H., E.A.S., E.R., G.D., R.A. and T.B. are listed as inventors on a patent (application no. 63/349,829, United States, 2022) filed based on the methodology outlined in this study. G.D., D.S.B., E.E., T.B. and R.A. are employees of Pangea Biomed. E.R. is a cofounder of Medaware, Metabomed and Pangea Biomed (divested from the latter). E.R. serves as a non-paid scientific consultant to Pangea Biomed under a collaboration agreement between Pangea Biomed and the NCI. The other authors declare no competing interests.

Figures

Extended Data Fig. 1 | Model architecture in detail and training strategies.
(a) The feature compression subnetwork consists of an input layer of 2,048 neurons, a bottleneck of 512 neurons, and an output layer of 2,048 neurons. (b) The MLP regression subnetwork consists of an input layer of 512 neurons, a hidden layer of 512 neurons, and an output layer with the number of neurons reflecting the number of genes. (c) In the ensemble learning strategy (bagging), five models were trained independently with five internal training-validation splits; these five model predictions were averaged to make the final prediction. (d) In the model selection strategy, the ‘best’ model with the highest performance on the validation set was chosen to make predictions on the test set. Of note, DeepPT uses ensemble learning.
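The difference between the two strategies in panels (c) and (d) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy predictions and the validation scores are all hypothetical, and each "model" is represented only by its per-gene prediction vector.

```python
def ensemble_predict(model_predictions):
    """Bagging-style ensemble: average the per-gene predictions of all models."""
    n_models = len(model_predictions)
    return [sum(values) / n_models for values in zip(*model_predictions)]


def select_best(model_predictions, validation_scores):
    """Model selection: keep only the model with the highest validation score."""
    best = max(range(len(validation_scores)), key=validation_scores.__getitem__)
    return model_predictions[best]


# Hypothetical predictions of five models for three genes in one sample.
preds = [
    [1.0, 2.0, 3.0],
    [1.2, 1.8, 3.1],
    [0.8, 2.2, 2.9],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 3.0],
]
# Hypothetical validation performance of each model.
val_scores = [0.41, 0.45, 0.39, 0.44, 0.40]

averaged = ensemble_predict(preds)        # uses all five models
chosen = select_best(preds, val_scores)   # uses only the second model
```

Averaging tends to reduce the variance contributed by any single training-validation split, which is the usual motivation for bagging.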
Extended Data Fig. 2 | The distribution of correlations between the predicted and actual gene expression values across the cohort samples.
The violin plots depict the correlations between the predicted and measured expression values across the cohort samples obtained by HE2RNA (light pink) and DeepPT (light blue) for all genes (a), the top 1,000 genes (b), the top 2,000 genes (c), and the top 3,000 genes (d) with the highest correlations. The results presented in this figure were measured by the mean of 5 folds, as reported in. Except for this figure, all other results presented in this study were measured across the entire test samples, consistent with the approach used in. P-values were calculated using the one-sided Mann-Whitney U test. In violin plots, the central mark is the median. The number of patients in each cohort is shown in parentheses.
Extended Data Fig. 3 | Difference between histopathological features extracted from TCGA-Breast tiles and TransNEO-Breast tiles.
UMAP visualization of the 2,048 histopathological features extracted using a pre-trained ResNet50 CNN. For illustration, 4,000 image tiles were randomly selected from each dataset. Each point represents the feature vector of one image tile.
Extended Data Fig. 4 | Comparisons of ENLIGHT-DeepPT with other methods.
(a) Performance of ENLIGHT-DeepPT (light blue bars) and the respective drug target(s) expression (gray bars). (b) Performance of ENLIGHT-DeepPT when using the same methodology described in (light blue bars) and a version of ENLIGHT-DeepPT that incorporates the target expression in the scoring method for antibodies (gray bars). (c) Performance of ENLIGHT-DeepPT when using the same methodology described in to generate the genetic interaction networks that constitute ENLIGHT’s predictive biomarkers (light blue bars) and a revised methodology (gray bars) in which ENLIGHT’s biomarkers were restricted to genes that showed a high positive correlation (R > 0.4) between actual and DeepPT-predicted values in the respective TCGA cohort (that is, the cohort matching the cancer type of each of the five drug response datasets). Results are shown for each of the three datasets in which antibody drugs were used and for their aggregate. The odds ratio (OR) for each dataset was obtained using the same clinical decision threshold that was previously established in. The number of patients in each cohort is shown in parentheses.
Extended Data Fig. 5 | Histograms of the number of tiles per slide by cohort.
The number of tiles in each slide image from TCGA and NCI-Brain datasets ranges from 100 to 8,000 (a-e), while the number of tiles in each TransNEO-Breast slide image is much smaller, ranging from 100 to 1,000 (f).
Extended Data Fig. 6 | Histogram of median expression over slides.
The median expression over samples of each gene commonly varies from 10 to 100,000 for every dataset considered in this study (a to f).
Extended Data Fig. 7 | The benefit of the autoencoder module.
(a-d) Difference between ResNet features and AutoEncoder features. Histograms of median and standard deviation of ResNet features (a-b) and AutoEncoder features (c-d). The TCGA-BRCA cohort was selected as an example. (e-f) Model performance on external validation datasets. The violin plots depict the distribution of Pearson correlations between the predicted and experimentally measured expression values across the cohort samples for the top 1,000 genes with the highest correlation. The bar charts indicate the number of genes exhibiting Pearson correlations between the predicted and measured expression values across the cohort samples that are above 0.4. The results are presented separately for each external validation set, TransNEO-Breast (e) (n = 160 patients) and NCI-Brain (f) (n = 226 patients). P-values were calculated using the one-sided Mann-Whitney U test. In violin plots, the central mark is the median.
Extended Data Fig. 8 | The benefit of ensemble learning.
The violin plots depict the distribution of Pearson correlation for the top 1,000 genes, and the bar charts indicate the number of genes exhibiting Pearson correlations between the predicted and measured expression values across the cohort samples above 0.4. The results were obtained from model selection strategy (gray) and ensemble learning strategy (light blue). P-values were calculated using the one-sided Mann-Whitney U test. In violin plots, the central mark is the median. The number of patients in each cohort is shown in parentheses.
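The two summaries used throughout these captions (the top-k genes by correlation, and the number of genes with correlation above 0.4) both start from one Pearson correlation per gene, computed across the cohort samples. The following is a minimal sketch of that bookkeeping; the function names, the dictionary layout and the toy values are assumptions made for illustration, not the study's code or data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


def per_gene_correlations(predicted, measured):
    """Map each gene to the correlation of predicted vs measured values across samples."""
    return {gene: pearson(predicted[gene], measured[gene]) for gene in measured}


def count_well_predicted(correlations, threshold=0.4):
    """Number of genes whose correlation exceeds the threshold (NaN never counts)."""
    return sum(1 for r in correlations.values() if r > threshold)


def top_k_genes(correlations, k):
    """The k genes with the highest correlation, ignoring undefined (NaN) values."""
    valid = {g: r for g, r in correlations.items() if not math.isnan(r)}
    return sorted(valid, key=valid.get, reverse=True)[:k]


# Toy values for three hypothetical genes across four samples.
measured = {"GENE_A": [1, 2, 3, 4], "GENE_B": [1, 2, 3, 4], "GENE_C": [1, 2, 3, 4]}
predicted = {
    "GENE_A": [1.1, 1.9, 3.2, 3.8],  # tracks the measured values closely
    "GENE_B": [2, 1, 2, 1],          # negatively correlated
    "GENE_C": [5, 5, 5, 5],          # constant prediction, correlation undefined
}
corrs = per_gene_correlations(predicted, measured)
```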
Fig. 1 | Study overview.
a, Three main components of the DeepPT architecture: the pretrained ResNet50 convolutional neural network (CNN) unit (left) extracts histopathology features from tile images; the autoencoder (middle) compresses the 2,048 features to a lower dimension of 512 features; and the MLP (right) integrates these histopathology features to predict the sample’s gene expression. b, Overview of the ENLIGHT pipeline (illustration taken from ref. 47): ENLIGHT starts by inferring the GI partners of a given drug from various cancer in vitro and clinical data sources. Given the SL and SR partners and the transcriptomics for a given patient sample, ENLIGHT computes a drug-matching score that is used to predict the patient response. Here, ENLIGHT uses the DeepPT-predicted expression to produce drug-matching scores for each patient studied. c, Overview of the analysis using DeepPT and ENLIGHT. Top row, DeepPT was trained with FFPE slide images and matched transcriptomics for an array of different cancer types from TCGA. Middle row, after the training phase, the models were applied to predict gene expression on the internal (held-out) TCGA datasets and on two external datasets on which they were not trained. Bottom row, the predicted tumor transcriptomics in each of the five independent test clinical datasets serves as input to ENLIGHT for predicting the patients’ response to treatment and assessing the overall prediction accuracy.
Fig. 2 | DeepPT prediction of gene expression from H&E slides.
a, Violin plots depicting the distribution of Pearson correlations between predicted and measured expression values across the cohort samples for all (approximately 18,000) genes (empty) and the top 1,000 genes with the highest correlations (light blue). In violin plots, the central mark is the median. The number of patients in each cohort is shown in parentheses. b, Median correlation between the predicted and measured expression values across the cohort samples obtained by HE2RNA (gray), SEQUOIA (pink) and DeepPT (light blue) for the top 1,000 best-predicted genes (independently selected for each model—those with the highest correlation). The performance of HE2RNA and SEQUOIA is taken as reported in the original publication. c, Mean correlation between the predicted expression values of all genes and their measured values across the samples, obtained by HE2RNA (gray), tRNAsformer (purple) and DeepPT (light blue) for kidney cancer. The performance of HE2RNA and tRNAsformer has been reported in ref. . P values in b and c were calculated using a one-sided permutation test, and their values were zero in every case (*P < 0.001). d, Correlation distribution of the top 1,000 genes (left) and the number of genes with a correlation of >0.4 (right) achieved by DeepPT in two external unseen test cohorts. In violin plots, the central mark is the median. e, Venn diagrams illustrating the overlap between the well-predicted genes (R > 0.4) in TCGA-breast and TransNEO-breast (left) and in TCGA-brain and NCI-brain (right). Both have hypergeometric P values equal to zero. f, Pathway enrichment analysis on the well-predicted genes (R > 0.4). Each row represents a different cancer hallmark, and each column represents a different cohort (the two rightmost columns correspond to the two external cohorts). Values denote the FDR-corrected P values for pathway enrichment among the well-predicted genes (hypergeometric test). 
BRCA, breast invasive carcinoma; KIRC, kidney renal clear cell carcinoma; LGG, brain lower-grade glioma; LUSC, lung squamous cell carcinoma; LUAD, lung adenocarcinoma; HNSC, head and neck squamous cell carcinoma; PRAD, prostate adenocarcinoma; COAD, colon adenocarcinoma; STAD, stomach adenocarcinoma; KIRP, kidney renal papillary cell carcinoma; CESC, cervical squamous cell carcinoma and endocervical adenocarcinoma; PAAD, pancreatic adenocarcinoma; READ, rectum adenocarcinoma; ESCA, esophageal carcinoma; GBM, glioblastoma multiforme; KICH, kidney chromophobe.
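The overlap test in panel e can be illustrated with a one-sided hypergeometric tail probability. The sketch below uses exact integer arithmetic from the Python standard library; the function name and the numbers in the usage lines are hypothetical, and the exact test configuration used in the study (for example, the gene universe) is not stated in this caption.

```python
from math import comb


def hypergeom_overlap_pvalue(n_total, n_a, n_b, n_overlap):
    """P(X >= n_overlap) when n_b genes are drawn without replacement from
    n_total genes, of which n_a belong to the first well-predicted set."""
    upper = min(n_a, n_b)
    numerator = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_overlap, upper + 1)
    )
    return numerator / comb(n_total, n_b)


# Hypothetical toy universe of 10 genes with two sets of 3 genes each.
p_full_overlap = hypergeom_overlap_pvalue(10, 3, 3, 3)  # all three genes shared
p_any_overlap = hypergeom_overlap_pvalue(10, 3, 3, 0)   # trivially 1
```

With gene universes in the tens of thousands and large overlaps, such tail probabilities can fall below the smallest representable floating-point number, which is one way a reported P value of exactly zero can arise.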
Fig. 3 | Gene set enrichment analysis identifying pathways whose predicted gene expression correlates with TIL abundance.
Only cancer cohorts with available data for both predicted gene expression and estimated TILs are included in the analysis. P values were calculated using a one-sided permutation test for gene set enrichment analysis (*P < 0.05). The precise P values for each cancer hallmark, following the same order presented in each panel, were as follows: 0.806, 0.440, 0, 0, 0, 0.510, 0, 0, 0, 0 (BRCA); 0, 0.963, 1, 1, 0.981, 0, 0.390, 0, 0, 0 (LUSC); 0.522, 0.426, 0.637, 0.138, 0.121, 0.345, 0.083, 0, 0, 0 (LUAD); 0.931, 0, 0, 0, 0, 0.871, 0.023, 0.369, 0.030, 0.704 (PRAD); 0.147, 0.452, 0.177, 0, 0.048, 0.016, 0, 0.067, 0, 0 (COAD); 0.968, 0.986, 0.007, 0.001, 0.026, 0.068, 0, 0, 0, 0 (STAD); 0.008, 0.110, 0.079, 0, 0, 0.130, 0, 0, 0, 0 (CESC); 0, 0.409, 0.559, 0.496, 0.109, 0, 0.302, 0, 0.241, 0 (PAAD); 0, 0.167, 0.661, 0, 0.076, 0, 0, 0, 0, 0 (READ).
Fig. 4 | Comparison of the correlation of survival association in terms of log(HR) for three proliferation signatures based on actual and predicted expression.
Top, Ki-67; middle, proliferation index; bottom, EMT pathway. X axis, log(HR) of signature score based on actual expression; Y axis, log(HR) of signature score based on predicted expression. Each point represents a different TCGA cohort, and points are color-coded according to the significance of the survival association (two-sided Cox proportional hazards test) using a corrected P < 0.05 cutoff: green denotes that the survival association was significant by both the actual and predicted signatures; red and black denote that the survival association was significant by the actual or predicted signature only, respectively. Pearson correlation R and corresponding P values are denoted in each panel. The regression line and 95% confidence intervals are shown.
Fig. 5 | Predicting treatment response from H&E slides.
a, OR (y axis) for the five datasets tested and the aggregate cohort of all patients together (x axis). The drugs and sample sizes are denoted in the x-axis labels. The black horizontal dashed line denotes an OR of 1, which is expected by chance. Asterisks denote the significance of the OR being larger than 1 according to Fisher’s exact test. All P values were FDR corrected, P = 0.16 (PARPi), 0.18 (ALKi), 0.01 (bintrafusp alfa), 0.03 (trastuzumab1), 0.08 (trastuzumab2), 0.007 (all). *P < 0.1, **P < 0.05. b, AP (y axis) for the five datasets and the aggregate cohort, as in a. The black horizontal dashed lines denote the ORR for each dataset. An AP higher than the ORR demonstrates better accuracy than expected by chance. Asterisks denote the significance of the AP being higher than the response rate using a one-sided proportion test. All P values were FDR corrected, P = 0.22 (PARPi), 0.04 (ALKi), 0.04 (bintrafusp alfa), 0.003 (trastuzumab1), 0.11 (trastuzumab2), 0.003 (all). **P < 0.05. c, OR of the direct supervised method (y axis) for all 234 patients as a function of the fraction of patients above a given threshold (coverage, x axis). We present coverage between 10% and 90% only to avoid the measurement noise of extreme coverage values, where data are too small. The blue line denotes the OR of ENLIGHT–DeepPT for all 234 patients at its original clinical decision threshold. The red diamond denotes the threshold on the direct supervised method that yields the same coverage as ENLIGHT–DeepPT at its original, fixed threshold. d, Comparison of the OR of ENLIGHT–DeepPT and that of the direct supervised method (y axis) at thresholds that yield the same coverage (x axis). e, AP of ENLIGHT–DeepPT (light blue) and the direct supervised method (purple) for each dataset and on aggregate as in b. The dashed lines denote the ORR for each case as in b. f, OR for ENLIGHT-actual and ENLIGHT–DeepPT when predicting response to trastuzumab (for the trastuzumab1 cohort). 
g, Comparison of the AP (y axis) for both ENLIGHT-based models and the Sammut-ML predictor. All methods were applied to the same patient group. The black horizontal dashed line denotes the ORR. The number of patients in each cohort is shown in parentheses.
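Panels c and d compare two predictors at thresholds that yield the same coverage, that is, the same fraction of patients scored above the threshold. One simple way to find such a threshold is sketched below. The function names and scores are hypothetical, and treating "above" as greater than or equal to the threshold is an assumption of this sketch rather than something the caption specifies.

```python
def coverage(scores, threshold):
    """Fraction of patients whose score is at or above the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def threshold_for_coverage(scores, target_coverage):
    """Score of the k-th highest-ranked patient, where k is the number of
    patients closest to the target coverage. With tied scores the realized
    coverage can exceed the target."""
    ranked = sorted(scores, reverse=True)
    k = max(1, round(target_coverage * len(ranked)))
    return ranked[k - 1]


# Hypothetical scores from a direct supervised predictor for ten patients.
supervised_scores = [0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6, 0.4, 1.0]

# Suppose the fixed-threshold method flags 30% of patients as responders.
matched = threshold_for_coverage(supervised_scores, 0.3)
realized = coverage(supervised_scores, matched)
```

Matching coverage in this way lets the odds ratios of the two methods be compared on equally sized groups of predicted responders.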

