Cell Rep Med. 2023 Sep 19;4(9):101173.
doi: 10.1016/j.xcrm.2023.101173. Epub 2023 Aug 14.

Deep learning integrates histopathology and proteogenomics at a pan-cancer level


Joshua M Wang et al. Cell Rep Med.

Abstract

We introduce a pioneering approach that integrates pathology imaging with transcriptomics and proteomics to identify predictive histology features associated with critical clinical outcomes in cancer. We utilize 2,755 H&E-stained histopathological slides from 657 patients across 6 cancer types from CPTAC. Our models effectively recapitulate distinctions readily made by human pathologists: tumor vs. normal (AUROC = 0.995) and tissue-of-origin (AUROC = 0.979). We further investigate predictive power on tasks not normally performed from H&E alone, including TP53 prediction and pathologic stage. Importantly, we describe predictive morphologies not previously utilized in a clinical setting. The incorporation of transcriptomics and proteomics identifies pathway-level signatures and cellular processes driving predictive histology features. Model generalizability and interpretability are confirmed using TCGA. We propose a classification system for these tasks and suggest potential clinical applications for this integrated human and machine learning approach. A publicly available web-based platform implements these models.

Keywords: CPTAC; cancer imaging; cancer proteogenomics; computational pathology; molecular diagnostics.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
Workflow, data split, and model performance. (A) Overall workflow. Multi-resolution Panoptes models were trained on H&E slide images from six cancer types. Multi-CCA correlated proteomics, transcriptomics, and extracted imaging features from CNN models to reveal significant pathways and molecular signatures. (B) Per-slide level AUROCs of imaging-based prediction tasks with 95% confidence intervals.
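As a rough illustration of what a per-slide AUROC with a 95% confidence interval involves, the sketch below aggregates tile scores into slide scores and bootstraps over slides. The caption does not say how tiles are aggregated or how the intervals were computed, so the mean-of-tiles rule, the percentile bootstrap, and the names `auroc`, `slide_scores`, and `bootstrap_ci` are all assumptions made for this example, not the authors' procedure. The scores are toy values.

```python
import random
from statistics import mean


def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def slide_scores(tile_scores_by_slide):
    """Assumed aggregation rule: a slide's score is the mean of its tile scores."""
    return {slide: mean(tiles) for slide, tiles in tile_scores_by_slide.items()}


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval over slides (an assumed method)."""
    auroc(labels, scores)  # fails early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lost one of the classes
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: tile-level tumor probabilities for five hypothetical slides.
tiles = {
    "s1": [0.9, 0.8],
    "s2": [0.7, 0.6],
    "s3": [0.2, 0.4],
    "s4": [0.65, 0.75],
    "s5": [0.1, 0.3],
}
truth = {"s1": 1, "s2": 1, "s3": 0, "s4": 0, "s5": 0}

per_slide = slide_scores(tiles)
names = sorted(per_slide)
y = [truth[s] for s in names]
p = [per_slide[s] for s in names]

print("per-slide AUROC:", auroc(y, p))  # 5 of 6 pairs ranked correctly
print("95% CI:", bootstrap_ci(y, p))
```

With so few slides the interval is very wide; the point is only to show the two levels of evaluation (tile and slide) and where the resampling happens.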
Figure 2
Tissue-of-origin model performance and omics integration. (A) AUROC for each cancer type at per-slide level. (B) AUROC at per-tile level. (C) Features extracted from penultimate layer are separated with tSNE; each dot represents a tumor tile colored by tissue origin. (D) Feature extraction where each dot represents NAT tiles colored by tissue origin. (E) CCA canonical variate highlighting similarities between UCEC and LUAD samples. Line graphs represent standardized coefficients for subsets of imaging, gene, and proteome features. Each dot represents an image-proteogenomic paired sample. GO term enrichment assessed on subset of genes and proteome features with non-zero values in loading matrix. (F and G) Top and bottom images represent tiles with highest and lowest scores, respectively. Histopathology annotations reflect enriched GO terms.
Figure 3
Feature visualization and cross-testing of tumorigenesis models. (A) Example UCEC slide with tumor tissue on left and normal tissue on right. (B) Prediction heatmap of example slide with hotter areas (red) highlighting tiles more likely to be tumor tissue. (C) CAM of example slide by tiles with hotter areas emphasizing the tumor tissue. (D–F) Feature extraction from tumorigenesis imaging model by tSNE; each dot represents a tile colored by prediction score, true label, and cancer type, respectively. (G) Example tiles of integrated saliency results highlighting accumulation of nuclei, with densest regions largely composed of stromal lymphoplasmacytic infiltrates. (H) Heatmap showing per-slide AUROCs of applying single cancer type trained models to the other cancer types.
Figure 4
Major canonical variates associated with tumorigenesis. (A) Canonical variate with strongest correlation separating NAT/tumor samples across all six cancer types. (B) Tiles from highest-scoring regions show mitotic morphologies consistent with transcriptomic and proteomic enrichment. (C) Tiles from lowest-scoring region. (D) Second canonical variate distinguishing NAT/tumor samples. (E and F) Tile scoring parallels enriched biological processes. Tile borders indicate scores; top-scoring regions (red) match tumorigenic areas with increased glycolytic activity, and bottom-scoring (blue) areas correspond with smooth muscle and blood vessel architectures.
Figure 5
Model performance and multi-omics assessment of grade and stage. (A) Per-slide performance of models trained on tumor grade and disease stage. Numeric predictions represent the expected value from the softmax layer, $\sum_{x=0}^{4} x\,p(x)$, where $x$ represents the grade or stage outcome. AUROC for each outcome denoted. (B) CCA canonical variate uniquely observed in grade analysis. Tiles with highest projected values (shown by more intense red borders) reflect regions with disorganized tumor nests lacking lumen formation and glandular regions with loss of basal nuclear polarity. Paler tile borders reflect lower projected values.
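The numeric prediction described in panel (A), an expected value taken over the softmax output for outcomes 0 through 4, can be sketched as follows. This is a minimal illustration, assuming five ordered classes indexed 0 to 4; the function names `softmax` and `expected_class` and the example logits are hypothetical and not taken from the Panoptes code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_class(probs):
    """Expected value sum_x x * p(x), with x running over class indices 0..len(probs)-1."""
    return sum(x * p for x, p in enumerate(probs))


# Hypothetical logits for one slide over five ordered outcomes (0..4).
probs = softmax([0.1, 0.4, 2.0, 1.2, -0.5])
print("class probabilities:", [round(p, 3) for p in probs])
print("numeric prediction:", round(expected_class(probs), 3))

# Sanity check: uniform probabilities over 0..4 give the midpoint, 2.0.
print(expected_class(softmax([0.0] * 5)))
```

Collapsing the class probabilities into one number this way keeps the ordering of grades or stages, which a plain argmax over classes would discard.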
Figure 6
Performance, visualization, and feature extraction of biomarkers. (A) One-tail Wilcoxon tests on prediction scores between positively and negatively labeled samples at per-tile level with significance levels. (B) Extraction and visualization of features learned by pan-cancer TP53 mutation model with tSNE. Reference plots of prediction scores and true labels on the right. (C) Canonical variate with strongest association between image and proteogenomic features. (D) Top tiles demonstrate highly cellular disordered regions correlating with TP53 mutated samples. (E) Bottom tiles (wild-type) highlight organized and well-differentiated regions. (F) Canonical variate correlating increased IL-1 activity with TP53 mutated samples. (G) Wild-type samples in canonical variate no. 3 highlight densely packed but relatively preserved tissue architectures. (H) Conversely, mutated samples reside in the bottom portion and show areas of increased immune infiltrate activity.
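The one-tailed test in panel (A) compares tile-level prediction scores between positively and negatively labeled samples. The sketch below assumes this is the unpaired Wilcoxon rank-sum (Mann-Whitney U) test with the alternative that positive tiles score higher, and uses a normal approximation without tie or continuity correction; the caption does not specify these details. The function name `one_tailed_rank_sum` and the scores are hypothetical.

```python
import math


def one_tailed_rank_sum(pos_scores, neg_scores):
    """Return (U, p) for the alternative 'positive scores tend to be larger'.

    U counts positive/negative pairs where the positive score is higher
    (ties count 0.5). The p-value uses the large-sample normal approximation
    with no tie or continuity correction, so it is only a rough guide for
    small or heavily tied samples.
    """
    n1, n2 = len(pos_scores), len(neg_scores)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    u = sum(1.0 if p > n else 0.5 if p == n else 0.0
            for p in pos_scores for n in neg_scores)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return u, p_value


# Hypothetical tile scores from a mutation model.
mutated = [0.90, 0.80, 0.70, 0.85, 0.95, 0.75, 0.65, 0.88]
wild_type = [0.10, 0.30, 0.25, 0.40, 0.35, 0.20, 0.15, 0.45]

u, p = one_tailed_rank_sum(mutated, wild_type)
print("U =", u, "one-tailed p =", p)
```

Note that U divided by the number of pairs equals the AUROC, so this test and the AUROC summarize the same ranking information. In practice a library routine such as SciPy's `scipy.stats.mannwhitneyu` with `alternative="greater"` would be the usual choice.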
Figure 7
Panoptes Web. (A) App workflow. (B) Boxplot assessment of probability scores and class outcomes, and individual tile probability visualization.

Publication types