NPJ Digit Med. 2021 Apr 19;4(1):71.
doi: 10.1038/s41746-021-00427-2.

Interpretable survival prediction for colorectal cancer using deep learning

Ellery Wulczyn et al. NPJ Digit Med.

Abstract

Deriving interpretable prognostic features from deep-learning-based prognostic histopathology models remains a challenge. In this study, we developed a deep learning system (DLS) for predicting disease-specific survival for stage II and III colorectal cancer using 3652 cases (27,300 slides). When evaluated on two validation datasets containing 1239 cases (9340 slides) and 738 cases (7140 slides), respectively, the DLS achieved a 5-year disease-specific survival AUC of 0.70 (95% CI: 0.66–0.73) and 0.69 (95% CI: 0.64–0.72), and added significant predictive value to a set of nine clinicopathologic features. To interpret the DLS, we explored the ability of different human-interpretable features to explain the variance in DLS scores. We observed that clinicopathologic features such as T-category, N-category, and grade explained a small fraction of the variance in DLS scores (R² = 18% in both validation sets). Next, we generated human-interpretable histologic features by clustering embeddings from a deep-learning-based image-similarity model and showed that they explained the majority of the variance (R² of 73–80%). Furthermore, the clustering-derived feature most strongly associated with high DLS scores was also highly prognostic in isolation. With a distinct visual appearance (poorly differentiated tumor cell clusters adjacent to adipose tissue), this feature was identified by annotators with 87.0–95.5% accuracy. Our approach can be used to explain predictions from a prognostic deep learning model and uncover potentially novel prognostic features that can be reliably identified by people for future validation studies.
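The explained-variance analysis described in the abstract can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of case-level DLS scores on per-case cluster fractions; the abstract does not state which regression model was used, and the function names `cluster_fractions` and `r_squared` are hypothetical.

```python
import numpy as np


def cluster_fractions(patch_cluster_ids, n_clusters):
    """Case-level quantitation: fraction of a case's patches assigned to each cluster."""
    counts = np.bincount(np.asarray(patch_cluster_ids), minlength=n_clusters)
    return counts / counts.sum()


def r_squared(features, scores):
    """Fraction of variance in `scores` explained by a least-squares fit on `features`.

    Assumption: plain linear regression with an intercept.
    """
    features = np.asarray(features, dtype=float)
    scores = np.asarray(scores, dtype=float)
    design = np.column_stack([np.ones(len(features)), features])  # intercept column
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    residuals = scores - design @ coef
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((scores - scores.mean()) ** 2))
    return 1.0 - ss_res / ss_tot
```

In this sketch, each row of `features` would hold one case's cluster fractions and `scores` would hold the corresponding DLS scores; an R² near 1 means the interpretable features account for nearly all of the score variance.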

Conflict of interest statement

E.W., D.F.S., M.M., F.T., P.-H.C.C., N.H., A.S., R.M., B.A., G.S.C., L.H.P., D.T., Z.X., Y.L., M.C.S., and C.H.M. are current or past employees of Google LLC and own Alphabet stock. I.F.-A. and T.B. are consultants of Google LLC. M.P., R.R., P.R., H.M., and K.Z. are employees of the Medical University of Graz.

Figures

Fig. 1. Kaplan–Meier curves on both validation sets for patients stratified by the prognostic deep learning system (DLS).
Results are presented for stage II and stage III patients separately, and as a combined cohort (Stage II/III). High- and low-risk groups represent the highest and lowest risk quartiles from the tune set, respectively, based on the DLS prediction. Hazard ratios (HR) for the medium- and high-risk groups are provided with the low-risk group as the reference group. Shaded areas represent 95% confidence intervals. P values were calculated using the log-rank test comparing each high-risk group with the corresponding low-risk group.
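The stratification described in this caption can be sketched as follows: thresholds are taken from the quartiles of tune-set scores and then applied unchanged to validation cases. The function names and the handling of scores that fall exactly on a threshold are assumptions for illustration, not details given in the caption.

```python
import numpy as np


def risk_thresholds(tune_scores):
    """Lower and upper quartile of the tune-set DLS scores."""
    low, high = np.quantile(np.asarray(tune_scores, dtype=float), [0.25, 0.75])
    return float(low), float(high)


def assign_risk_group(score, low, high):
    """Map one DLS score to a risk group using fixed tune-set thresholds.

    Assumption: boundary scores are counted in the low- or high-risk group.
    """
    if score <= low:
        return "low"
    if score >= high:
        return "high"
    return "medium"
```

Because the thresholds are fixed on the tune set, the groups in a validation set need not contain exactly 25%, 50%, and 25% of its cases.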
Fig. 2. Representative patches for clustering-derived features associated with predictions of the deep learning system (DLS).
Sample patches for a set of 10 clustering-derived features are shown. For each feature, the ten patches closest to the centroid were selected, after filtering to ensure they were from distinct cases (“Methods”). The case-level quantitation of these 4 high-risk and 6 low-risk features explains the majority of the variance in case-level DLS scores. Features are ranked according to the average DLS score, which is provided in parentheses. Scale bar indicates 0.1 mm.
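The patch selection described in this caption (nearest patches to a cluster centroid, at most one per case) can be sketched as below. Euclidean distance and the function name `representative_patches` are assumptions; the caption refers to the "Methods" section for the actual procedure.

```python
import numpy as np


def representative_patches(embeddings, case_ids, centroid, k=10):
    """Indices of the k patches nearest the centroid, keeping one patch per case.

    Assumption: distance is Euclidean in the embedding space.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    distances = np.linalg.norm(embeddings - np.asarray(centroid, dtype=float), axis=1)
    selected, seen_cases = [], set()
    for idx in np.argsort(distances, kind="stable"):
        case = case_ids[idx]
        if case in seen_cases:
            continue  # a closer patch from this case was already chosen
        seen_cases.add(case)
        selected.append(int(idx))
        if len(selected) == k:
            break
    return selected
```

If fewer than `k` distinct cases contribute patches to the cluster, the sketch returns a shorter list.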
Fig. 3. Visualizations and survival analysis of the clustering-derived feature with the highest DLS-predicted risk score (tumor-adipose feature, TAF).
a Additional sample patches of the TAF cluster, each from a unique case. Scale bar indicates 0.1 mm. b Kaplan–Meier curves on both validation sets for patients stratified by quantitation of TAF. These curves were generated following the same procedure as in Fig. 1. In stage II cases, the at-risk counts for the low-risk and medium-risk groups deviate from the quartile marks because many stage II cases did not contain any TAF.
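For readers unfamiliar with how such curves are built, the following is a minimal sketch of the standard Kaplan–Meier product-limit estimator for a single risk group. It is a textbook illustration, not the authors' code, and the function name `kaplan_meier` is hypothetical.

```python
import numpy as np


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    times:  follow-up time per patient
    events: 1 if the event (disease-specific death) was observed, 0 if censored
    Returns a list of (time, survival probability) pairs.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    survival = 1.0
    curve = []
    for t in np.unique(times[events]):
        at_risk = np.sum(times >= t)            # still under observation just before t
        deaths = np.sum((times == t) & events)  # events occurring exactly at t
        survival *= 1.0 - deaths / at_risk
        curve.append((float(t), float(survival)))
    return curve
```

Censored patients lower the at-risk count at later times without producing a step in the curve.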

