[Preprint]. 2023 Dec 13:rs.3.rs-3607399.
doi: 10.21203/rs.3.rs-3607399/v1.

Self-supervised artificial intelligence predicts recurrence, metastasis and disease specific death from primary cutaneous squamous cell carcinoma at diagnosis

Nicolas Coudray et al. Res Sq.

Abstract

Primary cutaneous squamous cell carcinoma (cSCC) is responsible for ~10,000 deaths annually in the United States. Stratification of risk of poor outcome (PO), including recurrence, metastasis and disease-specific death (DSD), at initial biopsy would significantly impact clinical decision-making during the initial postoperative period, when intervention has been shown to be most effective. In this multi-institutional study, we developed a state-of-the-art, interpretable self-supervised deep-learning approach and demonstrated its ability to predict poor outcomes of cSCCs at the time of initial biopsy. By highlighting histomorphological phenotypes, our approach demonstrates that poor differentiation and deep invasion correlate with poor prognosis. Our approach is particularly efficient at defining poor outcome risk in Brigham and Women's Hospital (BWH) T2a and American Joint Committee on Cancer (AJCC) T2 cSCCs. This bridges a significant gap in our ability to assess risk among T2a/T2 cSCCs and may be useful in defining patients at highest risk of poor outcome at the time of diagnosis. Early identification of the highest-risk patients could prompt more stringent surveillance and a more rigorous diagnostic work-up, and could identify patients who might best respond to early postoperative adjunctive treatment.

Conflict of interest statement

Competing interests: The authors declare that they have no competing financial interests.

Figures

Figure 1. Adaptation of the self-supervised Histological Phenotype Learning pipeline to study cutaneous squamous cell cancer.
a. The slides were first tiled into smaller images of 224 × 224 pixels at 0.5 µm/pixel (equivalent to a magnification of 20x). b. A subset of those tiles was used to train the self-supervised Barlow Twins architecture. c. Once the network was trained, all the tiles from the three cohorts were projected onto it to extract their tile vector representations z, a 128-dimensional vector encoding each image. d. Those vector representations are then over-clustered using the Leiden approach to obtain homogeneous clusters (called Histomorphological Phenotype Clusters, HPCs) and to visually distinguish artifacts from tissue representations. In this UMAP of the tile vector representations z, each dot represents a tile and each color a different HPC. e. Tiles belonging to HPCs identified as highly enriched in artifacts are removed from the study. f. The cleaned dataset is then subjected to a new round of Leiden clustering for more detailed analysis. This UMAP of the cleaned tile vector representations z shows 26 HPCs corresponding to 26 groups of self-identified phenotypes, with representative tiles for the top 5 clusters corresponding to the example slides in panel c. g. The resulting HPCs can then be used to generate heatmaps showing simplified slide representations and analyzed to identify potential correlations between the phenotypes identified by the self-supervised approach and patients’ outcome. Here, the heatmap corresponding to the example slide section in panel a is shown, with the top 5 clusters numbered and corresponding to the ones in panel f. All tiles are shown after Reinhard’s color normalization (48).
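The tiling step in panel a can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `tile_grid`, the non-overlapping layout, the dropping of partial edge tiles, and the example native resolution of 0.25 µm/pixel are all assumptions made for illustration. Only the 224-pixel tile size and the 0.5 µm/pixel target come from the caption.

```python
from typing import Iterator, Tuple

TILE_PX = 224      # tile edge in pixels (from the caption)
TARGET_MPP = 0.5   # target resolution in µm/pixel (from the caption)


def tile_grid(width_px: int, height_px: int,
              native_mpp: float) -> Iterator[Tuple[int, int, int]]:
    """Yield (x, y, native_edge) for non-overlapping tiles.

    x and y are top-left corners in native-resolution pixels. native_edge is
    the side of the square region to read so that, once resized to
    TILE_PX x TILE_PX, the tile has TARGET_MPP resolution.
    Partial tiles at the right and bottom edges are dropped (an assumption).
    """
    scale = TARGET_MPP / native_mpp
    native_edge = round(TILE_PX * scale)
    for y in range(0, height_px - native_edge + 1, native_edge):
        for x in range(0, width_px - native_edge + 1, native_edge):
            yield x, y, native_edge


# Hypothetical slide region: 2000 x 1000 pixels scanned at 0.25 µm/pixel.
tiles = list(tile_grid(width_px=2000, height_px=1000, native_mpp=0.25))
print(len(tiles), tiles[0], tiles[-1])
```

At 0.5 µm/pixel, each 224-pixel tile covers 112 µm of tissue per side, so a slide scanned at a finer resolution needs a proportionally larger read region per tile.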
Figure 2. Self-supervised approach generates clusters enriched in tiles from patients with poor outcome, with good representation of the three cohorts, and achieving a c-index of 0.73 in disease-free survival prediction while providing tile clusters important for that prediction.
a. UMAP with the 26 Leiden clusters found at resolution 0.75. b. PAGA representation of the Leiden clusters with node connections. The size of the nodes is proportional to the number of tiles and their color is proportional to the proportion of tiles associated with good/poor outcome patients. c. UMAP with colors showing tiles associated with good/poor outcome patients (green/orange). Each dot is a tile. d. Tile distributions on the HPCs. e. Number of patients present in the different HPCs, stratified by the number of tiles for each patient. f. Number of patients present in the different HPCs, stratified by the enrolling institution (maximum of 163 patients). g. Kaplan-Meier curves of patients predicted to be at high and low risk of poor outcome by the self-supervised HPL approach using a Cox regression. h. Kaplan-Meier curves of patients predicted to be at high and low risk of poor outcome by the self-supervised HPL approach using a Cox regression, shown by stage for subsets of the patients in panel g (error bars correspond to the 95% confidence interval). i. Forest plot with log hazard ratio of the Cox proportional hazards model over the three-fold cross-validation. For each HPC, the log hazard ratio, p-value and percentage of patients with at least one tile belonging to that HPC are shown from left to right. Coefficients were averaged across folds and p-values combined with Fisher’s probability test. j. Interpretability of the HPCs via SHAP shows, at the top, which HPCs favor prediction of higher risk of poor outcome when a patient is enriched in tiles from that HPC, and, at the bottom, which favor prediction of good outcome.
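The fold aggregation described for panel i (averaging coefficients and combining p-values with Fisher's method) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names and the per-fold numbers are hypothetical. Fisher's statistic is X = -2 Σ ln p, which follows a chi-square distribution with 2k degrees of freedom for k independent p-values; for even degrees of freedom the survival function has a closed form, so no statistics library is needed.

```python
import math
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Combine independent p-values with Fisher's method.

    Uses the closed-form chi-square survival function for 2k degrees of
    freedom: sf(X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    series = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * series


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


# Hypothetical per-fold results for a single HPC (three-fold cross-validation).
fold_log_hr = [0.42, 0.35, 0.51]
fold_p = [0.04, 0.10, 0.02]

print("mean log hazard ratio:", mean(fold_log_hr))
print("combined p-value:", fisher_combined_p(fold_p))
```

Note that Fisher's method assumes the combined p-values are independent, which is only approximately true for cross-validation folds drawn from the same cohort.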
Figure 3. PAGA graph shows a coherent organization of features found on cutaneous squamous cell carcinoma whole slide images.
Annotations were provided by a group of Mohs surgeons, who reviewed 100 tiles randomly selected from each HPC (annotations taken from Supplementary Table 6), and are projected on the PAGA graph from Figure 2b.
Figure 4. Example of tiles from HPCs associated with higher risk of poor outcome.
a. Examples of tiles randomly selected from certain HPCs leading to risk prediction of poor outcome. b-c. Examples of data from patients with poor outcome shortly after surgery (10.5 months, local recurrence) and with poor outcome a few years after surgery (46 months, nodal metastasis). For each case, a small portion of the original slide is shown as well as the corresponding heatmap and the associated SHAP decision plot. The color of the heatmap shows the HPC associated with each tile, with the proportion of tiles belonging to each HPC shown in the legend (percentages computed over the whole slide(s) available for each patient). The top of the SHAP decision plot shows the predicted value, which determines the color of the curve. Reading from bottom to top, the SHAP values for each HPC are cumulatively summed, and the HPCs are ordered according to the absolute SHAP weight. On the right, the proportion of tiles associated with each cluster is shown on a log10 scale. All tiles are shown after Reinhard’s color normalization (48).
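The way a SHAP decision plot is read (contributions ordered by absolute weight and cumulatively summed from bottom to top) can be sketched without any plotting library. The function name `decision_path`, the base value, and the per-HPC SHAP values below are hypothetical; the sketch also assumes that the smallest contribution sits at the bottom of the plot, so the largest one is added last.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def decision_path(base_value: float,
                  shap_by_hpc: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (HPC name, running total) pairs, bottom of the plot first.

    HPCs are sorted by increasing absolute SHAP value, then the values are
    cumulatively summed starting from the model's base (expected) value.
    The last running total is the model output for this patient.
    """
    ordered = sorted(shap_by_hpc.items(), key=lambda item: abs(item[1]))
    totals = list(accumulate((value for _, value in ordered),
                             initial=base_value))
    return [(name, total) for (name, _), total in zip(ordered, totals[1:])]


# Hypothetical SHAP values for one patient (not taken from the paper).
example_shap = {"HPC 3": 0.30, "HPC 11": -0.10, "HPC 7": 0.05}
path = decision_path(base_value=0.0, shap_by_hpc=example_shap)
for name, total in path:
    print(f"{name}: running prediction {total:+.2f}")
```

Because SHAP values are additive, the final running total equals the base value plus the sum of all per-HPC contributions, which is the value shown at the top of the decision plot.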
Figure 5. Example of tiles from HPCs associated with lower risk of poor outcome.
a. Examples of tiles randomly selected from certain HPCs leading to prediction of good outcome. b. The interaction analysis between HPCs shows two groups of HPCs that tend to be adjacent on slides; each column shows, for tiles associated with a given HPC, the normalized proportion of interactions with the HPCs of their adjacent tiles. The dendrograms correspond to bi-hierarchical clustering of HPCs. c-d. Examples of data from patients who have not recurred and have been followed for more than three years. For each case, a small portion of the original slide is shown as well as the corresponding heatmap and the associated SHAP decision plot. The color of the heatmap shows the HPC associated with each tile, with the proportion of tiles belonging to each HPC shown in the legend (percentages computed over the whole slide(s) available for each patient). The top of the SHAP decision plot shows the predicted value, which determines the color of the curve. Reading from bottom to top, the SHAP values for each HPC are cumulatively summed, and the HPCs are ordered according to the absolute SHAP weight. On the right, the proportion of tiles associated with each cluster is shown on a log10 scale. All tiles are shown after Reinhard’s color normalization (48).
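The interaction analysis in panel b can be approximated with a short sketch: for every tile, count the HPCs of its adjacent tiles, then normalize the counts per source HPC. The function name `interaction_proportions`, the 8-neighbor definition of adjacency, and the toy grid are assumptions; the caption does not specify how adjacency or normalization were implemented.

```python
from collections import Counter, defaultdict
from typing import Dict, Tuple

Coord = Tuple[int, int]

# 8-connected neighborhood on the tile grid (an assumption).
NEIGHBOR_OFFSETS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                    if (dx, dy) != (0, 0)]


def interaction_proportions(tile_hpc: Dict[Coord, int]) -> Dict[int, Dict[int, float]]:
    """Map each HPC to the proportions of HPCs found among its neighbors.

    tile_hpc maps tile grid coordinates (column, row) to an HPC label.
    For each source HPC, the returned proportions sum to 1.
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    for (x, y), hpc in tile_hpc.items():
        for dx, dy in NEIGHBOR_OFFSETS:
            neighbor = tile_hpc.get((x + dx, y + dy))
            if neighbor is not None:  # skip background / missing tiles
                counts[hpc][neighbor] += 1
    return {
        hpc: {other: n / sum(ctr.values()) for other, n in ctr.items()}
        for hpc, ctr in counts.items()
    }


# Toy slide: one row of three tiles, two from HPC 3 and one from HPC 7.
toy_slide = {(0, 0): 3, (1, 0): 3, (2, 0): 7}
props = interaction_proportions(toy_slide)
print(props)
```

Stacking these per-HPC proportion vectors as columns gives a matrix of the kind that can then be hierarchically clustered along both axes.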
Figure 6. Specific HPCs are correlated with pathologic diagnosis or type of poor outcome.
a. Spearman correlation between the HPCs and the whole slide pathologic diagnosis available for the NYU slides. b. Spearman correlation between the HPCs and the type of poor outcome (LR for local recurrence versus overall metastatic). The box plots show the Spearman correlation (positive in red, negative in blue) with the intensity modulated by the p-value. For simplicity, the x-axis is ordered as the y-axis of the SHAP plot in Figure 2j. c-d. Projection on the UMAP and PAGA graph of the HPCs associated with high and low risk of poor outcome. HPCs associated with higher risk of metastasis (nodal metastasis (NM) and distant metastasis (DM)) are shown in red, those associated with local recurrence in dark yellow (taken from panel b), and those associated with non-specific poor outcome in paler yellow. The two clusters of likely correlated HPCs associated with good outcome are shown in two shades of green (from Figure 5b). e. Ultimately, we anticipate that such a deep-learning tool, which identifies patients at higher risk of poor outcome and provides histomorphological interpretability, could assist treating physicians in making decisions about intensified postoperative follow-up and management strategy.
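The Spearman correlations in panels a and b relate a per-patient quantity for each HPC to a categorical label. A minimal standard-library sketch is shown below: it ranks both variables (averaging ranks over ties, which matters for a binary label) and computes the Pearson correlation of the ranks. The input numbers are hypothetical, and the encoding of local recurrence as 0 and metastasis as 1 is an assumption for illustration.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (spread_x * spread_y)


# Hypothetical data: proportion of tiles in one HPC for four patients, and
# their type of poor outcome (0 = local recurrence, 1 = metastasis).
hpc_proportion = [0.1, 0.4, 0.2, 0.8]
outcome_type = [0, 1, 0, 1]
rho = spearman(hpc_proportion, outcome_type)
print(f"Spearman rho = {rho:.3f}")
```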
