J Pathol Clin Res. 2023 Jan;9(1):3-17.
doi: 10.1002/cjp2.302. Epub 2022 Nov 14.

Optimization of deep learning models for the prediction of gene mutations using unsupervised clustering

Zihan Chen et al. J Pathol Clin Res. 2023 Jan.

Abstract

Deep learning models are increasingly being used to interpret whole-slide images (WSIs) in digital pathology and to predict genetic mutations. It is currently common to assume that tumor regions carry most of the predictive power. However, it is reasonable to assume that other tissues in the tumor microenvironment may also provide important predictive information. In this paper, we propose an unsupervised clustering-based multiple-instance deep learning model for predicting genetic mutations from WSIs of three cancer types obtained from The Cancer Genome Atlas. Through unsupervised clustering, the proposed model identifies spatial regions related to specific gene mutations and excludes patches that lack predictive information. For all three cancer types evaluated in this study, this yields more accurate gene mutation prediction than models using all image patches on WSIs and two recently published algorithms. In addition, our study validates the hypothesis that predicting gene mutations solely from tumor regions on WSIs may not always give the best performance: other tissue types in the tumor microenvironment could provide better predictive ability than tumor tissue alone. These results highlight the heterogeneity of the tumor microenvironment and the importance of identifying predictive image patches in digital pathology prediction tasks.
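The unsupervised clustering step can be illustrated with a small pure-Python sketch. Following the Figure 1 caption, patches are grouped by K-means (four clusters by default) and cluster labels are then assigned with a k-NN vote. This sketch assumes that K-means is fitted on a reference set of patch feature vectors and that every other patch takes the majority label of its `n_neighbors` nearest reference patches; the function names, the `n_neighbors` default, and the toy 2-D features are hypothetical stand-ins for the Xception-derived patch representations, not the authors' implementation.

```python
import random
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int = 4, iters: int = 100, seed: int = 0) -> List[List[float]]:
    """Lloyd's K-means; returns k centroids initialized from k random points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # Update step: move each centroid to its group mean (keep it if the group is empty).
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids


def nearest_centroid(point: Vector, centroids: List[List[float]]) -> int:
    """Cluster label of a point: index of its closest centroid."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def knn_label(query: Vector, ref_points: List[Vector], ref_labels: List[int], n_neighbors: int = 5) -> int:
    """Majority cluster label among the n_neighbors closest reference patches."""
    order = sorted(range(len(ref_points)), key=lambda i: sq_dist(query, ref_points[i]))
    votes = Counter(ref_labels[i] for i in order[:n_neighbors])
    return votes.most_common(1)[0][0]


# Toy usage: two well-separated groups of 2-D "patch features".
ref = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents = kmeans(ref, k=2)
ref_labels = [nearest_centroid(p, cents) for p in ref]
new_patch_label = knn_label((9.5, 10.2), ref, ref_labels, n_neighbors=3)
```

In practice, library implementations such as scikit-learn's `KMeans` and `KNeighborsClassifier` would replace these loops; the plain version only makes each step explicit.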

Keywords: H&E image; deep learning; gene mutation; unsupervised clustering; whole-slide images.

Figures

Figure 1
Framework of unsupervised clustering‐based deep learning modeling for prediction of gene mutations. (A) Each whole‐slide H&E image was preprocessed by (i) removing background areas using the Otsu method, (ii) splitting it into nonoverlapping tiles of 224 × 224 pixels, and (iii) color normalizing it. A fine‐tuned Xception‐based feature extractor was used to generate patch representations. (B) For each cancer type, K‐means clustering was used to group patches into four clusters, and the cluster labels of the patches were then assigned by a k‐NN algorithm. A neural network was trained on the data of each cluster, and the model with the best predictive performance among the four clusters was selected based on five‐fold cross‐validation (the average AUC on the unseen test folds is reported). (C) For the WSI model, all patches extracted from the WSIs were used to train the model. (D) For the tumor region model, we used the NCT‐CRC‐HE‐100K dataset to train a CRC tissue classifier and tested it on the CRC‐VAL‐HE‐7K dataset. For each WSI, the tumor patches selected by the classifier were used to train the mutation prediction model. Finally, we compared the best‐cluster optimized model, the WSI‐based model, and the tumor‐region‐based model using the average AUC of five‐fold cross‐validation.
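Step (A), Otsu background removal followed by nonoverlapping 224 × 224 tiling, can be sketched on a grayscale image stored as a list of rows of 0-255 intensities. This is a minimal illustration under stated assumptions: tissue is taken to be the darker Otsu class (H&E background is bright), and a tile is kept when at least `min_tissue` (hypothetically 50%) of its pixels are tissue. Color normalization and Xception feature extraction are omitted, and the function names are hypothetical.

```python
from typing import List, Tuple


def otsu_threshold(gray: List[List[int]]) -> int:
    """Otsu's method: threshold t (0-255) maximizing between-class variance of {<= t} vs {> t}."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_lo, sum_lo = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_lo += hist[t]          # pixels at or below t
        if w_lo == 0:
            continue
        w_hi = total - w_lo      # pixels above t
        if w_hi == 0:
            break
        sum_lo += t * hist[t]
        mean_lo = sum_lo / w_lo
        mean_hi = (sum_all - sum_lo) / w_hi
        between = w_lo * w_hi * (mean_lo - mean_hi) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def tissue_tiles(gray: List[List[int]], tile: int = 224, min_tissue: float = 0.5) -> List[Tuple[int, int]]:
    """Top-left (row, col) of nonoverlapping tiles whose tissue fraction is >= min_tissue.

    Tissue = the darker Otsu class; partial tiles at the right/bottom edges are dropped.
    """
    t = otsu_threshold(gray)
    height, width = len(gray), len(gray[0])
    kept = []
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            tissue = sum(
                1 for r in range(y, y + tile) for c in range(x, x + tile) if gray[r][c] <= t
            )
            if tissue / (tile * tile) >= min_tissue:
                kept.append((y, x))
    return kept


# Toy 8 x 8 "slide": dark tissue on the left half, bright background on the right.
img = [[50] * 4 + [240] * 4 for _ in range(8)]
kept = tissue_tiles(img, tile=4)
```

On real WSIs one would more likely call `skimage.filters.threshold_otsu` on a downsampled thumbnail than loop over every pixel in Python.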
Figure 2
Comparison of model performance (average AUC values) of the proposed best‐cluster optimized algorithm with the WSI‐based model, which uses all patches from WSIs without patch selection. Red points represent the best‐cluster results; green points represent models using WSIs. The bar charts show the difference in average AUC between the best‐cluster optimized model and the WSI‐based model. Only genes that can be robustly predicted (AUC > 0.6) are displayed.
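The comparison in this figure reduces to a few computations: a per-fold AUC, the mean AUC across folds, selection of the best cluster, and the AUC difference against the WSI-based model. The sketch below computes AUC as the Mann-Whitney probability that a randomly chosen mutated slide outscores a randomly chosen wild-type slide (ties count one half). It assumes that the AUC > 0.6 display rule applies to the best cluster's mean AUC, which the caption does not state explicitly; the function names, gene names, and AUC values are hypothetical placeholders, not results from the study.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative slide")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_cluster(fold_aucs: Dict[int, List[float]]) -> int:
    """Cluster whose model has the highest mean AUC over the cross-validation folds."""
    return max(fold_aucs, key=lambda c: mean(fold_aucs[c]))


def compare_to_wsi(
    cluster_aucs: Dict[str, Dict[int, List[float]]],
    wsi_aucs: Dict[str, List[float]],
    min_auc: float = 0.6,
) -> Dict[str, Tuple[int, float, float]]:
    """Per gene: (best cluster, its mean AUC, mean AUC minus the WSI model's mean AUC).

    Genes whose best-cluster mean AUC does not exceed min_auc are dropped
    (an assumed reading of the caption's "AUC > 0.6" rule).
    """
    result = {}
    for gene, per_cluster in cluster_aucs.items():
        c = best_cluster(per_cluster)
        best = mean(per_cluster[c])
        if best > min_auc:
            result[gene] = (c, best, best - mean(wsi_aucs[gene]))
    return result


# Placeholder numbers, not data from the study.
fold_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
summary = compare_to_wsi(
    {"GENE_A": {0: [0.70, 0.80], 1: [0.60, 0.60]},
     "GENE_B": {0: [0.50, 0.55], 1: [0.52, 0.58]}},
    {"GENE_A": [0.65, 0.65], "GENE_B": [0.50, 0.50]},
)
```

Here GENE_B is dropped because its best cluster averages 0.55, and GENE_A's bar would show a gain of about 0.10 over the WSI-based model.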
Figure 3
Visualization of the proposed algorithm for different genes in LUAD. The deep learning‐based unsupervised clustering and mutation predictions are visualized to understand the spatial locations of each cluster, to identify the spatial regions related to mutation of a specific gene via the resolved probability scores, and to highlight the heterogeneity of a predicted genotype in the tumor microenvironment. The heatmap shows the probability scores of the gene mutation in the identified best cluster. The tile with the highest probability of mutation for each gene is displayed, and the corresponding tissue type is provided.
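The heatmaps in these visualizations can be thought of as a grid of tile-level mutation probabilities restricted to the best cluster, from which the highest-scoring tile is picked for display. This sketch assumes each tile is described by a hypothetical `(grid_row, grid_col, cluster_label, probability)` tuple; rendering, color scales, and tissue-type annotation are out of scope, and the authors' visualization code may differ.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical tile record: (grid_row, grid_col, cluster_label, mutation_probability)
Tile = Tuple[int, int, int, float]


def cluster_heatmap(tiles: Sequence[Tile], n_rows: int, n_cols: int, cluster: int) -> List[List[Optional[float]]]:
    """Grid of mutation probabilities for tiles in `cluster`; None for every other cell."""
    grid: List[List[Optional[float]]] = [[None] * n_cols for _ in range(n_rows)]
    for row, col, label, prob in tiles:
        if label == cluster:
            grid[row][col] = prob
    return grid


def top_tile(tiles: Sequence[Tile], cluster: int) -> Tuple[int, int, float]:
    """(row, col, probability) of the highest-probability tile within `cluster`."""
    candidates = [(prob, row, col) for row, col, label, prob in tiles if label == cluster]
    if not candidates:
        raise ValueError(f"no tiles in cluster {cluster}")
    prob, row, col = max(candidates)
    return row, col, prob


# Toy 2 x 2 grid; the 0.9 tile belongs to cluster 1 and is therefore ignored.
tiles = [(0, 0, 2, 0.3), (0, 1, 1, 0.9), (1, 0, 2, 0.8), (1, 1, 2, 0.1)]
heat = cluster_heatmap(tiles, n_rows=2, n_cols=2, cluster=2)
peak = top_tile(tiles, cluster=2)
```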
Figure 4
Visualization of the proposed algorithm for different genes in HNSCC. The deep learning‐based unsupervised clustering and mutation predictions are visualized to understand the spatial locations of each cluster, to identify the spatial regions related to mutation of a specific gene via the resolved probability scores, and to highlight the heterogeneity of a predicted genotype in the tumor microenvironment. The heatmap shows the probability scores of the gene mutation in the identified best cluster. The tile with the highest probability of mutation for each gene is displayed and the corresponding tissue type is provided.
Figure 5
Visualization of the proposed algorithm for different genes in CRC. The deep learning‐based unsupervised clustering and mutation predictions are visualized to understand the spatial locations of each cluster, to identify the spatial regions related to mutation of a specific gene via the resolved probability scores, and to highlight the heterogeneity of a predicted genotype in the tumor microenvironment. The heatmap shows the probability scores of the gene mutation in the identified best cluster. The tile with the highest probability of mutation for each gene is displayed and the corresponding tissue type is provided.
Figure 6
Comparison of the proposed best‐cluster optimized model with the tumor‐region‐based model for CRC. Red points represent the best‐cluster results; green points represent models trained on tumor patches. The bar charts show the difference in average AUC between the best‐cluster optimized model and the tumor‐region‐based model.
