BioData Min. 2022 Apr 23;15(1):12.
doi: 10.1186/s13040-022-00295-w.

Colorectal cancer subtype identification from differential gene expression levels using minimalist deep learning

Shaochuan Li et al. BioData Min.

Abstract

Background: Cancer molecular subtyping plays a critical role in individualized patient treatment. Previous studies have proposed methods that identify cancer subtypes from high-throughput gene expression signatures. Unfortunately, existing methods suffer from the curse of dimensionality, data sparsity, and computational inefficiency.

Methods: To address these problems, we propose DeepCSD, a supervised deep learning framework for colorectal cancer subtyping that avoids increasing model complexity or sacrificing generality. Specifically, based on the genes differentially expressed across the cancer consensus molecular subtypes, we design a minimalist feed-forward neural network to capture the distinct molecular features of each subtype. To mitigate overfitting as much as possible, L1 and L2 regularization and dropout layers are added.

Results: To demonstrate the effectiveness of DeepCSD, we compared it with Random Forest (RF), Deep Forest (gcForest), support vector machine (SVM), XGBoost, and DeepCC on eight independent colorectal cancer datasets. The results show that DeepCSD achieves superior performance over the other algorithms. In addition, gene ontology enrichment and pathway analyses are conducted to reveal novel insights into the mechanisms of cancer subtype identification and characterization.

Conclusions: DeepCSD takes all subtype-specific genes as input, which is necessary for pathological completeness. DeepCSD is also robust to cross-platform gene expression data, achieving similar performance on training and test data without significant overfitting or added model complexity.

Keywords: Cancer subtype identification; DeepCSD; Differential gene expression.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
The framework of DeepCSD. DeepCSD consists of two representation layers that extract deep features, two dropout layers, and an output layer. DeepCSD has 768 neurons in the first representation layer, 128 in the second representation layer, and 4 in the final layer, because molecular subtyping of colorectal cancer is a four-class classification problem in our study
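
The layer sizes in the Fig. 1 caption can be illustrated with a minimal NumPy sketch of a forward pass and a regularized loss. This is not the authors' implementation: the input size `N_GENES`, the ReLU activation, the dropout rate, the placement of dropout after each representation layer, and the L1/L2 coefficients are all assumptions chosen for illustration. Only the 768, 128, and 4 unit counts come from the caption.

```python
import numpy as np

rng = np.random.default_rng(0)

N_GENES = 50                      # hypothetical input size (number of input genes is not given here)
LAYER_SIZES = [N_GENES, 768, 128, 4]  # 768 and 128 hidden units, 4 subtype classes (Fig. 1)
DROPOUT_RATE = 0.5                # assumed value
L1_COEF, L2_COEF = 1e-5, 1e-4     # assumed regularization strengths


def init_params(sizes, rng):
    """He-style random weights and zero biases for each dense layer."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        params.append((w, np.zeros(n_out)))
    return params


def forward(params, x, rng=None, train=False):
    """Two dense layers, each followed by dropout, then a softmax output."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)            # ReLU (assumed activation)
        if train:
            keep = rng.random(h.shape) >= DROPOUT_RATE
            h = h * keep / (1.0 - DROPOUT_RATE)   # inverted dropout
    w, b = params[-1]
    logits = h @ w + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def regularized_loss(params, probs, labels):
    """Cross-entropy plus L1 and L2 penalties on the weight matrices."""
    n = labels.shape[0]
    cross_entropy = -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))
    l1 = sum(np.abs(w).sum() for w, _ in params)
    l2 = sum((w ** 2).sum() for w, _ in params)
    return cross_entropy + L1_COEF * l1 + L2_COEF * l2


params = init_params(LAYER_SIZES, rng)
x = rng.normal(size=(8, N_GENES))        # 8 synthetic samples
labels = rng.integers(0, 4, size=8)      # synthetic class indices 0..3
probs = forward(params, x, rng=rng, train=True)
loss = regularized_loss(params, probs, labels)
```

Calling `forward` with `train=False` disables dropout, which is the usual choice at prediction time.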
Fig. 2
Performance comparisons among DeepCSD, SVM, gcForest, and RF on eight cancer molecular datasets. a Accuracy (calculated as the mean of per-class accuracy). b Precision (calculated as the mean of per-class precision). c Specificity (calculated as the mean of per-class specificity). d Sensitivity (calculated as the mean of per-class sensitivity)
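
The per-class averaging described in the Fig. 2 caption can be sketched as follows. The sketch assumes a one-vs-rest reading in which each class in turn is treated as the positive class; the caption does not spell out this detail, and the function name `macro_metrics` and the toy labels are illustrative only.

```python
def macro_metrics(y_true, y_pred, classes):
    """Average one-vs-rest accuracy, precision, specificity, and sensitivity over classes."""
    n = len(y_true)
    totals = {"accuracy": 0.0, "precision": 0.0, "specificity": 0.0, "sensitivity": 0.0}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tn = n - tp - fp - fn
        totals["accuracy"] += (tp + tn) / n
        # Undefined ratios (empty denominators) are counted as 0.0 here.
        totals["precision"] += tp / (tp + fp) if tp + fp else 0.0
        totals["specificity"] += tn / (tn + fp) if tn + fp else 0.0
        totals["sensitivity"] += tp / (tp + fn) if tp + fn else 0.0
    return {name: total / len(classes) for name, total in totals.items()}


# Toy example with four subtype labels (synthetic data, not from the study).
classes = ["CMS1", "CMS2", "CMS3", "CMS4"]
y_true = ["CMS1", "CMS1", "CMS2", "CMS2", "CMS3", "CMS3", "CMS4", "CMS4"]
y_pred = ["CMS1", "CMS2", "CMS2", "CMS2", "CMS3", "CMS3", "CMS4", "CMS1"]
metrics = macro_metrics(y_true, y_pred, classes)
# metrics["accuracy"] == 0.875 and metrics["sensitivity"] == 0.75 for this toy input
```

Averaging over classes gives each subtype equal weight regardless of how many samples it has, which matters when subtypes are imbalanced.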
Fig. 3
The distance correlation heatmap of multiple molecular datasets. GSE20916 and GSE2109 correlate weakly with the other datasets, so the evaluation spans datasets with diverse gene expression characteristics, which supports DeepCSD’s robustness across them
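
For readers unfamiliar with the statistic behind Fig. 3, the following is a minimal sketch of the sample distance correlation between two equal-length numeric vectors. How the study summarized each dataset into comparable vectors is not stated here, so the inputs below are synthetic and the function names are illustrative.

```python
import math


def _centered_distances(values):
    """Pairwise absolute distances, double-centered by row, column, and grand means."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]   # equals column means (symmetric matrix)
    grand_mean = sum(row_means) / n
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation in [0, 1]; returns 0.0 if either input is constant."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for row in a for v in row) / n ** 2
    dvar_y = sum(v * v for row in b for v in row) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear = [2 * v + 1 for v in x]       # perfectly linear in x
quadratic = [v * v for v in x]        # dependent on x, but Pearson correlation is 0
d_linear = distance_correlation(x, linear)
d_quadratic = distance_correlation(x, quadratic)
```

Unlike Pearson correlation, distance correlation is nonzero for the quadratic relationship above, which is why it is useful for detecting nonlinear dependence.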
Fig. 4
Differential gene expression visualization on GSE39582. Panels a-f each show a comparison between CMS groups. Each dot represents a gene: red denotes an up-regulated gene and blue denotes a down-regulated gene
Fig. 5
Gene Clustergram on GSE39582. h illustrates the difference between samples while g provides the genes’ correlation to each sample
Fig. 6
Performance comparison between DeepCSD and DeepCC. The vertical axis denotes the corresponding performance metric
Fig. 7
Performance comparison of DeepCSD with two distinct feature selection methods, EDT and SKB
Fig. 8
Performance comparison across learning rates increasing from 0.1 to 0.9
Fig. 9
The metabolic pathways map (hsa01100) provided by the KEGG database. Genes marked in red are the subtype-specific genes in this pathway

