Colorectal cancer subtype identification from differential gene expression levels using minimalist deep learning
- PMID: 35461302
- PMCID: PMC9034628
- DOI: 10.1186/s13040-022-00295-w
Abstract
Background: Cancer molecular subtyping plays a critical role in individualized patient treatment. Previous studies have proposed methods based on high-throughput gene expression signatures to identify cancer subtypes. However, existing methods suffer from the curse of dimensionality, data sparsity, and computational inefficiency.
Methods: To address these problems, we propose DeepCSD, a supervised deep learning framework for colorectal cancer subtyping that avoids added model complexity without sacrificing generality. Specifically, based on the genes differentially expressed across the cancer consensus molecular subtypes, we design a minimalist feed-forward neural network to capture the distinct molecular features of each subtype. To mitigate overfitting as much as possible, we add L1 and L2 regularization and dropout layers.
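The ingredients named in the Methods paragraph (a small feed-forward network, L1 and L2 weight penalties, and dropout) can be illustrated with a minimal sketch in plain Python. This is not the authors' DeepCSD implementation: the layer sizes, dropout rate, penalty strengths, and all function names below are hypothetical, and the four output classes assume the four consensus molecular subtypes.

```python
import math
import random

def dense(x, W, b):
    # Affine layer: W holds one row of weights per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def dropout(x, rate, rng, training):
    # Inverted dropout: zero each unit with probability `rate` during
    # training and rescale the survivors; identity at inference time.
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]

def init_layer(n_in, n_out, rng):
    scale = math.sqrt(2.0 / n_in)
    W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def forward(x, layers, rate, rng, training=False):
    h = x
    for W, b in layers[:-1]:
        h = dropout(relu(dense(h, W, b)), rate, rng, training)
    W, b = layers[-1]
    return softmax(dense(h, W, b))

def penalty(layers, l1, l2):
    # Combined L1 + L2 penalty on the weights (biases are left unpenalized).
    total = 0.0
    for W, _ in layers:
        for row in W:
            total += l1 * sum(abs(w) for w in row) + l2 * sum(w * w for w in row)
    return total

def loss(x, label, layers, l1, l2, rate, rng):
    # Cross-entropy on one sample plus the regularization penalty.
    probs = forward(x, layers, rate, rng, training=True)
    return -math.log(probs[label] + 1e-12) + penalty(layers, l1, l2)

# Illustrative sizes only: 20 input genes, 8 hidden units, 4 subtypes.
rng = random.Random(0)
n_genes, n_hidden, n_subtypes = 20, 8, 4
layers = [init_layer(n_genes, n_hidden, rng), init_layer(n_hidden, n_subtypes, rng)]
x = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]  # synthetic expression vector
probs = forward(x, layers, rate=0.5, rng=rng)      # inference: dropout disabled
```

Here `probs` holds one predicted probability per subtype for a single synthetic sample, and `loss` is the quantity a training loop would minimize; gradient computation and the optimizer are omitted for brevity.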
Results: To demonstrate the effectiveness of DeepCSD, we compared it with other methods, including Random Forest (RF), deep forest (gcForest), support vector machine (SVM), XGBoost, and DeepCC, on eight independent colorectal cancer datasets. The results show that DeepCSD outperforms the other algorithms. In addition, gene ontology enrichment and pathology analyses were conducted to reveal novel insights into the mechanisms of cancer subtype identification and characterization.
Conclusions: DeepCSD takes all subtype-specific genes as input, which is necessary for pathological completeness. DeepCSD also shows remarkable robustness in handling cross-platform gene expression data, achieving similar performance on training and test data without significant overfitting or reliance on added model complexity.
Keywords: Cancer subtype identification; DeepCSD; Differential gene expression.
© 2022. The Author(s).
Conflict of interest statement
The authors declare that they have no competing interests.
Similar articles
- DeepCC: a novel deep learning-based framework for cancer molecular subtype classification. Oncogenesis. 2019 Aug 16;8(9):44. doi: 10.1038/s41389-019-0157-8. PMID: 31420533
- Automated exploitation of deep learning for cancer patient stratification across multiple types. Bioinformatics. 2023 Nov 1;39(11):btad654. doi: 10.1093/bioinformatics/btad654. PMID: 37934154
- BCDForest: a boosting cascade deep forest model towards the classification of cancer subtypes based on gene expression data. BMC Bioinformatics. 2018 Apr 11;19(Suppl 5):118. doi: 10.1186/s12859-018-2095-4. PMID: 29671390
- Optimizing neural networks for medical data sets: A case study on neonatal apnea prediction. Artif Intell Med. 2019 Jul;98:59-76. doi: 10.1016/j.artmed.2019.07.008. Epub 2019 Jul 25. PMID: 31521253
- Molecular Subtyping of Cancer Based on Distinguishing Co-Expression Modules and Machine Learning. Front Genet. 2022 May 2;13:866005. doi: 10.3389/fgene.2022.866005. eCollection 2022. PMID: 35586568
Cited by
- Deep learning generates custom-made logistic regression models for explaining how breast cancer subtypes are classified. PLoS One. 2023 May 22;18(5):e0286072. doi: 10.1371/journal.pone.0286072. eCollection 2023. PMID: 37216350
- The role of deep learning in diagnosing colorectal cancer. Prz Gastroenterol. 2023;18(3):266-273. doi: 10.5114/pg.2023.129494. Epub 2023 Jul 17. PMID: 37937113. Review.
- Amogel: a multi-omics classification framework using associative graph neural networks with prior knowledge for biomarker identification. BMC Bioinformatics. 2025 Mar 28;26(1):94. doi: 10.1186/s12859-025-06111-6. PMID: 40155814