Molecular Subtyping of Cancer Based on Distinguishing Co-Expression Modules and Machine Learning

Peishuo Sun¹, Ying Wu², Chaoyi Yin¹, Hongyang Jiang¹, Ying Xu³, Huiyan Sun¹,⁴

Affiliations

¹ School of Artificial Intelligence, Jilin University, Changchun, China.
² Phase I Clinical Trials Center, The First Affiliated Hospital, China Medical University, Shenyang, China.
³ Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology and Institute of Bioinformatics, University of Georgia, Athens, GA, United States.
⁴ Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China.

PMID: 35586568
PMCID: PMC9108363
DOI: 10.3389/fgene.2022.866005

Peishuo Sun et al. Front Genet. 2022 May 2:13:866005. doi: 10.3389/fgene.2022.866005. eCollection 2022.

Abstract

Molecular subtyping of cancer is recognized as a critical and challenging step towards individualized therapy. Most existing computational methods treat it as a multi-class classification problem on the gene expression profiles of cancer samples. Although these methods, especially deep learning models, classify well, they usually require large amounts of training data and offer limited interpretability. Moreover, because cancer is a complex systemic disease, phenotypic differences between cancer samples can hardly be understood by analyzing single molecules alone, and subtyping methods based on differential expression are reportedly not conserved. To address these issues, we present a new framework for molecular subtyping of cancer that identifies a robust, specific co-expression module for each subtype, generates network features for each sample by perturbing the correlation levels of specific edges, and then trains a deep neural network for multi-class classification. When applied to molecular subtyping of breast cancer (BRCA) and stomach adenocarcinoma (STAD), the framework achieves better classification performance than existing methods. Beyond improved classification performance, we consider the specific co-expression modules selected for subtyping to be biologically meaningful, which potentially offers new insight for diagnostic biomarker design, mechanistic studies of cancer, and selection of individualized treatment plans.

Keywords: machine learning; molecular subtyping of cancer; multi-classification; network perturbation; specific co-expression module.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**FIGURE 1**
**(A)** The workflow from data processing to specific-edge identification, taking four-subtype classification as an example. After data processing, each subtype is represented as a gene expression matrix with n genes. WGCNA is used to divide the whole gene set into co-expression modules. The specific edges of a subtype are extracted from that subtype's specific module. The perturbation of these specific edges (gene pairs) is used to generate network feature data. **(B)** Detailed process of generating one piece of network feature data. The perturbation values of a sample are the differences in the specific edges between the expanded network and the reference network.
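
The perturbation step in panel (B) can be illustrated with a minimal sketch. It assumes that edge weights are Pearson correlations and that the expanded network is built from the reference samples plus the one sample being profiled; the abstract and caption do not spell out either detail. The function names, the toy data, and the edge list are hypothetical and are not taken from the paper.

```python
import numpy as np

def edge_correlations(expr, edges):
    """Pearson correlation of each (i, j) gene pair; expr is samples x genes."""
    corr = np.corrcoef(expr, rowvar=False)
    return np.array([corr[i, j] for i, j in edges])

def perturbation_features(reference, sample, edges):
    """Change in each edge's correlation when one sample joins the reference set."""
    # Assumption: the "expanded network" is the reference set plus this sample.
    expanded = np.vstack([reference, sample])
    return edge_correlations(expanded, edges) - edge_correlations(reference, edges)

# Toy data: 20 reference samples, 5 genes, 3 hypothetical "specific edges".
rng = np.random.default_rng(0)
reference = rng.normal(size=(20, 5))
sample = rng.normal(size=5)
edges = [(0, 1), (2, 3), (1, 4)]

features = perturbation_features(reference, sample, edges)
print(features.shape)  # one perturbation value per specific edge
```

Under these assumptions, a sample that follows the co-expression pattern of the reference set leaves an edge almost unchanged, while a sample that breaks the pattern shifts the correlation, which is what lets the perturbation vector act as a per-sample feature.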

**FIGURE 2**
Generation of sufficient network feature data for model training and prediction. One reference sample set consists of T groups of samples from the T subtypes (T: total number of subtypes). Network feature data corresponding to training samples are used for model training.
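
One way to read this caption is that each reference sample set yields one network feature vector per sample, so drawing several reference sets multiplies the amount of training data. The sketch below follows that reading. It assumes random sampling without replacement within each subtype, equal group sizes, and Pearson correlation as the edge weight; `group_size`, `n_sets`, and all function names are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def draw_reference_set(expr_by_subtype, group_size, rng):
    """Stack one randomly drawn group of samples from each of the T subtypes."""
    groups = []
    for subtype in sorted(expr_by_subtype):
        expr = expr_by_subtype[subtype]  # samples x genes
        idx = rng.choice(expr.shape[0], size=group_size, replace=False)
        groups.append(expr[idx])
    return np.vstack(groups)

def edge_perturbation(reference, sample, edges):
    """Expanded-network minus reference-network correlation for each edge."""
    rows, cols = map(list, zip(*edges))
    ref_corr = np.corrcoef(reference, rowvar=False)[rows, cols]
    expanded = np.vstack([reference, sample])
    return np.corrcoef(expanded, rowvar=False)[rows, cols] - ref_corr

def features_over_reference_sets(expr_by_subtype, sample, edges,
                                 n_sets, group_size, seed=0):
    """One feature vector per reference set, returned as n_sets x n_edges."""
    rng = np.random.default_rng(seed)
    return np.array([
        edge_perturbation(
            draw_reference_set(expr_by_subtype, group_size, rng), sample, edges)
        for _ in range(n_sets)
    ])

# Toy data: T = 4 subtypes, 10 samples each, 6 genes.
data_rng = np.random.default_rng(42)
expr_by_subtype = {f"subtype_{t}": data_rng.normal(size=(10, 6)) for t in range(4)}
sample = data_rng.normal(size=6)
edges = [(0, 1), (2, 3), (4, 5)]

feature_matrix = features_over_reference_sets(
    expr_by_subtype, sample, edges, n_sets=8, group_size=5)
print(feature_matrix.shape)  # (8, 3)
```

Each row of `feature_matrix` would carry the subtype label of the sample it was computed for. Whether a training sample should be excluded from its own reference set is not stated in the caption and is left open here.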

**FIGURE 3**
Cancer subtyping performance of seven methods: our method SCM-DNN, HSIC-LASSO, ANOVA, Chi-square, mutual information, SCP and DeepCC. **(A)** BRCA subtyping and **(B)** STAD subtyping using the top 100 and 200 distinguishing co-expressed gene pairs.

**FIGURE 4**
Venn diagram for overlaps among top 100 (network) features obtained by SCM-DNN, HSIC-LASSO, ANOVA, Chi-square and mutual information in **(A)** BRCA and **(B)** STAD.
