Front Genet. 2022 May 2;13:866005.
doi: 10.3389/fgene.2022.866005. eCollection 2022.

Molecular Subtyping of Cancer Based on Distinguishing Co-Expression Modules and Machine Learning

Peishuo Sun et al. Front Genet.

Abstract

Molecular subtyping of cancer is recognized as a critical and challenging step towards individualized therapy. Most existing computational methods approach this problem as multi-class classification of the gene expression profiles of cancer samples. Although these methods, especially deep learning, perform well in data classification, they usually require large amounts of training data and have limited interpretability. Moreover, because cancer is a complex systemic disease, the phenotypic differences between cancer samples can hardly be fully understood by analyzing single molecules alone, and molecular subtyping methods based on differential expression are reportedly not conserved. To address these issues, we present a new framework for molecular subtyping of cancer that identifies a robust specific co-expression module for each cancer subtype, generates network features for each sample by perturbing the correlation levels of specific edges, and then trains a deep neural network for multi-class classification. When applied to molecular subtyping of breast cancer (BRCA) and stomach adenocarcinoma (STAD), the framework shows superior classification performance over existing methods. Beyond improving classification performance, we consider the specific co-expression modules selected for subtyping to be biologically meaningful, which potentially offers new insight for diagnostic biomarker design, mechanistic studies of cancer, and the selection of individualized treatment plans.

Keywords: machine learning; molecular subtyping of cancer; multi-classification; network perturbation; specific co-expression module.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
(A) The workflow from data processing to specific-edge identification, taking four-subtype classification as an example. After data processing, each subtype is represented as a gene expression matrix with n genes. WGCNA is used to divide the whole gene set into different co-expression modules. The specific edges of a subtype are extracted from that subtype's specific module. The perturbations of these specific edges (gene pairs) are used to generate network feature data. (B) Detailed process of generating one piece of network feature data. The perturbation values of a sample are the differences in the specific edges between the expanded network and the reference network.
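The perturbation step in panel (B) can be illustrated with a minimal sketch. It assumes that edge weights are Pearson correlations and that the expanded network is built by adding a single sample to the reference samples; neither detail is stated in this record. The function names `edge_correlations` and `perturbation_features` are hypothetical and are not taken from the paper.

```python
import numpy as np

def edge_correlations(expr, edges):
    """Pearson correlation of each gene pair (edge) across samples.

    expr:  array of shape (n_samples, n_genes)
    edges: integer array of shape (n_edges, 2) holding gene column indices
    Assumes no gene in `edges` is constant across the samples.
    """
    a = expr[:, edges[:, 0]]
    b = expr[:, edges[:, 1]]
    a = a - a.mean(axis=0)  # center each gene over the samples
    b = b - b.mean(axis=0)
    denom = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    return (a * b).sum(axis=0) / denom

def perturbation_features(reference, sample, edges):
    """Change in edge correlation when one sample joins the reference set.

    Assumption: expanded network = reference samples plus the new sample.
    """
    ref_corr = edge_correlations(reference, edges)
    expanded = np.vstack([reference, sample])
    return edge_correlations(expanded, edges) - ref_corr

# Toy data: 3 reference samples, 3 genes, 2 specific edges (gene pairs).
reference = np.array([[1.0, 2.0, 5.0],
                      [2.0, 4.0, 1.0],
                      [3.0, 6.0, 4.0]])
edges = np.array([[0, 1], [0, 2]])
features = perturbation_features(reference, np.array([4.0, 0.0, 2.0]), edges)
```

Under these assumptions, a sample that follows the reference co-expression pattern leaves an edge nearly unchanged, while a sample that breaks the pattern yields a perturbation value far from zero; the resulting vector has one entry per specific edge.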
FIGURE 2
Generation of sufficient network feature data for model training and prediction. One reference sample set consists of T groups of samples drawn from the T subtypes (T: total number of subclasses). Network feature data corresponding to the training samples are used for model training.
FIGURE 3
Cancer subtyping performance of seven methods: our method SCM-DNN, HSIC-LASSO, ANOVA, Chi-square, mutual information, SCP, and DeepCC. (A) BRCA subtyping and (B) STAD subtyping using the top 100 and 200 distinguishing co-expressed gene pairs.
FIGURE 4
Venn diagram of the overlaps among the top 100 (network) features obtained by SCM-DNN, HSIC-LASSO, ANOVA, Chi-square, and mutual information in (A) BRCA and (B) STAD.
