Brief Bioinform. 2024 Mar 27;25(3):bbae185.

doi: 10.1093/bib/bbae185.

DeepKEGG: a multi-omics data integration framework with biological insights for cancer recurrence prediction and biomarker discovery

Wei Lan¹, Haibo Liao¹, Qingfeng Chen¹, Lingzhi Zhu², Yi Pan³, Yi-Ping Phoebe Chen⁴

Affiliations

¹ Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China.
² School of Computer and Information Science, Hunan Institute of Technology, No. 18 Henghua Road, Zhuhui District, Hengyang 421002, China.
³ School of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Avenue, Shenzhen University Town, Nanshan District, Shenzhen 518055, China.
⁴ Department of Computer Science and Information Technology, La Trobe University, Plenty Rd, Bundoora, Melbourne, Victoria 3086, Australia.

PMID: 38678587
PMCID: PMC11056029
DOI: 10.1093/bib/bbae185

Wei Lan et al. Brief Bioinform. 2024.

Abstract

Deep learning-based multi-omics data integration methods can reveal the mechanisms of cancer development, discover cancer biomarkers and identify pathogenic targets. However, current methods ignore the potential correlations between samples when integrating multi-omics data. In addition, the complexity of deep learning models makes it difficult to provide accurate biological explanations. There is therefore an urgent need for a deep learning-based multi-omics integration method that explores the potential correlations between samples and provides model interpretability. Herein, we propose a novel interpretable multi-omics data integration method (DeepKEGG) for cancer recurrence prediction and biomarker discovery. In DeepKEGG, a biological hierarchical module is designed for local connections of neuron nodes and model interpretability based on the biological relationship between genes/miRNAs and pathways. In addition, a pathway self-attention module is constructed to explore the correlation between different samples and generate the potential pathway feature representation for enhancing the prediction performance of the model. Lastly, an attribution-based feature importance calculation method is used to discover biomarkers related to cancer recurrence and provide a biological interpretation of the model. Experimental results demonstrate that DeepKEGG outperforms other state-of-the-art methods in 5-fold cross-validation. Furthermore, case studies indicate that DeepKEGG serves as an effective tool for biomarker discovery. The code is available at https://github.com/lanbiolab/DeepKEGG.

Keywords: cancer recurrence prediction; interpretability of deep learning; multi-omics data integration; self-attention mechanism.
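
The two architectural ideas in the abstract, pathway-constrained local connections and self-attention across samples, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). The function names `masked_pathway_layer` and `sample_self_attention`, the random membership mask, the `tanh` activation and the absence of learned query/key/value projections are all assumptions made to keep the example short.

```python
import numpy as np

def masked_pathway_layer(x, weights, mask):
    """Project gene-level features to pathway-level features.

    x:       (n_samples, n_genes) omics matrix
    weights: (n_genes, n_pathways) trainable weights
    mask:    (n_genes, n_pathways) binary membership matrix; 1 where a
             gene is annotated to a pathway, 0 otherwise (hypothetical
             here; a real mask would come from pathway annotations)
    """
    # Zeroing weights outside the mask keeps only gene -> pathway edges
    # present in the annotation, so each output node maps to one pathway.
    return np.tanh(x @ (weights * mask))

def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sample_self_attention(h):
    """Scaled dot-product self-attention in which rows of h are samples.

    Simplification: queries, keys and values are all h itself, with no
    learned projections.
    """
    scores = h @ h.T / np.sqrt(h.shape[1])  # (n_samples, n_samples)
    attn = softmax(scores, axis=1)          # each row sums to 1
    return attn @ h, attn

# Toy data: 6 samples, 10 genes, 3 pathways (all values are synthetic).
rng = np.random.default_rng(0)
n_samples, n_genes, n_pathways = 6, 10, 3
x = rng.normal(size=(n_samples, n_genes))
mask = (rng.random((n_genes, n_pathways)) < 0.4).astype(float)
weights = rng.normal(size=(n_genes, n_pathways))

pathway_features = masked_pathway_layer(x, weights, mask)
attended, attn = sample_self_attention(pathway_features)
```

Because the mask multiplies the weights, any weight on a gene-pathway pair that is absent from the annotation has no effect on the output, which is what lets the importance of a pathway node be traced back to its member genes.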

Figures

**Figure 1**
The flowchart of DeepKEGG. (A) Biological hierarchical module. (B) Pathway self-attention module. (C) Classification module. (D) Model interpretability module.

**Figure 2**
Performance comparison of SGD, LR, DT, SVM, KNN, PathCNN, MOGONET and DeepKEGG in terms of AUC and AUPR values on four datasets of TCGA. Performance metrics reported here include: AUC (A) and AUPR (B).

**Figure 3**
Performance comparison of DeepKEGG on multi-omics data and single-omics data. (A) Results of the BLCA dataset. (B) Results of the PRAD dataset. (C) Results of the TARGET-AML dataset. (D) Results of the TARGET-WT dataset.

**Figure 4**
Performance of DeepKEGG under different values of hyper-parameter k. Performance metrics reported here include: AUC (A) and AUPR (B).

**Figure 5**
Kaplan–Meier curves of mRNA genes for LIHC. (A) Kaplan–Meier curve of the *CD8A* gene. (B) Kaplan–Meier curve of the *CDC25B* gene. (C) Kaplan–Meier curve of the *ACACA* gene. (D) Kaplan–Meier curve of the *PIK3R2* gene. (E) Kaplan–Meier curve of the *IPMK* gene. (F) Kaplan–Meier curve of the *EFNA3* gene.

**Figure 6**
Kaplan–Meier curves of mRNA genes for BLCA. (A) Kaplan–Meier curve of the *CD3D* gene. (B) Kaplan–Meier curve of the *COL1A1* gene. (C) Kaplan–Meier curve of the *COL6A3* gene. (D) Kaplan–Meier curve of the *STXBP1* gene. (E) Kaplan–Meier curve of the *FBP1* gene. (F) Kaplan–Meier curve of the *FN1* gene.

**Figure 7**
Visualization of node importance for LIHC recurrence. (A) Visualization of node importance of mRNA data from LIHC. (B) Visualization of node importance of SNV data from LIHC. (C) Visualization of node importance of miRNA data from LIHC. The figure shows the pathway nodes, and their gene sets, that contribute most to liver cancer recurrence. In each subgraph, reading from top to bottom, the first layer contains the gene/miRNA nodes and shows the seven most important genes/miRNAs in each pathway; the remaining genes/miRNAs are labeled ‘Residual’. The second layer contains the pathway nodes and shows the top 12 pathways; the remaining pathways are labeled ‘Residual’. Nodes are ordered by decreasing importance from left to right.

**Figure 8**
Visualization of node importance for BLCA recurrence. (A) Visualization of node importance of mRNA data from BLCA. (B) Visualization of node importance of SNV data from BLCA. (C) Visualization of node importance of miRNA data from BLCA. The figure shows the pathway nodes, and their gene sets, that contribute most to bladder cancer recurrence. In each subgraph, reading from top to bottom, the first layer contains the gene/miRNA nodes and shows the seven most important genes/miRNAs in each pathway; the remaining genes/miRNAs are labeled ‘Residual’. The second layer contains the pathway nodes and shows the top 12 pathways; the remaining pathways are labeled ‘Residual’. Nodes are ordered by decreasing importance from left to right.

Cited by

PathX-CNN: An Enhanced Explainable Convolutional Neural Network for Survival Prediction and Pathway Analysis in Glioblastoma.
Sobhan M, Islam MM, Mondal AM. bioRxiv [Preprint]. 2025 Jan 27:2025.01.24.634827. doi: 10.1101/2025.01.24.634827. PMID: 39975150. Free PMC article. Preprint.

fuseMLR: an R package for integrative prediction modeling of multi-omics data.
Fouodo CJK, Bleskina M, Szymczak S. BMC Bioinformatics. 2025 Aug 26;26(1):221. doi: 10.1186/s12859-025-06248-4. PMID: 40859122. Free PMC article.

Deciphering the molecular heterogeneity of intermediate- and (very-)high-risk non-muscle-invasive bladder cancer using multi-layered -omics studies.
Akand M, Jatsenko T, Muilwijk T, Gevaert T, Joniau S, Van der Aa F. Front Oncol. 2024 Oct 21;14:1424293. doi: 10.3389/fonc.2024.1424293. eCollection 2024. PMID: 39497708. Free PMC article.

Entropy measures for quantifying complexity in digital pathology and spatial omics.
Li X, Ren X, Venugopal R. iScience. 2025 May 28;28(6):112765. doi: 10.1016/j.isci.2025.112765. eCollection 2025 Jun 20. PMID: 40546955. Free PMC article. Review.

DGHNN: a deep graph and hypergraph neural network for pan-cancer related gene prediction.
Li B, Xiao X, Zhang C, Xiao M, Zhang L. Bioinformatics. 2025 Jul 1;41(7):btaf379. doi: 10.1093/bioinformatics/btaf379. PMID: 40580449. Free PMC article.
