2024 Mar 27;25(3):bbae185.
doi: 10.1093/bib/bbae185.

DeepKEGG: a multi-omics data integration framework with biological insights for cancer recurrence prediction and biomarker discovery

Wei Lan et al. Brief Bioinform.

Abstract

Deep learning-based multi-omics data integration methods have the capability to reveal the mechanisms of cancer development, discover cancer biomarkers and identify pathogenic targets. However, current methods ignore the potential correlations between samples when integrating multi-omics data. In addition, providing accurate biological explanations still poses significant challenges due to the complexity of deep learning models. Therefore, there is an urgent need for a deep learning-based multi-omics integration method to explore the potential correlations between samples and provide model interpretability. Herein, we propose a novel interpretable multi-omics data integration method (DeepKEGG) for cancer recurrence prediction and biomarker discovery. In DeepKEGG, a biological hierarchical module is designed for local connections of neuron nodes and model interpretability based on the biological relationship between genes/miRNAs and pathways. In addition, a pathway self-attention module is constructed to explore the correlation between different samples and generate the potential pathway feature representation for enhancing the prediction performance of the model. Lastly, an attribution-based feature importance calculation method is utilized to discover biomarkers related to cancer recurrence and provide a biological interpretation of the model. Experimental results demonstrate that DeepKEGG outperforms other state-of-the-art methods in 5-fold cross-validation. Furthermore, case studies also indicate that DeepKEGG serves as an effective tool for biomarker discovery. The code is available at https://github.com/lanbiolab/DeepKEGG.

Keywords: cancer recurrence prediction; interpretability of deep learning; multi-omics data integration; self-attention mechanism.
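
The two central ideas in the abstract, locally connected gene-to-pathway neurons and self-attention across samples, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (that is in the linked repository); the toy gene and pathway names, the `tanh` activation, the attention width `d` and the function names `masked_layer` and `sample_self_attention` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy membership: 5 genes, 2 pathways (not taken from KEGG).
genes = ["g1", "g2", "g3", "g4", "g5"]
pathways = {"pathA": ["g1", "g2", "g3"], "pathB": ["g3", "g4", "g5"]}

# Binary mask: mask[i, j] = 1 if gene i belongs to pathway j, else 0.
mask = np.zeros((len(genes), len(pathways)))
for j, members in enumerate(pathways.values()):
    for g in members:
        mask[genes.index(g), j] = 1.0


def masked_layer(x, weights, mask):
    """Dense layer whose weights are zeroed outside known gene-pathway links."""
    return np.tanh(x @ (weights * mask))


def sample_self_attention(h, wq, wk, wv):
    """Scaled dot-product attention in which each sample attends to all samples."""
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)  # each row sums to 1
    return attn @ v, attn


x = rng.normal(size=(4, len(genes)))               # 4 samples x 5 genes
w = rng.normal(size=(len(genes), len(pathways)))   # gene-to-pathway weights
h = masked_layer(x, w, mask)                       # 4 samples x 2 pathways

d = 3  # hypothetical attention width
wq, wk, wv = (rng.normal(size=(len(pathways), d)) for _ in range(3))
out, attn = sample_self_attention(h, wq, wk, wv)   # out: 4 x 3, attn: 4 x 4
print(h.shape, out.shape, attn.shape)
```

Because of the mask, each pathway neuron depends only on its member genes, which is what makes the hidden units interpretable as pathways. The attention matrix `attn` is sample-by-sample, so each row describes how strongly one sample draws on the others.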

Figures

Figure 1
The flowchart of DeepKEGG. (A) Biological hierarchical module. (B) Pathway self-attention module. (C) Classification module. (D) Model interpretability module.
Figure 2
Performance comparison of SGD, LR, DT, SVM, KNN, PathCNN, MOGONET and DeepKEGG in terms of AUC and AUPR values on four TCGA datasets. Performance metrics reported here include: AUC (A) and AUPR (B).
Figure 3
Performance comparison of DeepKEGG on multi-omics data and single-omics data. (A) Results of the BLCA dataset. (B) Results of the PRAD dataset. (C) Results of the TARGET-AML dataset. (D) Results of the TARGET-WT dataset.
Figure 4
Performance of DeepKEGG under different values of hyper-parameter k. Performance metrics reported here include: AUC (A) and AUPR (B).
Figure 5
Kaplan–Meier curves of mRNA genes for LIHC. (A) Kaplan–Meier curve of the CD8A gene. (B) Kaplan–Meier curve of the CDC25B gene. (C) Kaplan–Meier curve of the ACACA gene. (D) Kaplan–Meier curve of the PIK3R2 gene. (E) Kaplan–Meier curve of the IPMK gene. (F) Kaplan–Meier curve of the EFNA3 gene.
Figure 6
Kaplan–Meier curves of mRNA genes for BLCA. (A) Kaplan–Meier curve of the CD3D gene. (B) Kaplan–Meier curve of the COL1A1 gene. (C) Kaplan–Meier curve of the COL6A3 gene. (D) Kaplan–Meier curve of the STXBP1 gene. (E) Kaplan–Meier curve of the FBP1 gene. (F) Kaplan–Meier curve of the FN1 gene.
Figure 7
Visualization of node importance for LIHC recurrence. (A) Visualization of node importance of mRNA data from LIHC. (B) Visualization of node importance of SNV data from LIHC. (C) Visualization of node importance of miRNA data from LIHC. This figure shows the pathway nodes and their gene sets that have a large contribution in liver cancer recurrence. For each subgraph from top to bottom, the first layer is the gene/miRNA nodes, which show the seven most important genes/miRNAs in each pathway, while the residual genes/miRNAs are labeled ‘Residual’. The second layer is the pathway nodes, which show the top 12 pathways, while the residual pathways are labeled ‘Residual’. For all nodes, the leftmost node is the node with the highest order of importance, and then its importance decreases from left to right.
Figure 8
Visualization of node importance for BLCA recurrence. (A) Visualization of node importance of mRNA data from BLCA. (B) Visualization of node importance of SNV data from BLCA. (C) Visualization of node importance of miRNA data from BLCA. This figure shows the pathway nodes and their gene sets that have a large contribution in bladder cancer recurrence. For each subgraph from top to bottom, the first layer is the gene/miRNA nodes, which show the seven most important genes/miRNAs in each pathway, while the residual genes/miRNAs are labeled ‘Residual’. The second layer is the pathway nodes, which show the top 12 pathways, while the residual pathways are labeled ‘Residual’. For all nodes, the leftmost node is the node with the highest order of importance, and then its importance decreases from left to right.
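
The layout described for Figures 7 and 8 (the most important nodes shown individually, ordered by decreasing importance, with the rest collapsed into a 'Residual' node) can be sketched as a small ranking helper. The function name `top_k_with_residual`, the toy scores and the choice to sum the leftover importances into the 'Residual' entry are assumptions; the captions do not state how the residual value is computed.

```python
def top_k_with_residual(scores, k):
    """Return the k highest-scoring nodes in descending order, plus one
    'Residual' entry that sums the scores of all remaining nodes."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:k]
    if len(ranked) > k:
        top.append(("Residual", sum(value for _, value in ranked[k:])))
    return top


# Hypothetical importance scores, not values from the paper.
gene_scores = {"geneA": 0.5, "geneB": 0.25, "geneC": 0.125, "geneD": 0.125}
print(top_k_with_residual(gene_scores, k=2))
# [('geneA', 0.5), ('geneB', 0.25), ('Residual', 0.25)]
```

In the figures the same grouping would be applied with k = 7 for the genes/miRNAs in each pathway and k = 12 for the pathways.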
