2023 Nov 1;39(11):btad643.
doi: 10.1093/bioinformatics/btad643.

Explainable Multilayer Graph Neural Network for cancer gene prediction

Michail Chatzianastasis et al. Bioinformatics.

Abstract

Motivation: The identification of cancer genes is a critical yet challenging problem in cancer genomics research. Existing computational methods, including deep graph neural networks, either fail to exploit multilayered gene-gene interactions or provide only limited explanations for their predictions. These methods are restricted to a single biological network, which cannot capture the full complexity of tumorigenesis. Models trained on different biological networks often yield different, and even opposite, cancer gene predictions, which hinders their trustworthy adoption. Here, we introduce an Explainable Multilayer Graph Neural Network (EMGNN) approach to identify cancer genes by leveraging multiple gene-gene interaction networks and pan-cancer multi-omics data. Unlike conventional graph learning on a single biological network, EMGNN uses a multilayered graph neural network to learn from multiple biological networks for accurate cancer gene prediction.

Results: Our method consistently outperforms all existing methods, with an average 7.15% improvement in area under the precision-recall curve over the current state-of-the-art method. Importantly, EMGNN integrated multiple graphs to prioritize newly predicted cancer genes with conflicting predictions from single biological networks. For each prediction, EMGNN provided valuable biological insights via both model-level feature importance explanations and molecular-level gene set enrichment analysis. Overall, EMGNN offers a powerful new paradigm of graph learning through modeling the multilayered topological gene relationships and provides a valuable tool for cancer genomics research.

Availability and implementation: Our code is publicly available at https://github.com/zhanglab-aim/EMGNN.

Conflict of interest statement

None declared.

Figures

Figure 1.
An illustration of our proposed EMGNN approach. The model consists of three main steps: (i) apply a shared GNN to update the node representation matrix of each input graph; (ii) construct a meta graph for each gene, where the same genes across all graphs are connected to a meta node, and update the representation of the meta nodes with a second GNN, Meta GNN; and (iii) use a multilayer perceptron to predict the class of each meta node.
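The three steps in this caption can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the mean-neighbor aggregation, the mean over gene copies as the meta-node update, the single sigmoid output layer standing in for the multilayer perceptron, and every name below (`emgnn_forward`, `w_shared`, `w_meta`, `w_out`) are assumptions made for the example.

```python
import math
import random

def mean_aggregate(features, adjacency):
    """One round of mean message passing; each node also includes itself."""
    updated = []
    for node, neighbors in enumerate(adjacency):
        group = [node] + list(neighbors)
        dim = len(features[node])
        updated.append([sum(features[j][d] for j in group) / len(group)
                        for d in range(dim)])
    return updated

def linear(vectors, weights):
    """Multiply each row vector by a weight matrix (no bias term)."""
    return [[sum(v[i] * weights[i][o] for i in range(len(v)))
             for o in range(len(weights[0]))] for v in vectors]

def relu(vectors):
    return [[max(0.0, x) for x in v] for v in vectors]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def emgnn_forward(features, graphs, w_shared, w_meta, w_out):
    # Step (i): the same weights (shared GNN) update node features in every graph.
    per_graph = [relu(linear(mean_aggregate(features, adj), w_shared))
                 for adj in graphs]
    # Step (ii): each gene's meta node aggregates that gene's copies across graphs
    # (assumed here to be a plain mean), followed by a second transform (Meta GNN).
    meta = []
    for gene in range(len(features)):
        copies = [hidden[gene] for hidden in per_graph]
        meta.append([sum(c[d] for c in copies) / len(copies)
                     for d in range(len(copies[0]))])
    meta = relu(linear(meta, w_meta))
    # Step (iii): a classifier on each meta node; one sigmoid layer stands in
    # for the multilayer perceptron to keep the sketch short.
    return [sigmoid(row[0]) for row in linear(meta, w_out)]

def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
features = rand_matrix(4, 3)        # 4 hypothetical genes, 3 features each
graph_a = [[1], [0, 2], [1], []]    # adjacency lists of two toy networks
graph_b = [[2], [3], [0], [1]]
probs = emgnn_forward(features, [graph_a, graph_b],
                      rand_matrix(3, 5), rand_matrix(5, 5), rand_matrix(5, 1))
```

Because the weights in step (i) are shared, adding another network only adds another entry to `graphs`; the number of parameters stays the same in this sketch.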
Figure 2.
Test AUPRC and standard deviation values of EMGNN(GCN) with respect to the number of input PPI networks. Each line represents a test set of positive and negative labeled genes held out in a specific PPI network. PPI networks were added by random sampling: at each point, three combinations of PPI networks were sampled at random. Note that the testing nodes remain the same as more networks are added. Performance increased for the majority of the test sets as the number of input networks increased.
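For readers unfamiliar with the metric, the area under the precision-recall curve can be estimated from ranked predictions with the average-precision formula. The sketch below uses that estimator as an assumption; the source does not state which AUPRC estimator was used, and the labels and scores are made up for illustration.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision values at each positive's rank.

    labels are 0/1, scores are predicted probabilities. Ties in scores are
    broken by input order, which is a simplification.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos

# Hypothetical held-out genes: 1 = cancer gene, 0 = non-cancer gene.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
```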
Figure 3.
Explanation of each PPI network’s contribution to cancer gene predictions. (A) Representative PPI network contributions in known cancer genes and newly predicted cancer genes. TP53 and BRCA1 are known cancer genes; COL5A1 and MSLN are newly predicted cancer genes. (B) Overall distribution of meta-edge feature importance for all known cancer genes across six PPI networks. Meta-edge feature importance was normalized to one (see Section 2 for details). (C) A hypothetical illustration of PPI network cancer neighborhood implicated in the variation of meta-edge importance. (D) Empirical analysis demonstrates a higher correlation between meta-edge importance and cancer neighborhood for genes with a large meta-edge variance.
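One plausible reading of "normalized to one" is that each gene's meta-edge importances are rescaled to sum to one across the six PPI networks, after which their variance measures how unevenly the networks contribute. The sketch below assumes that reading, assumes non-negative importances, and uses placeholder network names; Section 2 of the paper is the authority on the actual procedure.

```python
from statistics import pvariance

def normalize_importance(raw):
    """Rescale one gene's per-network importances so they sum to one."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("importances must have a positive sum")
    return {network: value / total for network, value in raw.items()}

# Hypothetical raw meta-edge importances for a single gene.
raw = {"network_1": 2.0, "network_2": 1.0, "network_3": 1.0,
       "network_4": 0.0, "network_5": 0.0, "network_6": 0.0}
normalized = normalize_importance(raw)

# A large variance means a few networks dominate the prediction for this gene.
spread = pvariance(list(normalized.values()))
```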
Figure 4.
Explanations of multi-omic node feature importance in cancer gene predictions. (A) Overall distribution of node feature importance grouped by omic feature types, including single-nucleotide variants (MF), DNA methylation (METH), gene expression (GE), and copy number aberrations (CNA), for known cancer genes. (B) Detailed node feature importance for the four genes analysed in Fig. 3B. X-axis labels were color-coded to match the omic feature types in (A). Individual tumor types were coded according to TCGA study abbreviations (Weinstein et al. 2013).
Figure 5.
EMGNN predicts COL5A1 as a novel cancer gene and reveals biological insights. (A) A comparison of predicted cancer gene probability from EMGNN and EMOGI models trained on single PPI networks. Because a probability of 50% corresponds to random guessing between cancer and non-cancer genes, the bar heights reflect prediction confidence. (B) Three cancer hallmark gene sets were significantly enriched in the important neighboring genes of COL5A1, as revealed by interpreting the EMGNN model. (C) Enrichment of the apical junction cancer hallmark gene set in COL5A1 neighboring genes. The neighboring genes of COL5A1 were ranked by their EMGNN node importance on the x-axis, with each bar representing a gene in the apical junction gene set. A strongly left-shifted curve demonstrates enrichment of the apical junction gene set among the genes most important for predicting COL5A1 as a cancer gene.
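The left-shifted curve in panel (C) is the kind of plot produced by a running-sum enrichment statistic. As a rough illustration, the sketch below computes the unweighted, Kolmogorov-Smirnov-style enrichment score over a ranked gene list. The exact enrichment procedure and weighting used in the paper are not given here, so this variant is an assumption, and the gene names are placeholders.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score over a ranked gene list.

    Walking down the ranking, the running sum steps up for genes in the set
    and down for genes outside it; the score is the largest deviation from
    zero, keeping its sign. Positive values mean the set is concentrated
    near the top of the ranking.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("ranking must contain genes both in and out of the set")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical neighbors ranked from most to least important.
ranking = ["g1", "g2", "g3", "g4"]
top_heavy = enrichment_score(ranking, {"g1", "g2"})
bottom_heavy = enrichment_score(ranking, {"g3", "g4"})
```

Assessing significance would additionally require a null distribution, for example from permuted rankings, which this sketch omits.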
