Bioinformatics. 2020 Feb 1;36(3):865-871.
doi: 10.1093/bioinformatics/btz652.

Gene relevance based on multiple evidences in complex networks

Noemi Di Nanni et al. Bioinformatics. 2020.

Abstract

Motivation: Multi-omics approaches offer the opportunity to reconstruct a more complete picture of the molecular events associated with human diseases, but pose challenges in data analysis. Network-based methods for the analysis of multi-omics leverage the complex web of macromolecular interactions occurring within cells to extract significant patterns of molecular alterations. Existing network-based approaches typically address specific combinations of omics and are limited in terms of the number of layers that can be jointly analysed. In this study, we investigate the application of network diffusion to quantify gene relevance on the basis of multiple evidences (layers).
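
As a rough illustration of what network diffusion does, the sketch below spreads per-gene input scores over an interaction network until a steady state is reached. The abstract does not state the diffusion formula, so the update rule x_{t+1} = alpha * W x_t + (1 - alpha) * x0 with a symmetrically normalised adjacency matrix W, the parameter alpha and the function names are all assumptions made for this example, not the mND package's API.

```python
import numpy as np


def normalise_adjacency(adj):
    """Symmetric normalisation W = D^{-1/2} A D^{-1/2} (assumed choice)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    safe_deg = np.where(deg > 0, deg, 1.0)          # avoid division by zero
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(safe_deg), 0.0)
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]


def network_diffusion(adj, x0, alpha=0.7):
    """Steady state of x_{t+1} = alpha * W @ x_t + (1 - alpha) * x0.

    x0 may be a vector (one layer) or a genes-by-layers matrix, in which
    case every column (layer) is diffused independently.
    """
    w = normalise_adjacency(adj)
    x0 = np.asarray(x0, dtype=float)
    n = w.shape[0]
    # Closed-form fixed point: x* = (1 - alpha) (I - alpha W)^{-1} x0
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * w, x0)


if __name__ == "__main__":
    # Hypothetical 4-gene path network: g0 - g1 - g2 - g3
    adj = np.array([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]])
    # Two layers; only g0 is altered in layer 1, only g3 in layer 2
    x0 = np.array([[1.0, 0.0],
                   [0.0, 0.0],
                   [0.0, 0.0],
                   [0.0, 1.0]])
    print(network_diffusion(adj, x0))
```

Genes close to an altered gene receive a larger share of its score than distant ones, which is the "network proximity" signal the score builds on.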

Results: We introduce a gene score (mND) that quantifies the relevance of a gene in a biological process, taking into account the network proximity of the gene and its first neighbours to other altered genes. We show that mND outperforms existing methods in finding altered genes in network proximity in one or more layers. We also report good performance in recovering known cancer genes. The pipeline described in this article is broadly applicable because it can handle different types of inputs: multi-omics datasets, datasets that are stratified into many classes (e.g., cell clusters emerging from single cell analyses), or a combination of the two.

Availability and implementation: The R package 'mND' is available at URL: https://www.itb.cnr.it/mnd.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Flowchart of the analysis pipeline with mND. (1) Network diffusion is applied to the original dataset, composed of multiple layers L1, L2, …, Ln (e.g. different types of omics or multiple samples of the same omics type); (2) identification of the top k neighbours for each gene in each layer; (3) calculation of mND score; (4) empirical P-value assessment; (5) classification of genes across layers
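
Steps (2) to (4) of the flowchart can be sketched as follows. This is a toy stand-in, not the published mND definition: the caption does not give the formula, so the way a gene's own diffusion score is combined with the mean of its top k neighbours (a product summed over layers), the permutation scheme (shuffling gene labels of the diffused scores), and all function names are assumptions made purely for illustration.

```python
import numpy as np


def topk_neighbour_mean(adj, xs, k=3):
    """Per layer, mean of the k highest scores among each gene's first neighbours."""
    adj = np.asarray(adj, dtype=bool)
    xs = np.asarray(xs, dtype=float)            # genes x layers
    out = np.zeros_like(xs)
    for i in range(xs.shape[0]):
        neigh = np.flatnonzero(adj[i])
        if neigh.size == 0:
            continue                            # isolated gene: no contribution
        # Sort each layer's neighbour scores in descending order, keep top k
        top = np.sort(xs[neigh], axis=0)[::-1][:k]
        out[i] = top.mean(axis=0)
    return out


def toy_relevance(adj, xs, k=3):
    """Assumed combination: own score times top-k neighbour mean, summed over layers."""
    xs = np.asarray(xs, dtype=float)
    return (xs * topk_neighbour_mean(adj, xs, k)).sum(axis=1)


def empirical_pvalues(adj, xs, k=3, n_perm=200, seed=0):
    """Empirical P-values from gene-label permutations (assumed null model)."""
    rng = np.random.default_rng(seed)
    xs = np.asarray(xs, dtype=float)
    observed = toy_relevance(adj, xs, k)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        perm = rng.permutation(xs.shape[0])
        exceed += toy_relevance(adj, xs[perm], k) >= observed
    # Add-one correction keeps P-values strictly above zero
    return (exceed + 1.0) / (n_perm + 1.0)
```

Here `xs` would be a genes-by-layers matrix of diffusion scores, one column per layer, and `adj` the adjacency matrix of the interactome.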
Fig. 2.
Performance in ranking high scoring genes in network proximity. (A) Example of a gene module with its high scoring genes (H, black) in each of the two layers and the resulting mND score; only genes belonging to the module and links occurring among such genes are reported. (B) Recall values for 10 signal permutations for each of the nine modules (P1, P2, …, P9), using mND score and other methods; the number between parentheses after module id is module size. (C) Recall values, shown separately for high scoring genes and other genes in each module. (D) Recall values normalized by the highest recall found for each input configuration at varying number of neighbours (k). (A–D) These results were obtained using interactome GH
Fig. 3.
Performance in recovering known cancer genes. Partial AUC (pAUC) at varying number of top false positive ranking genes (n) in the analysis of mutations and expression changes in four cancer types. (A–D) These results were generated using interactome WU
Fig. 4.
Analysis of mutations and expression changes in BC. (A) mND score and empirical P-value; the red dashed line indicates the top 123 genes (subplot); colours and shapes have the same meaning as in panel B. (B) Gene diffusion scores of the top 123 genes ranked by mND. (C) tp values (Equation 6) for the two layers. (D) Gene network composed of the top 123 genes ranked by mND; colours and shapes have the same meaning as in panel B. (E) Classification of genes across layers (only the top 75 ranked genes are shown for clarity); brown: isolated; orange: linker; purple: module; grey: not significant. (A–D) Layer 1 (L1): mutations; Layer 2 (L2): expression variations. H1, H2: sets of genes with high initial scores in L1 and L2, respectively. NS: not significant, genes not belonging to H1 and H2. Green rhombuses: genes belonging to H1 and H2; blue triangles: genes belonging only to H1; yellow rectangles: genes belonging only to H2; red shapes: genes neither in H1 nor in H2. These results were generated using interactome STRING
Fig. 5.
Pathways enriched in mutated genes and/or differentially expressed genes in BC. Number of genes found by mND and single omics analyses (L1*, L2* and L2) in each pathway at varying number of top ranking genes considered (horizontal axis, n); L1: mutations; L2: gene expression variations; the asterisk distinguishes between gene ranking by original data and the corresponding network diffusion scores. (A) Disease pathways; (B) other pathways. (A and B) Pathways from KEGG database
