Sci Rep. 2016 Oct 12;6:34841.
doi: 10.1038/srep34841.

Network diffusion-based analysis of high-throughput data for the detection of differentially enriched modules



Matteo Bersanelli et al. Sci Rep.

Abstract

A relation exists between the network proximity of molecular entities in interaction networks, their functional similarity and their association with diseases. The identification of network regions associated with biological functions and pathologies is a major goal in systems biology. We describe a network diffusion-based pipeline for the interpretation of different types of omics data in the context of molecular interaction networks. We introduce the network smoothing index, a network-based quantity that makes it possible to jointly quantify the amount of omics information in genes and in their network neighbourhood, using network diffusion to define network proximity. The approach is applicable to both descriptive and inferential statistics calculated on omics data. We also show that network resampling, applied to gene lists ranked by quantities derived from the network smoothing index, indicates the presence of significantly connected genes. As a proof of principle, we identified gene modules enriched in somatic mutations and transcriptional variations observed in samples of prostate adenocarcinoma (PRAD). In line with the local hypothesis, the network smoothing index and network resampling highlighted the existence of a connected component of genes harbouring molecular alterations in PRAD.
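The abstract does not give the formulas behind network diffusion or the network smoothing index. As a rough illustration only, the Python sketch below assumes a common diffusion scheme (iterate F ← αWF + (1 − α)X0 on the symmetrically normalised adjacency matrix W = D^(−1/2) A D^(−1/2), with α = 0.7), a ratio form S = F / (X0 + ε) for the smoothing index, and ΔS = S_A − S_B for two groups of samples. These formulas, the default values and all function names are assumptions, not the authors' implementation.

```python
import math
from typing import Dict, List

# Undirected network as an adjacency list: gene -> list of neighbouring genes.
# Every neighbour must also be a key, and adjacency must be symmetric.
Graph = Dict[str, List[str]]


def network_diffusion(graph: Graph, x0: Dict[str, float], alpha: float = 0.7,
                      tol: float = 1e-10, max_iter: int = 10_000) -> Dict[str, float]:
    """Assumed scheme: iterate F <- alpha * W F + (1 - alpha) * X0, W = D^-1/2 A D^-1/2."""
    degree = {g: len(nbrs) for g, nbrs in graph.items()}
    f = dict(x0)
    for _ in range(max_iter):
        new_f = {}
        for g, nbrs in graph.items():
            # Each neighbour passes on its current value, scaled by the normalised edge weight.
            spread = sum(f[u] / math.sqrt(degree[g] * degree[u]) for u in nbrs)
            new_f[g] = alpha * spread + (1 - alpha) * x0[g]
        change = max((abs(new_f[g] - f[g]) for g in graph), default=0.0)
        f = new_f
        if change < tol:  # alpha < 1 makes the iteration a contraction, so it converges
            break
    return f


def smoothing_index(graph: Graph, x0: Dict[str, float], eps: float = 0.25,
                    alpha: float = 0.7) -> Dict[str, float]:
    """Hypothetical NSI: smoothed value relative to the gene's own value, S = F / (X0 + eps).

    eps > 0 keeps the ratio finite for genes whose own value is zero.
    """
    f = network_diffusion(graph, x0, alpha)
    return {g: f[g] / (x0[g] + eps) for g in graph}


def delta_smoothing_index(graph: Graph, freq_a: Dict[str, float], freq_b: Dict[str, float],
                          eps: float = 0.25, alpha: float = 0.7) -> Dict[str, float]:
    """Hypothetical differential index: Delta S = S_A - S_B for two groups of samples."""
    s_a = smoothing_index(graph, freq_a, eps, alpha)
    s_b = smoothing_index(graph, freq_b, eps, alpha)
    return {g: s_a[g] - s_b[g] for g in graph}
```

Under this assumed form, a large ε makes the ranking by S approach the ranking by the smoothed value F, while a small ε favours genes whose neighbours carry more signal than they do themselves.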


Figures

Figure 1. Network diffusion-based analysis of omics for the identification of differentially enriched network regions.
(a) Statistics (descriptive on the left, inferential on the right) of molecular profiles are smoothed by means of network propagation and the network smoothing index is computed. (b) Identification of significantly connected components among genes ranked by ΔS or Sp, and network-based functional analysis.
Figure 2. Performance of differential network smoothing index in simulated datasets containing gene modules enriched in omics information.
(a) Relative frequencies of somatic gene mutations, ranked in decreasing order to highlight mountains (frequently mutated genes) and hills (less frequently mutated genes). (b) Heatmap of recall values for varying percentage of mountain genes m% and average hill frequency h, calculated on a sample of modules of size M = 100. (c) Fraction of recalled genes for different values of the parameter ε with varying ω. (d) Average recall obtained on several toy datasets of different sizes M = {100, 150, 200} and topological density d, for different values of h. (e) Comparison between recall values obtained by ranking genes by S or f, for different values of ω, on several toy datasets. (f) Number of links and number of connected genes among the top-ranking genes ranked by f or S. (a–f) Simulations were run using STRING PPIs. (c–f) The signal ω was computed with m% = 0.1.
Figure 3. Identification of significantly connected genes with network resampling p values in simulated datasets containing a gene module enriched in omics information.
Network resampling p values (pnr) calculated for each rank of gene lists ordered by decreasing values of ΔS in datasets containing gene modules of different size (red lines, M) and signal (ω). Yellow lines indicate the smallest ranks associated with the presence of significantly connected components. Simulations were run using STRING PPIs.
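The caption does not detail how pnr is computed. A minimal sketch, assuming that network resampling means shuffling the gene ranking and counting, for every rank k, how often the top k shuffled genes share at least as many links as the observed top k. The add-one correction, the default number of permutations and all names are assumptions.

```python
import random
from typing import Dict, List, Sequence, Set

# Undirected network as an adjacency map: gene -> set of neighbouring genes.
Adjacency = Dict[str, Set[str]]


def prefix_link_counts(ranking: Sequence[str], adj: Adjacency) -> List[int]:
    """Number of links among the top-k genes, for every k = 1..len(ranking)."""
    seen: Set[str] = set()
    counts: List[int] = []
    total = 0
    for gene in ranking:
        # Adding a gene creates one new link per neighbour already in the top set.
        total += sum(1 for u in adj.get(gene, ()) if u in seen)
        seen.add(gene)
        counts.append(total)
    return counts


def network_resampling_pvalues(ranking: Sequence[str], adj: Adjacency,
                               n_perm: int = 1000, seed: int = 0) -> List[float]:
    """Hypothetical pnr: per-rank empirical p value from shuffled rankings."""
    rng = random.Random(seed)
    observed = prefix_link_counts(ranking, adj)
    at_least = [0] * len(ranking)
    shuffled = list(ranking)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null = prefix_link_counts(shuffled, adj)
        for k, (obs, rnd) in enumerate(zip(observed, null)):
            if rnd >= obs:
                at_least[k] += 1
    # Add-one correction so that no p value is exactly zero.
    return [(1 + c) / (1 + n_perm) for c in at_least]
```

The ranking should contain every gene in the network, so that shuffling it is equivalent to permuting gene labels on a fixed topology; the incremental link count keeps each permutation linear in the number of links.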
Figure 4. Comparison of network-based and network-free quantities calculated on somatic mutation and gene expression data from PRAD samples associated with two different prognostic groups.
(a,b) Scatter plot of network-based (y-axis) vs network-free (x-axis) gene scores calculated on PRAD SM (a) and GE (b) data; colours indicate the top 500 genes ranked by network-free (red) or network-based (yellow, blue) scores and the overlaps (brown, purple). (c,d) Number of links (y-axis, left) and number of connected genes (y-axis, right) within the first 500 genes ordered by network-based (ΔS, Sp) and network-free (Δf, lfcp) gene scores, calculated on PRAD SM (c) and PRAD GE (d) data. (a–d) ΔS and Sp were calculated using STRING PPIs with ε = 0.25 and ε = 1, respectively.
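The link and vertex counts in panels (c,d) can be approximated with the sketch below. It assumes that "connected genes" are genes with at least one link to another gene in the same top list; the helper names and the alphabetical tie-breaking are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Undirected network as a symmetric adjacency map: gene -> set of neighbouring genes.
Adjacency = Dict[str, Set[str]]


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Genes with the n highest scores (ties broken alphabetically for reproducibility)."""
    return sorted(scores, key=lambda g: (-scores[g], g))[:n]


def links_and_connected_genes(genes: Iterable[str], adj: Adjacency) -> Tuple[int, int]:
    """Links within the induced subgraph, and genes with at least one such link."""
    top = set(genes)
    degree_sum = 0
    connected = 0
    for g in top:
        k = sum(1 for u in adj.get(g, ()) if u in top and u != g)
        degree_sum += k
        if k > 0:
            connected += 1
    # Every undirected link is seen once from each endpoint.
    return degree_sum // 2, connected
```

Comparing `links_and_connected_genes(top_n(network_based_scores, 500), adj)` with the same call on network-free scores mirrors the comparison between, for example, ΔS and Δf in panel (c).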
Figure 5. Gene modules enriched in genes with different somatic mutations and gene expression levels between two PRAD prognostic groups.
(a) pnr values of gene lists ranked by ΔS (SM, yellow) and Sp (GE, blue); vertical lines indicate the top-ranking genes selected to be part of the corresponding gene modules. (b) Network of genes belonging to SM module (yellow), GE module (blue) or both (green); square/circle = the gene is/is not ranked by network-free statistics within the first M positions (M = module size); vertex size = the larger the size the higher the gene score (maximum between ΔS and Sp); pink border = genes that occur in at least 10 articles on PRAD (Supplementary Data S1–2). These results were obtained with STRING PPIs.
Figure 6. Network of pathways enriched in genes with different somatic mutations and gene expression levels between two PRAD prognostic groups.
Vertices are pathways with p < 0.003 (GSEA, estimated with permutations) in at least one interactome and links indicate the similarity between pathways (o ≥ 0.95); communities of similar pathways are underlined by pink background and identified by numbers (Supplementary Data S3); pathways that are not similar to any other pathway are not shown; green = pathway found in SM and GE data; yellow = SM only; blue = GE only; circle = pathway found only when using network-based quantities (ΔS or Sp); triangle = pathway found only when using network-free quantities (Δf or lfcp); square = pathway found by network-based quantities and network-free statistics (Supplementary Data S3).
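The caption does not define o. A minimal sketch, assuming o is the overlap coefficient |A ∩ B| / min(|A|, |B|) between the gene sets of two pathways: it links pathways with o ≥ 0.95 and groups them. Connected components stand in here for the communities, whose detection method the caption does not state, and all names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def overlap_coefficient(a: Set[str], b: Set[str]) -> float:
    """|A intersect B| / min(|A|, |B|); 0 for empty sets (assumed meaning of o)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def similarity_links(pathways: Dict[str, Set[str]],
                     threshold: float = 0.95) -> List[Tuple[str, str]]:
    """Link every pair of pathways whose overlap coefficient reaches the threshold."""
    return [(p, q) for p, q in combinations(sorted(pathways), 2)
            if overlap_coefficient(pathways[p], pathways[q]) >= threshold]


def groups_of_similar_pathways(links: List[Tuple[str, str]]) -> List[Set[str]]:
    """Connected components of the similarity network; unlinked pathways are left out."""
    adj: Dict[str, Set[str]] = {}
    for p, q in links:
        adj.setdefault(p, set()).add(q)
        adj.setdefault(q, set()).add(p)
    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], set()
        while stack:  # iterative depth-first search
            v = stack.pop()
            group.add(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        groups.append(group)
    return groups
```

Because only linked pathways enter the adjacency map, pathways that are not similar to any other pathway drop out automatically, as in the figure.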
