BMC Bioinformatics. 2007 Jan 24;8:22.
doi: 10.1186/1471-2105-8-22.

Gene network interconnectedness and the generalized topological overlap measure


Andy M Yip et al. BMC Bioinformatics. 2007;8:22.

Abstract

Background: Network methods are increasingly used to represent the interactions of genes and/or proteins. Genes or proteins that are directly linked may have a similar biological function or may be part of the same biological pathway. Since the information on the connection (adjacency) between two nodes may be noisy or incomplete, it can be desirable to consider alternative measures of pairwise interconnectedness. Here we study a class of measures that are proportional to the number of neighbors that a pair of nodes share in common. For example, the topological overlap measure (TOM) of Ravasz et al. [1] can be interpreted as a measure of agreement between the m = 1 step neighborhoods of two nodes. Several studies have shown that two proteins with a higher topological overlap are more likely to belong to the same functional class than proteins with a lower topological overlap. Here we address the question of whether a measure of topological overlap based on higher-order neighborhoods could give rise to a more robust and sensitive measure of interconnectedness.
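The m = 1 topological overlap mentioned above is commonly written as $\omega_{ij} = \frac{l_{ij} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}$, where $l_{ij} = \sum_u a_{iu} a_{uj}$ counts the neighbors shared by nodes i and j and $k_i = \sum_u a_{iu}$ is the degree of node i. The sketch below computes this matrix for an unweighted, undirected network with NumPy. The function name `tom` and the convention of setting the diagonal to 1 are illustrative choices, not taken from the paper.

```python
import numpy as np


def tom(adj) -> np.ndarray:
    """Topological overlap matrix of an unweighted, undirected network.

    Commonly quoted form of the Ravasz et al. measure:
        w_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)
    """
    a = np.array(adj, dtype=float)        # copy so the caller's matrix is untouched
    np.fill_diagonal(a, 0.0)              # ignore self-loops when counting neighbors
    shared = a @ a                        # l_ij: number of common neighbors of i and j
    k = a.sum(axis=1)                     # k_i: degree of each node
    denom = np.minimum.outer(k, k) + 1.0 - a   # always >= 1, so no division by zero
    w = (shared + a) / denom
    np.fill_diagonal(w, 1.0)              # convention: a node fully overlaps itself
    return w


# Usage: a triangle 0-1-2 with an extra node 3 hanging off node 2.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]])
w = tom(adj)
print(w[0, 3])  # nodes 0 and 3 are not linked but share neighbor 2
```

Although nodes 0 and 3 have adjacency 0, their shared neighbor gives them a topological overlap of 0.5, which is the kind of indirect evidence of interconnectedness the measure is meant to capture.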

Results: We generalize the topological overlap measure from m = 1 step neighborhoods to m ≥ 2 step neighborhoods. This allows us to define the m-th order generalized topological overlap measure (GTOM) by (i) counting the number of m-step neighbors that a pair of nodes share and (ii) normalizing this count to take a value between 0 and 1. Using theoretical arguments and applications to a yeast co-expression network and a fly protein network, we illustrate the usefulness of the proposed measure for module detection and gene neighborhood analysis.
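The abstract describes GTOM only in words: count the m-step neighbors a pair of nodes shares, then normalize to [0, 1]. The sketch below is one way to realize that description. It assumes (our assumption, not a formula quoted from the paper) that $N_m(i)$ is the set of nodes other than i reachable from i in at most m steps, and that the m = 1 normalization carries over unchanged:

$$\omega^{[m]}_{ij} = \frac{|N_m(i) \cap N_m(j)| + a_{ij}}{\min(|N_m(i)|, |N_m(j)|) + 1 - a_{ij}}$$

With m = 1 this reduces to the standard TOM formula, and m = 0 returns the adjacency matrix, matching the GTOM0 convention used in the figure captions. The function name `gtom` is hypothetical.

```python
import numpy as np


def gtom(adj, m: int) -> np.ndarray:
    """Sketch of an m-th order generalized topological overlap matrix.

    ASSUMPTION: N_m(i) = nodes other than i within m steps of i, and the
    m = 1 TOM normalization is reused unchanged:
        w_ij = (|N_m(i) & N_m(j)| + a_ij) / (min(|N_m(i)|, |N_m(j)|) + 1 - a_ij)
    """
    a = np.array(adj, dtype=float)       # copy so the caller's matrix is untouched
    np.fill_diagonal(a, 0.0)
    if m == 0:
        w = a.copy()                     # GTOM0: the adjacency matrix itself
        np.fill_diagonal(w, 1.0)
        return w
    step = a > 0                         # pairs joined by a walk of exactly 1 step
    reach = step.copy()                  # pairs joined within the steps seen so far
    for _ in range(m - 1):
        step = (step.astype(int) @ a) > 0   # extend every walk by one more step
        reach |= step
    np.fill_diagonal(reach, False)       # a node is not its own neighbor
    r = reach.astype(float)
    shared = r @ r                       # |N_m(i) & N_m(j)|
    size = r.sum(axis=1)                 # |N_m(i)|
    # Denominator is >= 1: if a_ij = 1, both neighborhoods contain at least one node.
    w = (shared + a) / (np.minimum.outer(size, size) + 1.0 - a)
    np.fill_diagonal(w, 1.0)             # convention: full overlap with itself
    return w


# Usage on a 5-node path 0-1-2-3-4.
path = np.array([[0, 1, 0, 0, 0],
                 [1, 0, 1, 0, 0],
                 [0, 1, 0, 1, 0],
                 [0, 0, 1, 0, 1],
                 [0, 0, 0, 1, 0]])
w1, w2 = gtom(path, 1), gtom(path, 2)
```

Reusing the m = 1 normalization keeps every value in [0, 1]: a shared m-step neighbor can never be i or j themselves, so the numerator cannot exceed the denominator.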

Conclusion: Topological overlap can serve as an important filter to counter the effects of spurious or missing connections between network nodes. The m-th order topological overlap measure allows one to trade off sensitivity against specificity when defining pairwise interconnectedness and network modules.


Figures

Figure 1
Proportion of essential proteins among the S proteins that are most highly interconnected with a given essential hub protein in the Drosophila protein-protein interaction network. For each neighborhood size S (x-axis), the proportion (y-axis) is averaged over 30 essential hub proteins. The black horizontal line (GTOM0) marks the average proportion of essential proteins among the directly linked neighbors (adjacency = 1) of an essential hub protein.
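The neighborhood analysis in Figure 1 ranks all other proteins by their interconnectedness with a hub and inspects the top S. A minimal sketch, assuming a precomputed similarity matrix (for example a GTOM matrix) and a boolean vector of essentiality labels; the function names and the tie-breaking rule (lower index first) are hypothetical choices.

```python
import numpy as np


def top_s_neighbors(sim, hub: int, s: int) -> np.ndarray:
    """Indices of the s nodes most similar to `hub`, excluding the hub itself."""
    scores = np.array(sim[hub], dtype=float)
    scores[hub] = -np.inf                        # never count the hub as its own neighbor
    order = np.argsort(-scores, kind="stable")   # descending; ties keep index order
    return order[:s]


def essential_fraction(sim, hubs, essential, s: int) -> float:
    """Mean over hubs of the fraction of essential nodes among each hub's top-s neighbors."""
    essential = np.asarray(essential, dtype=bool)
    return float(np.mean([essential[top_s_neighbors(sim, h, s)].mean() for h in hubs]))


# Usage with a toy 4-node similarity matrix and hypothetical essentiality labels.
sim = np.array([[1.0, 0.9, 0.2, 0.5],
                [0.9, 1.0, 0.3, 0.1],
                [0.2, 0.3, 1.0, 0.4],
                [0.5, 0.1, 0.4, 1.0]])
labels = [True, True, False, False]
curve = [essential_fraction(sim, [0, 1], labels, s) for s in (1, 2, 3)]
```

Evaluating `essential_fraction` over a range of S values traces one curve of the kind plotted in Figure 1.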
Figure 2
Yeast network modules and protein biosynthesis genes for different GTOMm measures. A. The adjacency matrix (GTOM0). B. The standard TOM of Ravasz et al. (GTOM1). C. Our new generalized TOM (GTOM2). In each column, the top row shows the dendrogram obtained by applying average linkage hierarchical clustering to the corresponding GTOM dissimilarity; the middle row shows a color bar ordered by that dendrogram but colored by the module assignment obtained with the TOM measure in B; the bottom row shows a color bar ordered by that dendrogram but colored dark red if the gene belongs to the class 'protein biosynthesis'. The modules defined by the TOM are the branches of the dendrogram in B cut at height 0.95. Almost all protein biosynthesis genes are grouped together by the proposed GTOM2 measure, whereas the other two measures tend to split the class across two modules. The modules defined by GTOM2 are also more pronounced, in the sense that they are separated by larger distances.
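The module definition in Figure 2 (average linkage clustering of a GTOM dissimilarity, 1 - GTOM, with the dendrogram cut at 0.95) can be sketched without external dependencies as below. This is a naive illustration for small matrices, assuming a symmetric dissimilarity matrix; the function name is hypothetical. In practice, `scipy.cluster.hierarchy.linkage` with `method="average"` followed by `fcluster(..., criterion="distance")` performs the same kind of static height cut far more efficiently.

```python
import numpy as np


def average_linkage_cut(diss, cutoff: float) -> list:
    """Flat clusters from average linkage clustering, cut at height `cutoff`.

    Repeatedly merges the two clusters with the smallest mean pairwise
    dissimilarity and stops once that smallest mean exceeds `cutoff`.
    Average linkage merge heights never decrease, so stopping early is
    equivalent to cutting the full dendrogram at `cutoff`.
    """
    d = np.asarray(diss, dtype=float)
    clusters = [[i] for i in range(d.shape[0])]
    while len(clusters) > 1:
        best, pair = np.inf, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                h = d[np.ix_(clusters[x], clusters[y])].mean()   # average linkage distance
                if h < best:
                    best, pair = h, (x, y)
        if best > cutoff:
            break                                 # next merge would lie above the cut
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]   # y > x, so deleting y keeps x valid
        del clusters[y]
    labels = [0] * d.shape[0]
    for label, members in enumerate(clusters):
        for node in members:
            labels[node] = label
    return labels


# Usage: two tight pairs that are far apart, cut at 0.95 as in Figure 2.
diss = np.array([[0.0, 0.1, 0.99, 0.99],
                 [0.1, 0.0, 0.99, 0.99],
                 [0.99, 0.99, 0.0, 0.1],
                 [0.99, 0.99, 0.1, 0.0]])
modules = average_linkage_cut(diss, 0.95)
```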
Figure 3
Pairwise scatter plots between different dissimilarity measures. Panels above the diagonal show the scatter plots, panels below the diagonal show the corresponding Pearson correlation coefficients, and the diagonal panels show the frequency distributions of the dissimilarities. Correlation-based dissimilarities $d^{C,[p]}$ are denoted by dissCorp, and GTOM-based dissimilarities $d^{T,[m]}$ are denoted by dissGTOMm. Note that dissGTOM0 (= 1 - ADJ) takes only binary values for the unweighted network.
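The correlation coefficients in the lower panels compare two dissimilarity measures. Assuming (the caption does not say so explicitly) that each coefficient is computed across all gene pairs, a minimal sketch takes the entries above the diagonal of both matrices, so each unordered pair is counted once, and correlates them. The function name is hypothetical.

```python
import numpy as np


def dissimilarity_correlation(d1, d2) -> float:
    """Pearson correlation between two dissimilarity matrices over their off-diagonal pairs."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    iu = np.triu_indices_from(d1, k=1)     # each unordered pair (i < j) exactly once
    return float(np.corrcoef(d1[iu], d2[iu])[0, 1])


# Usage: a linear rescaling of a dissimilarity matrix is perfectly correlated with it.
d_a = np.array([[0.0, 0.2, 0.5],
                [0.2, 0.0, 0.9],
                [0.5, 0.9, 0.0]])
d_b = 2.0 * d_a + 1.0
r = dissimilarity_correlation(d_a, d_b)
```

Excluding the diagonal matters: it is 0 for every dissimilarity and would otherwise inflate the correlation.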
Figure 4
Topological overlap matrix plots for the yeast gene co-expression network. A. GTOM0 plot. B. GTOM1 plot. C. GTOM2 plot. D. GTOM3 plot. The color bar at the top of each heatmap shows the module assignment obtained from GTOM1. The color bar on the left of each heatmap shows the functional category of the corresponding genes; dark red indicates membership in the class 'protein biosynthesis'. Modules are more pronounced in the GTOM2 and GTOM3 plots (larger contrast between the diagonal and off-diagonal blocks). Smaller modules (red diagonal blocks) are more visible in the GTOM0 and GTOM1 plots, whereas larger modules are more clearly delineated in the GTOM2 and GTOM3 plots. However, GTOM3 leads to excessively large modules, which compromises their specificity. Protein biosynthesis genes are grouped together in the GTOM2 and GTOM3 plots.
Figure 5
Multidimensional scaling (MDS) plots of the yeast gene co-expression network using A. GTOM0, B. GTOM1, C. GTOM2, and D. GTOM3. Colors indicate the 7 modules shown in Figure 2B, which were detected by hierarchical clustering with the GTOM1-based dissimilarity. The symbol '▲' denotes genes in the functional category 'protein biosynthesis'; genes in other classes are denoted by '○'. In general, the module assignment is preserved across the different GTOM measures, but the spatial distributions of the points vary considerably. Genes in the 'protein biosynthesis' class appear closer together.
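The caption does not state which MDS variant was used. Assuming classical (Torgerson) MDS, the coordinates come from the top eigenvectors of the double-centered matrix of squared dissimilarities, as in this sketch (the function name is hypothetical). Negative eigenvalues, which arise when a dissimilarity such as 1 - GTOM is not exactly Euclidean, are clipped to zero.

```python
import numpy as np


def classical_mds(diss, dims: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS embedding of a symmetric dissimilarity matrix."""
    d = np.asarray(diss, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                  # double-centered squared dissimilarities
    vals, vecs = np.linalg.eigh(b)               # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dims]          # keep the largest `dims` eigenvalues
    scale = np.sqrt(np.clip(vals[idx], 0.0, None))   # drop non-Euclidean (negative) parts
    return vecs[:, idx] * scale                  # one row of coordinates per node


# Usage: three points on a line at 0, 1 and 3 are recovered up to rotation and reflection.
line = np.array([[0.0, 1.0, 3.0],
                 [1.0, 0.0, 2.0],
                 [3.0, 2.0, 0.0]])
coords = classical_mds(line, dims=2)
```

For a 1 - GTOM dissimilarity, each row of `coords` gives the plotting position of one gene in panels like those above.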
Figure 6
Separation of protein biosynthesis genes from non-protein biosynthesis genes in perturbed versions of the yeast network. The average separation (cf. Eq. 6) is reported for GTOM0 (red), GTOM1 (green), GTOM2 (blue) and GTOM3 (brown). To assess the robustness of the GTOM measures to random deletions, we randomly deleted a proportion p of connections (adjacencies) and averaged the results across 20 draws. Note that GTOM2 outperforms the other measures if p < 67%, whereas GTOM3 outperforms GTOM2 if more than 67% of adjacencies are deleted. This illustrates that high values of m can counter the effect of misspecified (unknown or missing) adjacencies.
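The perturbation step behind Figure 6 (randomly deleting a proportion p of the connections) can be sketched as follows for an unweighted, undirected adjacency matrix. The function name and the rounding of p times the edge count to the nearest integer are assumptions; the separation statistic of Eq. 6 is not reproduced because its definition is not part of this excerpt.

```python
import numpy as np


def delete_edges(adj, p: float, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of `adj` with a random proportion p of its undirected edges removed."""
    a = np.array(adj, dtype=float)               # copy so the original network is kept
    i, j = np.nonzero(np.triu(a, k=1))           # list each undirected edge once (i < j)
    n_del = int(round(p * len(i)))               # ASSUMPTION: round to the nearest count
    drop = rng.choice(len(i), size=n_del, replace=False)
    a[i[drop], j[drop]] = 0.0
    a[j[drop], i[drop]] = 0.0                    # keep the matrix symmetric
    return a


# Usage: 20 perturbed copies of a complete 5-node graph at p = 0.3.
rng = np.random.default_rng(0)
k5 = np.ones((5, 5)) - np.eye(5)
draws = [delete_edges(k5, 0.3, rng) for _ in range(20)]
```

Each perturbed copy would then be passed through the GTOM computation for every m of interest, and the resulting separations averaged over the draws.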
Figure 7
A simple example where GTOM2 is superior to GTOM1: the GTOM neighborhood of size S = 7 around node 1. A. GTOM1 neighbors are colored black. B. GTOM2 neighbors are colored black. Note that GTOM2 detects the 'true' neighborhood (comprising nodes 1 through 8), while GTOM1 misses nodes 6 and 7.

References

    1. Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási AL. Hierarchical organization of modularity in metabolic networks. Science. 2002;297:1551–1555. doi: 10.1126/science.1073374.
    2. Ye Y, Godzik A. Comparative Analysis of Protein Domain Organization. Genome Biology. 2004;14:343–353.
    3. Carlson MR, Zhang B, Fang Z, Horvath S, Mishel PS, Nelson SF. Gene Connectivity, Function, and Sequence Conservation: Predictions from Modular Yeast Co-expression Networks. BMC Genomics. 2006;7.
    4. Horvath S, Zhang B, Carlson M, Lu K, Zhu S, Felciano R, Laurance M, Zhao W, Shu Q, Lee Y, Scheck A, Liau L, Wu H, Geschwind D, Febbo P, Kornblum H, Cloughesy T, Nelson S, Mischel P. Analysis of Oncogenic Signaling Networks in Glioblastoma Identifies ASPM as a Novel Molecular Target. Proc Natl Acad Sci USA. 2006;103:17402–17407. doi: 10.1073/pnas.0608396103.
    5. Oldham MC, Horvath S, Geschwind DH. Conservation and evolution of gene coexpression networks in human and chimpanzee brains. Proc Natl Acad Sci USA. 2006;103:17973–17978. doi: 10.1073/pnas.0605938103.
