Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2013 Aug;31(8):726-33.
doi: 10.1038/nbt.2635. Epub 2013 Jul 14.

Network deconvolution as a general method to distinguish direct dependencies in networks

Affiliations

Network deconvolution as a general method to distinguish direct dependencies in networks

Soheil Feizi et al. Nat Biotechnol. 2013 Aug.

Erratum in

Abstract

Recognizing direct relationships between variables connected in a network is a pervasive problem in biological, social and information sciences as correlation-based networks contain numerous indirect relationships. Here we present a general method for inferring direct effects from an observed correlation matrix containing both direct and indirect effects. We formulate the problem as the inverse of network convolution, and introduce an algorithm that removes the combined effect of all indirect paths of arbitrary length in a closed-form solution by exploiting eigen-decomposition and infinite-series sums. We demonstrate the effectiveness of our approach in several network applications: distinguishing direct targets in gene expression regulatory networks; recognizing directly interacting amino-acid residues for protein structure prediction from sequence alignments; and distinguishing strong collaborations in co-authorship social networks using connectivity information alone. In addition to its theoretical impact as a foundational graph theoretic tool, our results suggest network deconvolution is widely applicable for computing direct dependencies in network science across diverse disciplines.

PubMed Disclaimer

Figures

Figure 1
Figure 1. Network deconvolution overview
a. Direct edges in a network (solid blue arrows) can lead to indirect relationships (dashed red arrows) as a result of transitive information flow. These indirect contributions can be of length two (e.g. 1→2→3), three (e.g. 1→2→3→5) or higher, and can combine both direct and indirect effects (e.g. 2→4), and multiple indirect effects along varying paths (e.g. 2→3→5, 2→4→5). Self-loops are excluded from networks. Network deconvolution seeks to reverse the effect of transitive information flow across all indirect paths, in order to recover the true direct network (blue edges, Gdir) based on the observed network (combined blue and red edges, Gobs). b. Algebraically, the transitive closure of a network can be expressed as an infinite sum of the true direct network and all indirect effects along paths of increasing lengths, which can be written in a closed form as an infinite-series sum. Network deconvolution exploits this closed form to express the direct network Gdir as a function of the observed network Gobs. c. To efficiently compute this inverse operation, we express both the true and observed networks Gdir and Gobs by decomposition into their eigenvectors and eigenvalues, which enables each eigenvalue λi dir of the original network to be expressed as a nonlinear function of a single corresponding eigenvalue λi obs of the convolved observed network. d,e. Network deconvolution assumes that indirect flow weights can be approximated as the product of direct edge weights and that observed edge weights are the sum of direct and indirect flows. When these assumptions hold (d), network deconvolution removes all indirect flow effects and infers all direct interactions and weights exactly. Even when these assumptions do not hold (e), ND infers 87% of direct edges, showing robustness to non-linear effects.
Figure 2
Figure 2. Deconvolution of gene regulatory networks
a. Network Deconvolution (ND) applied to the inferred networks of top-scoring methods from DREAM5 leads to consistent improvements for mutual information (MI) and correlation based methods (average performance increase 59%). ND also improves other top-scoring methods (11% on average), including the best performing method of the DREAM5 challenge (GENIE3), thus leading to a new overall highest performance. Moreover, the community network obtained by integrating network predictions from individual methods (1–9) before ND is outperformed by the community network based on deconvolved networks by ∼22%. b. Network motif analysis showing the relative performance of inference methods for cascades (casc.) and feed-forward loops (FFL) before and after ND. Red/blue corresponds to increased/decreased prediction accuracy of the two motif types relative to the overall performance of the method before ND (measured by the area under the ROC curve, AUROC; Supplementary Note S2.4). The original methods (before ND, left side) have different relative performances for cascades and FFLs, e.g., mutual information-based network inference (MI) tends to include feed-forward edges (red arrow), resulting in higher accuracy for FFLs but lower accuracy for cascades, while the opposite is true for the Inferelator and ANOVerence. The deconvolved networks (after ND, right side) show significantly higher accuracy for true cascade network motifs for all methods, and moderately improved accuracy for FFLs on average, showing that ND successfully eliminates spurious indirect feed-forward edges for true cascade motifs, without sacrificing accuracy for true FFLs.
Figure 3
Figure 3. Application to protein structure prediction
a. Applying ND to predict experimentally-determined residue contacts (gray dots) based on amino-acid sequence alignments on fifteen proteins in different folding classes with sizes ranging from 50 to 260 residues in human. We applied ND to networks derived by mutual information (MI, lower left triangles) and direct information (DI, upper-right triangles). Arrows highlight distinct residue interactions captured by each method, highlighting the improvement over both MI and DI. b. Cumulative distributions of graph weights for interacting (solid lines) and non-interacting (dashed lines) amino acid pairs, for both MI (blue) and ND (red). Network deconvolution assigns higher weights to true positive edges and lower weights to false negatives, leading to 5-fold higher discrimination between true contacts and indirect mutual information for the 10% of edges with highest scores
Figure 4
Figure 4. Application to co-authorship social network
a. Use of network deconvolution to distinguishing strong ties (red) and weak ties (green) in the largest connected component of a co-authorship network containing 379 authors. True collaboration strengths were computed by summing the number of co-authored papers and down-weighting each paper by the number of additional co-authors. ND only had access to unweighted co-authorship edges, but exploiting transitive relationships to weigh down weak ties resulting in 77% accurate predictions (solid lines) and only 23% inaccurate predictions (dashed lines), demonstrating that this information lies within the network edges, and that ND is well-suited for discovering it. b. Beyond the binary classification of strong and weak ties, we found a strong correlation (R2=0.76) across all 2,742 edges connecting 1,589 authors, between the weights assigned by ND (y-axis) and the true collaboration strengths (x-axis) obtained using additional publication details.

Comment in

  • Network cleanup.
    Alipanahi B, Frey BJ. Alipanahi B, et al. Nat Biotechnol. 2013 Aug;31(8):714-5. doi: 10.1038/nbt.2657. Nat Biotechnol. 2013. PMID: 23929347 No abstract available.

References

    1. De Smet R, Marchal K. Advantages and limitations of current network inference methods. Nature reviews Microbiology. 2010;8:717–29. - PubMed
    1. Newman MEJ. The structure and function of complex networks. SIAM review. 2003:167–256.
    1. Koetter R, Médard M. An algebraic approach to network coding. IEEE/ACM transactions on networking. 2003;11:782–795.
    1. Witten IH, Frank E, Hall MA. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann; 2011.
    1. Marbach D, et al. Wisdom of crowds for robust gene network inference. Nature Methods. 2012 - PMC - PubMed

Publication types

LinkOut - more resources