This is a preprint.
The probability of edge existence due to node degree: a baseline for network-based predictions
- PMID: 36711569
- PMCID: PMC9881952
- DOI: 10.1101/2023.01.05.522939
Update in: The probability of edge existence due to node degree: a baseline for network-based predictions. Gigascience. 2024 Jan 2;13:giae001. doi: 10.1093/gigascience/giae001. PMID: 38323677. Free PMC article.
Abstract
Important tasks in biomedical discovery, such as predicting gene functions, gene-disease associations, and drug repurposing opportunities, are often framed as network edge prediction. The number of edges connecting to a node, termed degree, can vary greatly across nodes in real biomedical networks, and the distribution of degrees varies between networks. If degree strongly influences edge prediction, then imbalance or bias in the distribution of degrees could lead to nonspecific or misleading predictions. We introduce a network permutation framework to quantify the effects of node degree on edge prediction. Our framework decomposes performance into the proportions attributable to degree and to the network's specific connections. We discover that performance attributable to factors other than degree is often only a small portion of overall performance. Degree's predictive performance diminishes when the networks used for training and testing, despite measuring the same biological relationships, were generated using distinct techniques and hence have large differences in degree distribution. We introduce the permutation-derived edge prior as the probability that an edge exists based only on degree. The edge prior shows excellent discrimination and calibration for 20 biomedical networks (16 bipartite, 3 undirected, 1 directed), with AUROCs frequently exceeding 0.85. Researchers seeking to predict new or missing edges in biological networks should use the edge prior as a baseline to identify the fraction of performance that is nonspecific because of degree. We released our methods as an open-source Python package (https://github.com/hetio/xswap/).
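The idea of a permutation-derived edge prior can be illustrated with a minimal sketch. The code below is not the xswap package's implementation or API; it is a plain-Python approximation written for this summary, and the names `xswap_permute`, `edge_prior`, `n_perms`, and `swaps_per_edge`, as well as the toy gene-disease edges, are assumptions made for illustration. It repeatedly permutes a bipartite edge list with degree-preserving endpoint swaps and estimates, for every source-target pair, the fraction of permuted networks in which that edge exists.

```python
import random
from collections import Counter


def xswap_permute(edges, n_swaps, rng):
    """Attempt n_swaps degree-preserving endpoint swaps on a bipartite edge list."""
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new_1, new_2 = (a, d), (c, b)
        # Reject swaps that would duplicate an existing edge. This also
        # rejects no-op swaps where the two edges share a source or target.
        if new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {new_1, new_2}
        edges[i], edges[j] = new_1, new_2
    return edges


def edge_prior(edges, n_perms=200, swaps_per_edge=10, seed=0):
    """Estimate the probability of each source-target edge from degree alone."""
    rng = random.Random(seed)
    counts = Counter()
    current = list(edges)
    for _ in range(n_perms):
        # Each permutation starts from the previous one (a Markov chain).
        current = xswap_permute(current, swaps_per_edge * len(current), rng)
        counts.update(current)
    sources = sorted({s for s, _ in edges})
    targets = sorted({t for _, t in edges})
    return {(s, t): counts[(s, t)] / n_perms for s in sources for t in targets}


# Hypothetical toy network: genes (sources) linked to diseases (targets).
edges = [
    ("g1", "d1"), ("g1", "d2"), ("g1", "d3"),
    ("g2", "d1"), ("g3", "d1"), ("g4", "d2"),
]
prior = edge_prior(edges)
for pair in [("g1", "d1"), ("g4", "d3")]:
    print(pair, round(prior[pair], 3))
```

Because every swap preserves node degrees, the estimated priors for a given source sum to that source's degree, and pairs of high-degree nodes tend to receive higher priors than pairs of low-degree nodes. Scoring held-out edges with such a prior gives the degree-only baseline against which a predictor's performance can be compared.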