Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2023 Jan 17;14(1):186.
doi: 10.1038/s41467-022-35181-w.

Finding shortest and nearly shortest path nodes in large substantially incomplete networks by hyperbolic mapping

Affiliations

Finding shortest and nearly shortest path nodes in large substantially incomplete networks by hyperbolic mapping

Maksim Kitsak et al. Nat Commun. .

Abstract

Dynamic processes on networks, be it information transfer in the Internet, contagious spreading in a social network, or neural signaling, take place along shortest or nearly shortest paths. Computing shortest paths is a straightforward task when the network of interest is fully known, and there are a plethora of computational algorithms for this purpose. Unfortunately, our maps of most large networks are substantially incomplete due to either the highly dynamic nature of networks, or high cost of network measurements, or both, rendering traditional path finding methods inefficient. We find that shortest paths in large real networks, such as the network of protein-protein interactions and the Internet at the autonomous system level, are not random but are organized according to latent-geometric rules. If nodes of these networks are mapped to points in latent hyperbolic spaces, shortest paths in them align along geodesic curves connecting endpoint nodes. We find that this alignment is sufficiently strong to allow for the identification of shortest path nodes even in the case of substantially incomplete networks, where numbers of missing links exceed those of observable links. We demonstrate the utility of latent-geometric path finding in problems of cellular pathway reconstruction and communication security.

PubMed Disclaimer

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Latent geometry uncovers shortest and nearly shortest paths in the Internet at the Autonomous System level.
a Hyperbolic map of the Internet at the Autonomous System (AS) level. See Section SV for data collection and hyperbolic mapping details. The latent space is the 2-dimensional hyperbolic disk and each point corresponds to an Autonomous System. Yellow squares and red circles highlight ASes corresponding to communication paths between AS5392-AS8875 and AS1224-AS11650 pairs. Shown are the nodes with path relevance exceeding 0.1, see Methods. Sizes of squares and circles are larger for nodes with higher path relevance. b Schematic distance from point C to geodesic γ(A, B) drawn between points A and B. c The distribution of distances to the γ(AS5392, AS8875) geodesic from (light blue) shortest path nodes, (dark blue) nodes with path relevance larger than 0.05, and (red) all Internet nodes. d Path relevance as a function of distance to the γ(AS5392, AS8875) geodesic for all ASes.
Fig. 2
Fig. 2. The accuracy of nearly shortest path finding in the AS Internet.
a The average inverse network-based distance between the Autonomous System (AS) node pairs as a function of the number of removed nodes. The average inverse distance for each data point is estimated over 1,000 randomly selected node pairs. The average inverse distance serves as the measure of the collateral damage to the network: the smaller is the value, the larger are the network-based distances between nodes. We consider three node removal strategies: (i) nodes with the largest degree, (ii) nodes with the smallest hyperbolic distance to γ(AS5392, AS8875) geodesic, and (iii) nodes with the smallest network-based distances to the AS5392-AS8875 pair. The largest degree strategy is the most invasive, resulting in a nearly two-fold decrease of the average inverse network-based distance upon removing 100 largest degree nodes. This result is consistent with earlier fundings on attack tolerance of complex networks. b The accuracy of distance to geodesic in finding nearly shortest path nodes in the incomplete AS Internet network. The distance to geodesic strategy is juxtaposed against the network-based strategy and random walk-based strategies, see Methods. We evaluate the average precision scores and the average number of node removals needed to disrupt node pairs of interest. Each data bar is the average over 1000 randomly selected node pairs, and error bars display one standard deviation. Note that the relative performance of the distance to geodesic increases as the fraction of missing links increases.
Fig. 3
Fig. 3. The accuracy of nearly shortest path finding in synthetic networks incomplete synthetic networks.
Synthetic networks are constructed as Random Hyperbolic Graphs (RHGs) of N = 5, 000 nodes, average degree 〈k〉 = 10, and variable degree distribution, λ ∈ (2, 3), and geometricity T ∈ (0, 1) parameters. After the construction of each network, we remove each of its links with probability p = 0.5. The heatmaps consist of 9 × 9 = 81 points, each point corresponding to the average of 100 randomly selected node pairs separated by the Δθ=π8 angle in H2. For technical details and other angles, see Section SVIII. Panel a displays precision scores, while panel b displays the number of node removals needed to disconnect all paths between the node pair of interest. Note that the path-finding accuracy is nearly independent of degree distribution. The highest path-finding accuracy is achieved at temperature T = 0, when the geometricity of RHGs is the strongest and decreases as T increases. Marked on panels a, b are inferred parameters of real networks. These networks are the Internet at the autonomous system (AS) level (λ = 2.1, T = 0.7) similarity protein-protein interaction (PPI) network (λ = 2.1, T = 0.4), and the pretty-good-privacy (PGP) web of trust PGP web of trust (λ = 2.1, T = 0.6) that are studied in this work, the network of human metabolic interactions (Human metabolism, λ = 2.55, T = 0.65), Ref. , and the network of protein-protein interactions (original PPI, λ = 2.65, T = 0.7), Ref. .
Fig. 4
Fig. 4. Anomaly detection in interdomain Internet routing.
a Interdomain routing at a glance. Shown is toy network of the Internet at the Autonomous System (AS) level, where nodes are autonomous systems and links are data transfer agreements. Shown on the right-hand side is the distributed calculations of paths to AS8. The generation of paths on the left-hand side is an example of routing instability. AS1 falsely claims the direct connection to AS8. This information is propagated by the border-gateway-protocol (BGP) first to AS4 and then to AS7. As a result, AS4 and AS7 have fake path information to AS8. Routing paths following the hijack of the Google AS prefixes by the MainOne AS, see Section SV. b An example of a routing path traversing Nigerian Internet Service Provider MainOne (AS37282) announced before the prefix hijack. Green circles are ASes constituting the path; the solid red line is the hyperbolic geodesic connecting the origin-destination pair. Distance to geodesic for every node constituting the routing path is shown above the hyperbolic map. The hyperbolic stretch of the BGP path, Eq. (1), is the largest normalized distance to geodesic from routing path nodes, DΩ(A, B) = 0.23. c An example of a routing path announced during the prefix hijack of MainOne AS. MainOne announced a direct connection to Google AS (AS15169), leading to a large-scale cascade of false BGP path announcements. Shown is one of these false BGP paths originating at AS29140, traversing the MainOne AS (AS37282) and ending at the Google AS (AS15169). Distance to geodesic for every node constituting the routing path is shown above the hyperbolic map. The hyperbolic stretch of the routing path is DΩ(A, B)=0.74, indicating that the BGP path is less conformal to the latent-geometric geodesic, compared to the one in panel b. Red squares are ASes constituting the path; the solid red line is the hyperbolic geodesic connecting the origin-destination pair. d PDFs of hyperbolic stretches for BGP paths announced (blue) before and (red) during the MainOne AS prefix hijack. BGP paths announced before the hijack are characterized by smaller hyperbolic stretch values than those announced during the hijack.
Fig. 5
Fig. 5. Latent-geometric localization of the ubiquitin-proteasome pathway (UPP) pathway.
a the hyperbolic map of the similarity-based human protein-protein interaction (PPI) network and the UPP pathway. Pathway proteins are colored according to their functional groups, background nodes are non-pathway proteins comprising the network. E1, E2, and E3 are the three classes of enzymes associated with ubiquitination: E1 and E2 correspond to ubiquitin-activating and ubiquitin-conjugating enzymes, respectively, while E3 correspond to ubiquitin-protein ligases. The solid line displays the hyperbolic geodesic fitting the pathway proteins. Panel b depicts 6 clusters of proteins in the latent-geometric vicinity of the UPP pathway geodesic. Proteins are colored based on the number of clusters they belong to. Black squares depict UPP pathway proteins. See also Fig. S7, which depicts each of the 6 clusters separately.
Fig. 6
Fig. 6. Noise tolerance and comparison to other embedding methods.
We analyze the tolerance of distance to geodesic to node coordinate uncertancies. We consider the hyperbolic map of the incomplete network of the Internet at the Austonomous System (AS) level at the q = 0.5 missing link rate. We plot the precision of distance to geodesic in finding nearly shortest path nodes as a function of synthetic noise magnitude a, which we add to learned node coordinates: θ^i=θi+a2πXi, r^i=ri+aRYi, {Xi, Yi} ← U(0, 1). We compare the precision of distance to geodesic to the network-based method and two popular Euclidean embedding algorithms node2vec and DeepWalk of different dimensionality D. Each simulation is the average over 1000 randomly chosen node pairs. In cases of node2vec and DeepWalk, we rank nodes based on their distance to Euclidean geodesic drawn between the path endpoints. For each Euclidean embedding algorithm, the precision increases as a dimensionality D increases. Note that both node2vec and DeepWalk are less accurate than the network-based metric.

References

    1. Goyal P, Ferrara E. Graph embedding techniques, applications, and performance: a survey. Knowledge-Based Syst. 2018;151:78–94. doi: 10.1016/j.knosys.2018.03.022. - DOI
    1. Cui P, Wang X, Pei J, Zhu W. A survey on network embedding. IEEE Trans. Knowl. Data Eng. 2019;31:833–852. doi: 10.1109/TKDE.2018.2849727. - DOI
    1. Boguñá M, et al. Network geometry. Nat. Rev. Phys. 2021;3:114–135. doi: 10.1038/s42254-020-00264-4. - DOI
    1. Alanis-Lobato G, Mier P, Andrade-Navarro M. The latent geometry of the human protein interaction Network. Bioinformatics. 2018;34:2826–2834. doi: 10.1093/bioinformatics/bty206. - DOI - PMC - PubMed
    1. Boguñá M, Papadopoulos F, Krioukov D. Sustaining the internet with hyperbolic mapping. Nat. Commun. 2010;1:62. doi: 10.1038/ncomms1063. - DOI - PubMed

Publication types