Entropy (Basel). 2025 Mar 14;27(3):304. doi: 10.3390/e27030304.

Edge-Centric Embeddings of Digraphs: Properties and Stability Under Sparsification


Ahmed Begga et al. Entropy (Basel). 2025.

Abstract

In this paper, we define and characterize the embedding of edges and higher-order entities in directed graphs (digraphs) and relate these embeddings to those of nodes. Our edge-centric approach consists of the following: (a) embedding line digraphs (or their iterated versions); (b) exploiting the rank properties of these embeddings to show that edge/path similarity can be posed as a linear combination of node similarities; (c) solving scalability issues through digraph sparsification; (d) evaluating the performance of these embeddings for classification and clustering. We begin by motivating the need for edge-centric approaches, then introduce the elements of the approach, and finally validate it. Our edge-centric embedding entails a top-down mining of links, instead of inferring them from the similarities of node embeddings. This analysis is key to discovering inter-subgraph links that keep the whole graph connected, i.e., central edges. Using directed graphs (digraphs) allows us to cluster edge-like hubs and authorities. In addition, since directed edges inherit their labels from destination (origin) nodes, their embedding also provides a proxy representation for node classification and clustering. This representation is obtained by embedding the line digraph of the original graph. The line digraph has desirable formal properties with respect to the original graph; in particular, it produces more entropic latent spaces. With these properties at hand, we can relate edge embeddings to node embeddings. The main contribution of this paper is to state and prove the linearity theorem, which expresses each element of the transition matrix for an edge embedding as a linear combination of the elements of the transition matrix for the node embedding. As a result, the rank preservation property explains why embedding the line digraph and using the labels of the destination nodes yields better classification and clustering performance than embedding the nodes of the original graph. In other words, we not only facilitate edge mining but also strengthen node classification and clustering. However, computing the line digraph is challenging, and a sparsification strategy is implemented for the sake of scalability. Our experimental results show that the line digraph representation of the sparsified input graph is quite stable as the sparsification level increases, and that it outperforms the original (node-centric) representation. For the sake of simplicity, our theorem relies on node2vec-like (factorization) embeddings. However, we also include several experiments showing how line digraphs may improve the performance of Graph Neural Networks (GNNs), again following the principle of maximum entropy.
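To make the pipeline concrete, the following is a minimal sketch (not the authors' implementation) of its first step: building the line digraph of a toy digraph with networkx and labeling each edge with the label of its destination node, as described above. The toy graph and the node labels are purely illustrative.

```python
# Minimal sketch: line digraph construction and destination-based edge labels.
# The toy digraph and the class labels below are illustrative, not the paper's data.
import networkx as nx

G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (2, 4)])   # toy digraph
node_labels = {1: "A", 2: "A", 3: "B", 4: "B"}      # hypothetical node classes

# Line digraph: one node per directed edge (u, v); an arc (u, v) -> (v, w)
# whenever the destination of the first edge is the origin of the second.
LG = nx.line_graph(G)

# Each edge (u, v) of G inherits the label of its destination node v,
# so edge embeddings can act as a proxy for node classification.
edge_labels = {(u, v): node_labels[v] for (u, v) in LG.nodes()}

print(sorted(LG.edges()))   # adjacencies between consecutive edges of G
print(edge_labels)
```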

Keywords: digraph sparsification; edge embedding; edge mining; graph neural networks; line digraph; maximum entropy.


Conflict of interest statement

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; or in the decision to publish the results.

Figures

Figure 1
The long-tail effect. Histogram of the occurrences of each node in the random walks for the Cora (top), Citeseer (middle), and Wiki (bottom) datasets, using the original graph (left), the line graph (center), and the iterated line graph (right). The y-axis is in log scale.
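For reference, the occurrence counts behind such histograms can be gathered with a short sketch like the one below (plain uniform random walks with networkx; the walk length, number of walks, and the random test graph are illustrative, not the paper's settings).

```python
# Sketch: count how often each node appears in uniform random walks on a digraph.
# Walk parameters and the random test graph are illustrative placeholders.
import random
from collections import Counter
import networkx as nx

def walk_occurrences(G, num_walks=10, walk_length=20, seed=0):
    rng = random.Random(seed)
    counts = Counter()
    for start in G.nodes():
        for _ in range(num_walks):
            node = start
            counts[node] += 1
            for _ in range(walk_length - 1):
                successors = list(G.successors(node))
                if not successors:      # dead end: stop this walk
                    break
                node = rng.choice(successors)
                counts[node] += 1
    return counts

G = nx.gnp_random_graph(200, 0.05, seed=1, directed=True)
occurrences = walk_occurrences(G)
# Plotting occurrences.values() as a histogram with a log-scaled y-axis
# reveals the long tail shown in Figure 1.
```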
Figure 2
Examples of (a,d) original digraphs, (b,e) their line digraphs, and (c,f) their iterated line digraphs. The color of the nodes represents two different labels. Herein, edges are labeled using the label of the destination node. In other application domains, it may be more convenient to use the label of the source node.
Figure 3
Explanation of the linearity theorem. (Top): Input digraph $G$ and its line digraph $L(G)$. Colors index nodes as well as edge destinations. Example: compute $P^4_{L(G)}(e_{10}, e_9)$, where $e_{10}$ is a hub and $e_9$ is an authority. (Middle): Adjacencies $A_G$ vs. $A_{L(G)}$. Clearly, $\mathrm{rank}(A_{L(G)}) = n$, with $n = 6$ nodes in this case. $E_{out}$ retains the $n$ different rows of $A_{L(G)}$. $E_{in}$ is the indicator matrix where $E_{in}(i,j) = 1$ means that $e_i$ belongs to the equivalence class $j = 1, \ldots, n$. Then, $E_{out}E_{in}$ is a permutation of $A_G$ (see colors to uncover the permutation matrix $Q$ making $A_G = Q^T E_{out} E_{in} Q$). (Bottom): Explicit linear combinations. $P^r_{L(G)}$ as a function of $P^{r-1}_{G}$ (taking $Q$ into account). $P^4_{L(G)}(e_{10}, e_9)$ results from the aggregation of 3 paths in $G$, all of them starting from $e_{10} = V_1$ and ending at any node $e_k \in V$ in 3 steps. Two of the paths reach $V_6 = e_{k=6}$ (and also $V_5 = e_{k=9}$) and another one reaches $V_5 = e_{k=9}$. Since only $e_6$ is linked with $e_9$, we only retain the scores of the paths reaching $V_6 = e_{k=6}$, i.e., $P^4_{L(G)}(e_{10}, e_9) = \frac{1}{3} + \frac{1}{9} = \frac{4}{9}$.
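The factorization sketched in this caption can be checked numerically. The snippet below (numpy and networkx) uses the standard tail/head incidence matrices of a toy digraph, rather than the authors' exact $E_{out}$/$E_{in}$ construction, to verify that the line-digraph adjacency factors through node-edge incidences, which is the structural fact the linearity theorem exploits.

```python
# Sketch: verify A_G = T^T H and A_L(G) = H T^T on a toy digraph, where
# T[e, v] = 1 if v is the origin (tail) of edge e and
# H[e, v] = 1 if v is the destination (head) of edge e.
# This is the standard incidence factorization, used here only as an illustration.
import numpy as np
import networkx as nx

G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (2, 4), (4, 1)])
nodes, edges = list(G.nodes()), list(G.edges())
n, m = len(nodes), len(edges)
idx = {v: k for k, v in enumerate(nodes)}

T = np.zeros((m, n))
H = np.zeros((m, n))
for i, (u, v) in enumerate(edges):
    T[i, idx[u]] = 1.0
    H[i, idx[v]] = 1.0

A_G = T.T @ H          # node adjacency of G
A_LG = H @ T.T         # adjacency of the line digraph L(G)

assert np.array_equal(A_G, nx.to_numpy_array(G, nodelist=nodes))
assert np.array_equal(A_LG, nx.to_numpy_array(nx.line_graph(G), nodelist=edges))
print("incidence factorization verified:", n, "nodes,", m, "edges")
```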
Figure 4
Classification stability. (Top): directed single-label citation networks (Micro-F1 score). (Bottom): undirected multi-label networks (Macro-F1 score). Legend: node2vec node refers to the performance of the node embedding, node2vec edge refers to the performance of the edge embedding, and similarly for NetFM and HOPE.
Figure 5
Clustering stability: Evolution of ARI (top) vs. evolution of modularity (bottom) for different sparsification levels (fraction of preserved edges). Legend: same as in classification results.
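As a reference for how these two quantities can be computed, the sketch below clusters precomputed edge embeddings with KMeans and scores the result with ARI and modularity (scikit-learn and networkx; the embedding matrix, ground-truth labels, and number of clusters are placeholders, and KMeans stands in for whichever clustering algorithm is actually used).

```python
# Sketch: score a clustering of edge embeddings with ARI (against ground-truth
# labels) and with modularity (on the line digraph). Inputs are placeholders.
import networkx as nx
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def clustering_scores(LG, embeddings, true_labels, k):
    """embeddings: array of shape (num_edges, dim), ordered like list(LG.nodes())."""
    pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)
    ari = adjusted_rand_score(true_labels, pred)
    communities = [
        {e for e, c in zip(LG.nodes(), pred) if c == i} for i in range(k)
    ]
    communities = [c for c in communities if c]        # drop empty clusters
    mod = nx.algorithms.community.modularity(LG.to_undirected(), communities)
    return ari, mod
```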
Figure 6
Visualization of learned edge embeddings for the Cora dataset using t-SNE dimensionality reduction. Colors represent different ground-truth classes. Note that even with significant sparsification (80% of edges removed), the embedding space maintains clear community structures and distinct class boundaries, demonstrating the entropy-preserving properties of line digraphs even under aggressive edge reduction.
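A figure of this kind can be reproduced with a short sketch like the following (scikit-learn and matplotlib; edge_embeddings and edge_labels are placeholders for the learned embedding matrix and the integer class labels inherited from destination nodes).

```python
# Sketch: 2-D t-SNE projection of edge embeddings, colored by class.
# `edge_embeddings` and `edge_labels` are placeholders supplied by the caller.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_edge_embeddings(edge_embeddings, edge_labels):
    coords = TSNE(n_components=2, random_state=0).fit_transform(edge_embeddings)
    plt.scatter(coords[:, 0], coords[:, 1], c=edge_labels, s=5, cmap="tab10")
    plt.title("t-SNE of edge embeddings")
    plt.show()
```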
Figure 7
Comparison of node and edge sparsification strategies across GNN architectures on the Cora dataset. Solid lines represent node sparsification, dashed lines represent edge sparsification. The x-axis indicates the fraction of edges preserved (from 0.0 to 1.0), while the y-axis shows classification accuracy. Edge sparsification consistently outperforms node sparsification, with GraphSAGE (dashed green) showing remarkable stability even with minimal edge retention.
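For context, the two regimes compared in Figures 7-9 can be sketched as follows (networkx; uniform random selection is used only as a stand-in for whatever importance-based sparsification criterion is actually applied, and the preserved fractions are illustrative).

```python
# Sketch: edge vs. node sparsification keeping a given fraction of elements.
# Random selection is a placeholder for an importance-based criterion.
import random
import networkx as nx

def sparsify_edges(G, keep_fraction, seed=0):
    rng = random.Random(seed)
    kept = rng.sample(list(G.edges()), int(keep_fraction * G.number_of_edges()))
    H = nx.DiGraph()
    H.add_nodes_from(G.nodes(data=True))   # keep every node (and its attributes)
    H.add_edges_from(kept)
    return H

def sparsify_nodes(G, keep_fraction, seed=0):
    rng = random.Random(seed)
    kept = rng.sample(list(G.nodes()), int(keep_fraction * G.number_of_nodes()))
    return G.subgraph(kept).copy()
```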
Figure 8
Comparison of node and edge sparsification strategies across GNN architectures on the Citeseer dataset. Solid lines represent node sparsification, dashed lines represent edge sparsification. The performance gap between edge and node sparsification is more pronounced in Citeseer than in Cora, with edge-based methods (dashed lines) maintaining consistently higher accuracy across all sparsification levels. This suggests that preserving edges is particularly important for maintaining the structural information in citation networks with sparse connectivity patterns.
Figure 9
Comparison of node and edge sparsification strategies across GNN architectures on the Pubmed dataset. Solid lines represent node sparsification, dashed lines represent edge sparsification. All GNN architectures show remarkable stability under edge sparsification (dashed lines), with GraphSAGE maintaining above 90% accuracy even with only 10% of edges preserved. This demonstrates that strategic edge preservation captures the essential topology of the graph, aligning with our theoretical findings about the maximal entropy properties of line digraphs.

