Entropy (Basel). 2025 Mar 14;27(3):304. doi: 10.3390/e27030304.

Edge-Centric Embeddings of Digraphs: Properties and Stability Under Sparsification


Ahmed Begga et al. Entropy (Basel). 2025.

Abstract

In this paper, we define and characterize the embedding of edges and higher-order entities in directed graphs (digraphs) and relate these embeddings to those of nodes. Our edge-centric approach consists of the following: (a) embedding line digraphs (or their iterated versions); (b) exploiting the rank properties of these embeddings to show that edge/path similarity can be posed as a linear combination of node similarities; (c) solving scalability issues through digraph sparsification; (d) evaluating the performance of these embeddings for classification and clustering. We begin by motivating the need for edge-centric approaches, then introduce the elements of the approach, and finally validate it. Our edge-centric embedding entails a top-down mining of links, instead of inferring them from the similarities of node embeddings. This analysis is key to discovering inter-subgraph links that keep the whole graph connected, i.e., central edges. Using directed graphs (digraphs) allows us to cluster edge-like hubs and authorities. In addition, since directed edges inherit their labels from destination (origin) nodes, their embedding also provides a proxy representation for node classification and clustering. This representation is obtained by embedding the line digraph of the original graph. The line digraph has desirable formal properties with respect to the original graph; in particular, it produces more entropic latent spaces. With these properties at hand, we can relate edge embeddings to node embeddings. The main contribution of this paper is to state and prove the linearity theorem, which expresses each element of the transition matrix for an edge embedding as a linear combination of the elements of the transition matrix for the node embedding. As a result, the rank preservation property explains why embedding the line digraph and using the labels of the destination nodes yields better classification and clustering performance than embedding the nodes of the original graph. In other words, we not only facilitate edge mining but also strengthen node classification and clustering. However, computing the line digraph is challenging, and a sparsification strategy is implemented for the sake of scalability. Our experimental results show that the line digraph representation of the sparsified input graph is quite stable as the sparsification level increases, and that it outperforms the original (node-centric) representation. For the sake of simplicity, our theorem relies on node2vec-like (factorization) embeddings. However, we also include several experiments showing how line digraphs may improve the performance of Graph Neural Networks (GNNs), again following the principle of maximum entropy.
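To make the pipeline concrete, the following is a minimal sketch (not the authors' implementation) of its first step: building the line digraph of a toy digraph with networkx and labeling each edge with the label of its destination node, as described above. The toy graph and the node labels are purely illustrative.

```python
# Minimal sketch: line digraph construction and destination-based edge labels.
# The toy digraph and the class labels below are illustrative, not the paper's data.
import networkx as nx

G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (2, 4)])   # toy digraph
node_labels = {1: "A", 2: "A", 3: "B", 4: "B"}      # hypothetical node classes

# Line digraph: one node per directed edge (u, v); an arc (u, v) -> (v, w)
# whenever the destination of the first edge is the origin of the second.
LG = nx.line_graph(G)

# Each edge (u, v) of G inherits the label of its destination node v,
# so edge embeddings can act as a proxy for node classification.
edge_labels = {(u, v): node_labels[v] for (u, v) in LG.nodes()}

print(sorted(LG.edges()))   # adjacencies between consecutive edges of G
print(edge_labels)
```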

Keywords: digraph sparsification; edge embedding; edge mining; graph neural networks; line digraph; maximum entropy.


Conflict of interest statement

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; or in the decision to publish the results.

Figures

Figure 1
The long-tail effect. Histogram of the occurrences of each node in the random walks for the Cora (top), Citeseer (middle), and Wiki (bottom) datasets, using the original graph (left), the line graph (center), and the iterated line graph (right). The y-axis is in log scale.
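For reference, the occurrence counts behind such histograms can be gathered with a short sketch like the one below (plain uniform random walks with networkx; the walk length, number of walks, and the random test graph are illustrative, not the paper's settings).

```python
# Sketch: count how often each node appears in uniform random walks on a digraph.
# Walk parameters and the random test graph are illustrative placeholders.
import random
from collections import Counter
import networkx as nx

def walk_occurrences(G, num_walks=10, walk_length=20, seed=0):
    rng = random.Random(seed)
    counts = Counter()
    for start in G.nodes():
        for _ in range(num_walks):
            node = start
            counts[node] += 1
            for _ in range(walk_length - 1):
                successors = list(G.successors(node))
                if not successors:      # dead end: stop this walk
                    break
                node = rng.choice(successors)
                counts[node] += 1
    return counts

G = nx.gnp_random_graph(200, 0.05, seed=1, directed=True)
occurrences = walk_occurrences(G)
# Plotting occurrences.values() as a histogram with a log-scaled y-axis
# reveals the long tail shown in Figure 1.
```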
Figure 2
Examples of (a,d) original digraphs, (b,e) their line digraphs, and (c,f) their iterated line digraphs. The color of the nodes represents two different labels. Herein, edges are labeled using the label of the destination node. In other application domains, it may be more convenient to use the label of the source node.
Figure 3
Explanation of the linearity theorem. (Top): Input digraph $G$ and its line digraph $L(G)$. Colors index nodes as well as edge destinations. Example: compute $P^4_{L(G)}(e_{10}, e_9)$, where $e_{10}$ is a hub and $e_9$ is an authority. (Middle): Adjacencies $A_G$ vs. $A_{L(G)}$. Clearly, $\mathrm{rank}(A_{L(G)}) = n$, with $n = 6$ nodes in this case. $E_{out}$ retains the $n$ different rows of $A_{L(G)}$. $E_{in}$ is the indicator matrix where $E_{in}(i,j) = 1$ means that $e_i$ belongs to the equivalence class $j = 1, \ldots, n$. Then, $E_{out}E_{in}$ is a permutation of $A_G$ (see colors to uncover the permutation matrix $Q$ making $A_G = Q^T E_{out} E_{in} Q$). (Bottom): Explicit linear combinations. $P^r_{L(G)}$ as a function of $P^{r-1}_{G}$ (taking $Q$ into account). $P^4_{L(G)}(e_{10}, e_9)$ results from the aggregation of 3 paths in $G$, all of them starting from $e_{10} = V_1$ and ending at any node $e_k \in V$ in 3 steps. Two of the paths reach $V_6 = e_{k=6}$ (and also $V_5 = e_{k=9}$) and another one reaches $V_5 = e_{k=9}$. Since only $e_6$ is linked with $e_9$, we only retain the scores of the paths reaching $V_6 = e_{k=6}$, i.e., $P^4_{L(G)}(e_{10}, e_9) = \frac{1}{3} + \frac{1}{9} = \frac{4}{9}$.
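The factorization sketched in this caption can be checked numerically. The snippet below (numpy and networkx) uses the standard tail/head incidence matrices of a toy digraph, rather than the authors' exact $E_{out}$/$E_{in}$ construction, to verify that the line-digraph adjacency factors through node-edge incidences, which is the structural fact the linearity theorem exploits.

```python
# Sketch: verify A_G = T^T H and A_L(G) = H T^T on a toy digraph, where
# T[e, v] = 1 if v is the origin (tail) of edge e and
# H[e, v] = 1 if v is the destination (head) of edge e.
# This is the standard incidence factorization, used here only as an illustration.
import numpy as np
import networkx as nx

G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (2, 4), (4, 1)])
nodes, edges = list(G.nodes()), list(G.edges())
n, m = len(nodes), len(edges)
idx = {v: k for k, v in enumerate(nodes)}

T = np.zeros((m, n))
H = np.zeros((m, n))
for i, (u, v) in enumerate(edges):
    T[i, idx[u]] = 1.0
    H[i, idx[v]] = 1.0

A_G = T.T @ H          # node adjacency of G
A_LG = H @ T.T         # adjacency of the line digraph L(G)

assert np.array_equal(A_G, nx.to_numpy_array(G, nodelist=nodes))
assert np.array_equal(A_LG, nx.to_numpy_array(nx.line_graph(G), nodelist=edges))
print("incidence factorization verified:", n, "nodes,", m, "edges")
```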
Figure 4
Classification stability. (Top): directed single-label citation networks (Micro-F1 score). (Bottom): undirected multi-label networks (Macro-F1 score). Legend: node2vec node refers to the performance of the node embedding, node2vec edge refers to the performance of the edge embedding, and similarly for NetFM and HOPE.
Figure 5
Clustering stability: Evolution of ARI (top) vs. evolution of modularity (bottom) for different sparsification levels (fraction of preserved edges). Legend: same as in classification results.
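As a reference for how these two quantities can be computed, the sketch below clusters precomputed edge embeddings with KMeans and scores the result with ARI and modularity (scikit-learn and networkx; the embedding matrix, ground-truth labels, and number of clusters are placeholders, and KMeans stands in for whichever clustering algorithm is actually used).

```python
# Sketch: score a clustering of edge embeddings with ARI (against ground-truth
# labels) and with modularity (on the line digraph). Inputs are placeholders.
import networkx as nx
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def clustering_scores(LG, embeddings, true_labels, k):
    """embeddings: array of shape (num_edges, dim), ordered like list(LG.nodes())."""
    pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)
    ari = adjusted_rand_score(true_labels, pred)
    communities = [
        {e for e, c in zip(LG.nodes(), pred) if c == i} for i in range(k)
    ]
    communities = [c for c in communities if c]        # drop empty clusters
    mod = nx.algorithms.community.modularity(LG.to_undirected(), communities)
    return ari, mod
```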
Figure 6
Visualization of learned edge embeddings for the Cora dataset using t-SNE dimensionality reduction. Colors represent different ground-truth classes. Note that even with significant sparsification (80% of edges removed), the embedding space maintains clear community structures and distinct class boundaries, demonstrating the entropy-preserving properties of line digraphs even under aggressive edge reduction.
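A figure of this kind can be reproduced with a short sketch like the following (scikit-learn and matplotlib; edge_embeddings and edge_labels are placeholders for the learned embedding matrix and the integer class labels inherited from destination nodes).

```python
# Sketch: 2-D t-SNE projection of edge embeddings, colored by class.
# `edge_embeddings` and `edge_labels` are placeholders supplied by the caller.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_edge_embeddings(edge_embeddings, edge_labels):
    coords = TSNE(n_components=2, random_state=0).fit_transform(edge_embeddings)
    plt.scatter(coords[:, 0], coords[:, 1], c=edge_labels, s=5, cmap="tab10")
    plt.title("t-SNE of edge embeddings")
    plt.show()
```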
Figure 7
Comparison of node and edge sparsification strategies across GNN architectures on the Cora dataset. Solid lines represent node sparsification, dashed lines represent edge sparsification. The x-axis indicates the fraction of edges preserved (from 0.0 to 1.0), while the y-axis shows classification accuracy. Edge sparsification consistently outperforms node sparsification, with GraphSAGE (dashed green) showing remarkable stability even with minimal edge retention.
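For context, the two regimes compared in Figures 7-9 can be sketched as follows (networkx; uniform random selection is used only as a stand-in for whatever importance-based sparsification criterion is actually applied, and the preserved fractions are illustrative).

```python
# Sketch: edge vs. node sparsification keeping a given fraction of elements.
# Random selection is a placeholder for an importance-based criterion.
import random
import networkx as nx

def sparsify_edges(G, keep_fraction, seed=0):
    rng = random.Random(seed)
    kept = rng.sample(list(G.edges()), int(keep_fraction * G.number_of_edges()))
    H = nx.DiGraph()
    H.add_nodes_from(G.nodes(data=True))   # keep every node (and its attributes)
    H.add_edges_from(kept)
    return H

def sparsify_nodes(G, keep_fraction, seed=0):
    rng = random.Random(seed)
    kept = rng.sample(list(G.nodes()), int(keep_fraction * G.number_of_nodes()))
    return G.subgraph(kept).copy()
```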
Figure 8
Comparison of node and edge sparsification strategies across GNN architectures on the Citeseer dataset. Solid lines represent node sparsification, dashed lines represent edge sparsification. The performance gap between edge and node sparsification is more pronounced in Citeseer than in Cora, with edge-based methods (dashed lines) maintaining consistently higher accuracy across all sparsification levels. This suggests that preserving edges is particularly important for maintaining the structural information in citation networks with sparse connectivity patterns.
Figure 9
Comparison of node and edge sparsification strategies across GNN architectures on the Pubmed dataset. Solid lines represent node sparsification, dashed lines represent edge sparsification. All GNN architectures show remarkable stability under edge sparsification (dashed lines), with GraphSAGE maintaining above 90% accuracy even with only 10% of edges preserved. This demonstrates that strategic edge preservation captures the essential topology of the graph, aligning with our theoretical findings about the maximal entropy properties of line digraphs.

