PLoS One. 2023 Aug 24;18(8):e0290090.

doi: 10.1371/journal.pone.0290090. eCollection 2023.

Starling: Introducing a mesoscopic scale with Confluence for Graph Clustering

Bruno Gaume¹

PMID: 37619240
PMCID: PMC10449208
DOI: 10.1371/journal.pone.0290090

Bruno Gaume. PLoS One. 2023.

Affiliation

¹ Centre National de la Recherche Scientifique, CLLE, ISCPIF, Toulouse, France.

Abstract

Given a Graph G = (V, E) and two vertices i, j ∈ V, we introduce Confluence(G, i, j), a mesoscopic closeness measure between vertices based on short Random walks. It brings together vertices from the same overconnected region of G and separates vertices from distinct overconnected regions. Confluence is then used to define a new Clustering quality function QConf(G, Γ) for a given Clustering Γ, and a new heuristic, Starling, which searches for a partitional Clustering of G that optimizes QConf. We compare the accuracy of Starling with that of three state-of-the-art Graph Clustering methods: Spectral-Clustering, Louvain, and Infomap. The comparisons use, on the one hand, artificial Graphs, namely (a) Random Graphs and (b) a classical Graph Clustering Benchmark, and, on the other hand, (c) Terrain-Graphs gathered from real data. On (a), (b) and (c), Starling always obtains accuracies equivalent to or better than those of the three other methods. On the Benchmark (b), Starling also obtains accuracies equivalent to, and sometimes better than, those of an Oracle that knows only the expected overconnected regions of the Benchmark and ignores the edges actually constructed.
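The abstract describes Confluence only as a closeness measure built on short Random walks; this record does not give its formula. As a rough illustration of the idea, the sketch below computes the probability that a t-step random walk started at i ends at j, and compares it with a degree-based baseline. The function names `walk_probabilities` and `closeness_sketch`, the baseline, and the normalisation to [-1, 1] are assumptions made for this example and should not be read as the paper's definition of Confluence.

```python
from fractions import Fraction


def walk_probabilities(adj, start, t):
    """Exact distribution of a t-step simple random walk started at `start`.

    `adj` maps each vertex to the list of its neighbours (undirected graph,
    no isolated vertices).
    """
    dist = {start: Fraction(1)}
    for _ in range(t):
        nxt = {}
        for v, p in dist.items():
            share = p / len(adj[v])  # the walker picks a neighbour uniformly
            for w in adj[v]:
                nxt[w] = nxt.get(w, Fraction(0)) + share
        dist = nxt
    return dist


def closeness_sketch(adj, i, j, t=3):
    """Hypothetical score in [-1, 1] (an assumption, not the paper's formula).

    Positive when a short walk from i reaches j more often than a
    degree-proportional baseline would predict, negative otherwise.
    """
    if i == j:
        return Fraction(0)
    p_walk = walk_probabilities(adj, i, t).get(j, Fraction(0))
    total_degree = sum(len(nb) for nb in adj.values())
    p_base = Fraction(len(adj[j]), total_degree)
    return (p_walk - p_base) / (p_walk + p_base)


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}

same_region = closeness_sketch(adj, 0, 1)   # both in the first triangle
other_region = closeness_sketch(adj, 0, 5)  # opposite triangles
print(same_region > 0, other_region < 0)
```

On this toy graph the score is positive for two vertices of the same triangle and negative for vertices of opposite triangles, which is the qualitative behaviour the abstract attributes to Confluence.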

Copyright: © 2023 Bruno Gaume. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Optimal Clusterings for QPedge and QConf on Gtoy1.**
Two vertices with the same color are in the same Module, with 〈**P, R, F**〉 where $P = \mathrm{Precision}(\mathrm{Pairs}(\Gamma), E)$, $R = \mathrm{Recall}(\mathrm{Pairs}(\Gamma), E)$, $F = \mathrm{Fscore}(\mathrm{Pairs}(\Gamma), E)$.

**Fig 2. Homogeneity and Completeness: {x, y} ∈ E iff x and y have the same color.**

**Fig 3. Binary classifiers of node pairs by node blocks.**

**Fig 4. Performance with Benchmark_ER.**
Each point (x, y) is the average over 100 Graphs with p = x.

**Fig 5. Performance with Benchmark_LFR (k = 15).**
Each point (x, y) is the average over 100 Graphs with μ = x. Fig 5(a)–5(c) are zooms on the *Fscore*s when the overconnected regions are less clear (i.e. when we can no longer trust $\mathrm{Oracle}_{LFR}(G_{LFR}) = \Gamma_{G_{LFR}}$).

**Fig 6. Performance with Benchmark_LFR (k = 25).**
Each point (x, y) is the average over 100 Graphs with μ = x. Fig 6(a)–6(c) are zooms on the *Fscore*s when the overconnected regions are less clear (i.e. when we can no longer trust $\mathrm{Oracle}_{LFR}(G_{LFR}) = \Gamma_{G_{LFR}}$).

**Fig 7. Performance of SGC($G_{Email} = (V_{G_{Email}}, E_{G_{Email}})$, κ), κ varying.**
According to the intrinsic truth $E_{G_{Email}}$ in Fig 7(a), and according to the extrinsic truth $\mathrm{Pairs}(\Gamma_{Dep}) = \bigcup_{\gamma \in \Gamma_{Dep}} P_{2}^{\gamma}$ in Fig 7(b).

**Fig 8. Performance of SGC(G = (V,E), κ) according to the intrinsic truth E, κ varying.**

**Fig 9. Performance with Benchmark_ER.**
Each point (x, y) is the average over 100 Graphs with p = x.

**Fig 10. Performance with Benchmark_LFR (k = 15).**
Each point (x, y) is the average over 100 Graphs with μ = x. Fig 10(a)–10(c) are zooms on the *Fscore*s when the overconnected regions are less clear (i.e. when we can no longer trust $\mathrm{Oracle}_{LFR}(G_{LFR}) = \Gamma_{G_{LFR}}$).

**Fig 11. Performance with Benchmark_LFR (k = 25).**
Each point (x, y) is the average over 100 Graphs with μ = x. Fig 11(a)–11(c) are zooms on the *Fscore*s when the overconnected regions are less clear (i.e. when we can no longer trust $\mathrm{Oracle}_{LFR}(G_{LFR}) = \Gamma_{G_{LFR}}$).

**Fig 12. Optimal Clusterings for QConf0.0 with t = 3 and with t = 6: Shapes describe an optimal Clustering for QConf0.0 with t = 3, colors describe an optimal Clustering for QConf0.0 with t = 6.**
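
The captions of Fig 1 and Fig 7 score a Clustering Γ by treating Pairs(Γ), the set of vertex pairs that share a Module, as a prediction of a reference set of pairs such as the edge set E. A minimal sketch of that pairwise scoring follows. The function names are hypothetical, and the Fscore is assumed here to be the usual harmonic mean of Precision and Recall, which this record does not state explicitly.

```python
from itertools import combinations


def pairs(clustering):
    """Pairs(Γ): every unordered pair of vertices lying in a common module."""
    return {frozenset(p)
            for module in clustering
            for p in combinations(module, 2)}


def precision_recall_fscore(clustering, edges):
    """Score Pairs(Γ) as a predictor of the reference pair set `edges`."""
    predicted = pairs(clustering)
    truth = {frozenset(e) for e in edges}
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    # Assumption: Fscore is the harmonic mean of precision and recall.
    fscore = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, fscore


# Toy graph: two triangles joined by the bridge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
clustering = [{0, 1, 2}, {3, 4, 5}]

p, r, f = precision_recall_fscore(clustering, edges)
print(p, r, f)
```

With one Module per triangle, every predicted pair is an edge, so Precision is 1, while the bridge (2, 3) is missed, so Recall is 6/7.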
