Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2023 Aug 24;18(8):e0290090.
doi: 10.1371/journal.pone.0290090. eCollection 2023.

Starling: Introducing a mesoscopic scale with Confluence for Graph Clustering

Affiliations

Starling: Introducing a mesoscopic scale with Confluence for Graph Clustering

Bruno Gaume. PLoS One. .

Abstract

Given a Graph G = (V, E) and two vertices i, j ∈ V, we introduce Confluence(G, i, j), a vertex mesoscopic closeness measure based on short Random walks, which brings together vertices from a same overconnected region of the Graph G, and separates vertices coming from two distinct overconnected regions. Confluence becomes a useful tool for defining a new Clustering quality function QConf(G, Γ) for a given Clustering Γ and for defining a new heuristic Starling to find a partitional Clustering of a Graph G intended to optimize the Clustering quality function QConf. We compare the accuracies of Starling, to the accuracies of three state of the art Graphs Clustering methods: Spectral-Clustering, Louvain, and Infomap. These comparisons are done, on the one hand with artificial Graphs (a) Random Graphs and (b) a classical Graphs Clustering Benchmark, and on the other hand with (c) Terrain-Graphs gathered from real data. We show that with (a), (b) and (c), Starling is always able to obtain equivalent or better accuracies than the three others methods. We show also that with the Benchmark (b), Starling is able to obtain equivalent accuracies and even sometimes better than an Oracle that would only know the expected overconnected regions from the Benchmark, ignoring the concretely constructed edges.

PubMed Disclaimer

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Optimal Clusterings for QPedge and QConf on Gtoy1.
If two vertices have same color, then they are in a same Module, with 〈P, R, F〉 where P=Precision(Pairs(Γ),E), R=Recall(Pairs(Γ),E), F=Fscore(Pairs(Γ),E).
Fig 2
Fig 2. Homogeneity and Completeness: {x, y} ∈ E iff x and y have same color.
Fig 3
Fig 3. Binary classifiers of nodes pairs by nodes blocks.
Fig 4
Fig 4. Performance with BenchmarkER.
Each point (x, y) is the average over 100 Graphs with p = x.
Fig 5
Fig 5. Performance with BenchmarkLFR (k = 15).
Each point (x, y) is the average over 100 Graphs with μ = x. Fig 5(a)–5(c) are zooms on the Fscores when the overconnected regions are less clear (i.e. when we can no longer trust OracleLFR(GLFR)=ΓGLFR).
Fig 6
Fig 6. Performance with BenchmarkLFR (k = 25).
Each point (x, y) is the average over 100 Graphs with μ = x. Fig 6(a)–6(c) are zooms on the Fscores when the overconnected regions are less clear (i.e. when we can no longer trust OracleLFR(GLFR)=ΓGLFR).
Fig 7
Fig 7. Performance of SGC(GEmail=(VGEmail,EGEmail), κ), κ varying.
According to the intrinsic truth EGEmail in Fig 7(a), and in Fig 7(b) according to the extrinsic truth Pairs(ΓDep)=γΓDepP2γ.
Fig 8
Fig 8. Performance of SGC(G = (V,E), κ) according to the intrinsic truth E, κ varying.
Fig 9
Fig 9. Performance with BenchmarkER.
Each point (x, y) is the average over 100 Graphs with p = x.
Fig 10
Fig 10. Performance with BenchmarkLFR (k = 15).
Each point (x, y) is the average over 100 Graphs with μ = x. Fig 10(a)–10(c) are zooms on the Fscores when the overconnected regions are less clear (i.e. when we can no longer trust OracleLFR(GLFR)=ΓGLFR).
Fig 11
Fig 11. Performance with BenchmarkLFR (k = 25).
Each point (x, y) is the average over 100 Graphs with μ = x. Fig 11(a)–11(c) are zooms on the Fscores when the overconnected regions are less clear (i.e. when we can no longer trust OracleLFR(GLFR)=ΓGLFR).
Fig 12
Fig 12. Optimal Clusterings for QConf0.0 with t = 3 and with t = 6: Shapes describe an optimal Clustering for QConf0.0 with t = 3, colors describe an optimal Clustering for QConf0.0 with t = 6.

References

    1. Watts DJ, Strogatz SH. Collective Dynamics of Small-World Networks. Nature. 1998;393:440–442. doi: 10.1038/30918 - DOI - PubMed
    1. Albert R, Barabasi AL. Statistical Mechanics of Complex Networks. Reviews of Modern Physics. 2002;74:74–47. doi: 10.1103/RevModPhys.74.47 - DOI
    1. Newman MEJ. The Structure and Function of Complex Networks. SIAM Review. 2003;45:167–256. doi: 10.1137/S003614450342480 - DOI
    1. Aittokallio T, Schwikowski B. Graph-based methods for analysing networks in cell biology. Briefings in bioinformatics. 2006;7(3):243–255. doi: 10.1093/bib/bbl022 - DOI - PubMed
    1. Bonacich P, Lu P. Introduction to mathematical sociology. Princeton University Press; 2012.

Publication types