Generative hypergraph clustering: From blockmodels to modularity

Philip S Chodrow¹, Nate Veldt², Austin R Benson³

Affiliations

¹ Department of Mathematics, University of California, Los Angeles, 520 Portola Plaza, Los Angeles, CA 90095, USA. phil@math.ucla.edu.
² Center for Applied Mathematics, Cornell University, 657 Frank H.T. Rhodes Hall, Ithaca, NY 14853, USA.
³ Department of Computer Science, Cornell University, 413B Gates Hall, Ithaca, NY 14853, USA.

PMID: 34233880
PMCID: PMC11559555
DOI: 10.1126/sciadv.abh1303

Generative hypergraph clustering: From blockmodels to modularity

Philip S Chodrow et al. Sci Adv. 2021.

. 2021 Jul 7;7(28):eabh1303.

doi: 10.1126/sciadv.abh1303. Print 2021 Jul.

Authors

Philip S Chodrow¹, Nate Veldt², Austin R Benson³

Affiliations

¹ Department of Mathematics, University of California, Los Angeles, 520 Portola Plaza, Los Angeles, CA 90095, USA. phil@math.ucla.edu.
² Center for Applied Mathematics, Cornell University, 657 Frank H.T. Rhodes Hall, Ithaca, NY 14853, USA.
³ Department of Computer Science, Cornell University, 413B Gates Hall, Ithaca, NY 14853, USA.

PMID: 34233880
PMCID: PMC11559555
DOI: 10.1126/sciadv.abh1303

Abstract

Hypergraphs are a natural modeling paradigm for networked systems with multiway interactions. A standard task in network analysis is the identification of closely related or densely interconnected nodes. We propose a probabilistic generative model of clustered hypergraphs with heterogeneous node degrees and edge sizes. Approximate maximum likelihood inference in this model leads to a clustering objective that generalizes the popular modularity objective for graphs. From this, we derive an inference algorithm that generalizes the Louvain graph community detection method, and a faster, specialized variant in which edges are expected to lie fully within clusters. Using synthetic and empirical data, we demonstrate that the specialized method is highly scalable and can detect clusters where graph-based methods fail. We also use our model to find interpretable higher-order structure in school contact networks, U.S. congressional bill cosponsorship and committees, product categories in copurchasing behavior, and hotel locations from web browsing sessions.

Copyright © 2021 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License 4.0 (CC BY).

PubMed Disclaimer

Figures

**Fig. 1. Runtime, ARI, and number of clusters returned by GMLL and HMLL in a synthetic testbed with optimal affinity parameters.**
The within-cluster edge placement probabilities are p₂ = 3/5, p₃ = 1/n³, and p₄ = 1/n⁴. We also show in light gray the results obtained by using GMLL as a preprocessing step, whose output partition is then refined by HMLL (light gray).

**Fig. 2. Detectability experiments in synthetic hypergraphs.**
For i = 2,3, *p_i* is the proportion of within-cluster edges of size i. Each pixel gives the mean ARI over 20 independently generated DCHSBMs of size n = 500 where each node is incident to, on average, five 2-edges and five 3-edges. (**Left**) The recovered partition $\hat{z}$ is obtained from GMLL. (**Right**) The recovered partition is obtained from AON HMLL (algorithm S1). The dashed and dotted lines give various detectability thresholds as described in the main text. In each panel, the returned partition $\hat{z}$ is the highest-likelihood partition from 20 alternations between updating $\hat{z}$ and inference of the affinity parameters. In this experiment only, we incorporate a regularization term $- n log \bar{ℓ}$ in the modularity objective to promote label vectors z with fewer clusters.

**Fig. 3. Comparison of clustering algorithms in contact-primary-school and contact-high-school.**
For each dataset, we show a partition obtained from the classical graph Louvain modularity maximization heuristic, a partition obtained from GMLL, and partition obtained by AON HMLL. The partition shown is the one that attains the corresponding objective function after 20 rounds of iterative likelihood maximization. Each box records the number of agents with the specified combination of inferred cluster and ground truth label. The bottom row visualizes the number *m_k* of edges of size k, the inferred size weights β_k, and inferred resolution parameters γ_k as defined in (15). On the far right, γ_k0 = *m_k*/**vol**(H)^k.

**Fig. 4. Comparison of hypergraph AON MLL (algorithm S1) against dyadic likelihood Louvain in data with known clusters.**
Points give the ARI of the highest-likelihood partition obtained after 20 alternations between partitioning and parameter estimation. The maximum edge size $\bar{k}$ varies along the horizontal axis. In the panel titles, n is the number of nodes and m the number of edges when $\bar{k} = 25$ . Note that the vertical axis limits vary between panels.

See this image and copyright information in PMC

References

1. M. O. Jackson, Social and Economic Networks (Princeton Univ. Press, 2008).
1. D. Easley, J. Kleinberg, Networks, Crowds, and Markets: Reasoning about a Highly Connected World (Cambridge Univ. Press, 2010).
1. M. E. J. Newman, Networks: An Introduction (Oxford Univ. Press, 2010).
1. Milo R., Shen-Orr S., Itzkovitz S., Kashtan N., Chklovskii D., Alon U., Network motifs: Simple building blocks of complex networks. Science 298, 824–827 (2002). - PubMed
1. Benson A. R., Gleich D. F., Leskovec J., Higher-order organization of complex networks. Science 353, 163–166 (2016). - PMC - PubMed

LinkOut - more resources

Full Text Sources

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Generative hypergraph clustering: From blockmodels to modularity

Affiliations

Generative hypergraph clustering: From blockmodels to modularity

Authors

Affiliations

Abstract

Figures

References

LinkOut - more resources

Full Text Sources