Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2021 Jul 7;7(28):eabh1303.
doi: 10.1126/sciadv.abh1303. Print 2021 Jul.

Generative hypergraph clustering: From blockmodels to modularity

Affiliations

Generative hypergraph clustering: From blockmodels to modularity

Philip S Chodrow et al. Sci Adv. .

Abstract

Hypergraphs are a natural modeling paradigm for networked systems with multiway interactions. A standard task in network analysis is the identification of closely related or densely interconnected nodes. We propose a probabilistic generative model of clustered hypergraphs with heterogeneous node degrees and edge sizes. Approximate maximum likelihood inference in this model leads to a clustering objective that generalizes the popular modularity objective for graphs. From this, we derive an inference algorithm that generalizes the Louvain graph community detection method, and a faster, specialized variant in which edges are expected to lie fully within clusters. Using synthetic and empirical data, we demonstrate that the specialized method is highly scalable and can detect clusters where graph-based methods fail. We also use our model to find interpretable higher-order structure in school contact networks, U.S. congressional bill cosponsorship and committees, product categories in copurchasing behavior, and hotel locations from web browsing sessions.

PubMed Disclaimer

Figures

Fig. 1
Fig. 1. Runtime, ARI, and number of clusters returned by GMLL and HMLL in a synthetic testbed with optimal affinity parameters.
The within-cluster edge placement probabilities are p2 = 3/5, p3 = 1/n3, and p4 = 1/n4. We also show in light gray the results obtained by using GMLL as a preprocessing step, whose output partition is then refined by HMLL (light gray).
Fig. 2
Fig. 2. Detectability experiments in synthetic hypergraphs.
For i = 2,3, pi is the proportion of within-cluster edges of size i. Each pixel gives the mean ARI over 20 independently generated DCHSBMs of size n = 500 where each node is incident to, on average, five 2-edges and five 3-edges. (Left) The recovered partition z^ is obtained from GMLL. (Right) The recovered partition is obtained from AON HMLL (algorithm S1). The dashed and dotted lines give various detectability thresholds as described in the main text. In each panel, the returned partition z^ is the highest-likelihood partition from 20 alternations between updating z^ and inference of the affinity parameters. In this experiment only, we incorporate a regularization term nlog¯ in the modularity objective to promote label vectors z with fewer clusters.
Fig. 3
Fig. 3. Comparison of clustering algorithms in contact-primary-school and contact-high-school.
For each dataset, we show a partition obtained from the classical graph Louvain modularity maximization heuristic, a partition obtained from GMLL, and partition obtained by AON HMLL. The partition shown is the one that attains the corresponding objective function after 20 rounds of iterative likelihood maximization. Each box records the number of agents with the specified combination of inferred cluster and ground truth label. The bottom row visualizes the number mk of edges of size k, the inferred size weights βk, and inferred resolution parameters γk as defined in (15). On the far right, γk0 = mk/vol(H)k.
Fig. 4
Fig. 4. Comparison of hypergraph AON MLL (algorithm S1) against dyadic likelihood Louvain in data with known clusters.
Points give the ARI of the highest-likelihood partition obtained after 20 alternations between partitioning and parameter estimation. The maximum edge size k¯ varies along the horizontal axis. In the panel titles, n is the number of nodes and m the number of edges when k¯=25. Note that the vertical axis limits vary between panels.

References

    1. M. O. Jackson, Social and Economic Networks (Princeton Univ. Press, 2008).
    1. D. Easley, J. Kleinberg, Networks, Crowds, and Markets: Reasoning about a Highly Connected World (Cambridge Univ. Press, 2010).
    1. M. E. J. Newman, Networks: An Introduction (Oxford Univ. Press, 2010).
    1. Milo R., Shen-Orr S., Itzkovitz S., Kashtan N., Chklovskii D., Alon U., Network motifs: Simple building blocks of complex networks. Science 298, 824–827 (2002). - PubMed
    1. Benson A. R., Gleich D. F., Leskovec J., Higher-order organization of complex networks. Science 353, 163–166 (2016). - PMC - PubMed

LinkOut - more resources