Generative Hypergraph Clustering: Scalable Heuristics and Sparse Thresholds
Speaker: Phil Chodrow (Middlebury College)
Date: 1/24/23
Abstract: Many data sets concern entities that interact or gather in groups of two or more. Such data sets might describe people congregating for events, tags categorizing a forum post, and raw food ingredients combining in a recipe. Hypergraphs—generalizations of graphs in which edges can join any number of nodes—are a natural modeling framework for such data sets. The hypergraph clustering task asks us to group nodes into a small number of meaningful clusters using the observed edges. In this talk, we’ll discuss a generative approach to hypergraph clustering. This generative approach uses a parametric statistical model called a stochastic blockmodel (SBM). After introducing an SBM for hypergraphs, we’ll consider two main questions. First, we’ll ask: how can the generative approach inform scalable clustering heuristics? We’ll derive a fast greedy heuristic for clustering under certain common SBM assumptions. This heuristic generalizes Louvain modularity clustering to the hypergraph setting. We’ll then demonstrate the performance of this heuristic on empirical and synthetic hypergraphs up to 1 million nodes. Second, we’ll ask: what are the fundamental limits of generative clustering in sparse hypergraphs? We’ll briefly describe the belief-propagation algorithm for clustering in the sparse SBM. We’ll then analyze belief-propagation to construct a closed-form conjecture for the threshold at which generative clustering fails to detect planted clusters. Along the way, we’ll derive an eigenvector-based hypergraph clustering algorithm via a generalization of the graph nonbacktracking matrix. This talk includes joint work with Nate Veldt (Texas A&M), Austin Benson (Cornell, D. E. Shaw), Nicole Eikmeier (Grinnell), and Jamie Haddock (Harvey Mudd). The two papers discussed may be found at the following links: - https://www.science.org/doi/full/10.1126/sciadv.abh1303 - https://arxiv.org/abs/2204.13586.