J Mach Learn Res. 2024;25:323.

Graphical Dirichlet Process for Clustering Non-Exchangeable Grouped Data

Arhit Chakrabarti et al. J Mach Learn Res. 2024.

Abstract

We consider the problem of clustering grouped data with possibly non-exchangeable groups whose dependencies can be characterized by a known directed acyclic graph. To allow the sharing of clusters among the non-exchangeable groups, we propose a Bayesian nonparametric approach, termed graphical Dirichlet process, that jointly models the dependent group-specific random measures by assuming each random measure to be distributed as a Dirichlet process whose concentration parameter and base probability measure depend on those of its parent groups. The resulting joint stochastic process respects the Markov property of the directed acyclic graph that links the groups. We characterize the graphical Dirichlet process using a novel hypergraph representation as well as the stick-breaking representation, the restaurant-type representation, and the representation as a limit of a finite mixture model. We develop an efficient posterior inference algorithm and illustrate our model with simulations and a real grouped single-cell data set.

Keywords: Bayesian nonparametrics; clustering; directed acyclic graph; family-owned restaurant process; non-exchangeable groups.
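The abstract describes each group-specific random measure as a Dirichlet process whose base measure depends on its parent groups, and mentions a stick-breaking representation. As a rough illustration only, the sketch below uses the standard truncated stick-breaking construction for the simplest single-parent case, where a child measure is drawn from a Dirichlet process centered on a discrete parent measure. The function names, the standard normal base measure, the truncation level, and the concentration values are all assumptions made for this example; the abstract does not specify how the concentration parameter and base measure are combined across multiple parents, so that step is not attempted here.

```python
import random


def stick_breaking_weights(alpha, num_atoms, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(num_atoms):
        beta = rng.betavariate(1.0, alpha)  # fraction broken off the remainder
        weights.append(beta * remaining)
        remaining *= 1.0 - beta
    return weights


def sample_root(gamma, num_atoms, rng):
    """Truncated draw G0 ~ DP(gamma, H); H = N(0, 1) is an illustrative choice."""
    atoms = [rng.gauss(0.0, 1.0) for _ in range(num_atoms)]
    return atoms, stick_breaking_weights(gamma, num_atoms, rng)


def sample_child(alpha, parent_atoms, parent_weights, num_atoms, rng):
    """Truncated draw G ~ DP(alpha, G_parent) for a discrete parent measure."""
    weights = stick_breaking_weights(alpha, num_atoms, rng)
    # Each child stick takes its atom from the parent, so atoms are shared.
    picks = rng.choices(range(len(parent_atoms)), weights=parent_weights, k=num_atoms)
    child = {}
    for idx, w in zip(picks, weights):
        atom = parent_atoms[idx]
        child[atom] = child.get(atom, 0.0) + w  # merge repeated atoms
    return child


rng = random.Random(0)
root_atoms, root_weights = sample_root(gamma=2.0, num_atoms=50, rng=rng)
group_a = sample_child(1.0, root_atoms, root_weights, 50, rng)
group_b = sample_child(1.0, root_atoms, root_weights, 50, rng)

# Atoms (cluster parameters) that both groups place mass on.
shared = set(group_a) & set(group_b)
print(len(shared))
```

Because both children draw their atoms from the same discrete parent, they can place mass on common atoms, which is the mechanism that lets clusters be shared across groups. Truncation means the weights sum to slightly less than one.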

Figures

Figure 1: DAG for AR2.
Figure 2: Family tree with three generations. Males are depicted in blue and females in red.
Figure 3: Sharing of features by GDP.
Figure 4: Schematic illustration of HDP and GDP. HDP is a special case of GDP when the DAG is a fork.
Figure 5: DAG augmented with a hidden root $G_1^{(0)}$, indicated by the dashed arrows. The original root nodes are $G_1^{(1)}, \ldots, G_{l_1}^{(1)}$.
Figure 6: Illustration of hypernodes (represented by dashed ovals) of the DAG for our motivational problem. (a) Hypernode $H_2$ consists of the generation-1 ancestors (i.e., $G_2$ and $G_3$) of node $G_6$. (b) Hypernode $H_3$ consists of the generation-1 ancestors (i.e., $G_3$ and $G_4$) of node $G_7$. (c) Hypernode $H_4$ consists of the generation-1 ancestors (i.e., $G_5$, $G_6$, and $G_7$) of node $G_8$. Hypernode $H^*$ consists of the generation-2 ancestors (i.e., $G_2$, $G_3$, and $G_4$) of node $G_8$.
Figure 7: The DAG of experimental groups.
Figure 8: Clustering performance of GDP for different sample sizes. The colors indicate the clusters estimated by GDP. The adjusted Rand index is reported at the top of each panel.
Figure 9: Boxplots of the adjusted Rand indices for GDP, HDP, and k-means for all sample sizes.
Figure 10: Clustering of the group-specific single-cell data, reduced to two dimensions by UMAP, by (a) GDP and (b) HDP.
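The captions for Figures 8 and 9 report clustering quality as the adjusted Rand index. For readers unfamiliar with it, the following is a minimal sketch of the standard pair-counting formula (Hubert and Arabie's chance-corrected Rand index), written with the standard library only. The function name and the convention of returning 1.0 in degenerate cases are choices made for this example, not something stated in the source.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treat as perfect agreement

    # Pairs that fall in the same cluster under both labelings.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    # Pairs that fall in the same cluster under each labeling separately.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings have one cluster
    return (sum_cells - expected) / (max_index - expected)


# Identical partitions up to relabeling score 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # → 1.0
```

The index equals 1 for identical partitions, is close to 0 for agreement at chance level, and can be negative when agreement is worse than chance.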
