STATISTICAL TESTS FOR LARGE TREE-STRUCTURED DATA
- PMID: 37013199
- PMCID: PMC10066867
- DOI: 10.1080/01621459.2016.1240081
Abstract
We develop a general statistical framework for the analysis and inference of large tree-structured data, with a focus on developing asymptotic goodness-of-fit tests. We first propose a consistent statistical model for binary trees, from which we develop a class of invariant tests. Using the model for binary trees, we then construct tests for general trees by using the distributional properties of the Continuum Random Tree, which arises as the invariant limit for a broad class of models for tree-structured data based on conditioned Galton-Watson processes. The test statistics for the goodness-of-fit tests are simple to compute and are asymptotically distributed as χ² and F random variables. We illustrate our methods on an important application of detecting tumor heterogeneity in brain cancer. We use a novel approach with tree-based representations of magnetic resonance images and employ the developed tests to ascertain tumor heterogeneity between two groups of patients.
Keywords: Conditioned Galton-Watson trees; Consistent statistical models; Dyck path; Goodness-of-fit tests.
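The keywords mention Dyck paths. As a rough illustration of that standard encoding (a generic sketch, not necessarily the exact construction used in the paper), the Python snippet below walks depth-first around a rooted ordered tree and records the depth after each step. The nested-list tree representation and the function names `dyck_steps` and `dyck_path` are assumptions made for this example.

```python
from itertools import accumulate


def dyck_steps(tree):
    """Return the +1/-1 steps of a depth-first walk around an ordered tree.

    Assumed representation: a tree is the list of its child subtrees,
    so a leaf is the empty list [].
    """
    steps = []
    for child in tree:
        steps.append(1)                  # step down the edge to the child
        steps.extend(dyck_steps(child))  # walk around the child's subtree
        steps.append(-1)                 # step back up the same edge
    return steps


def dyck_path(tree):
    """Return the heights visited by the walk, starting and ending at 0."""
    return [0] + list(accumulate(dyck_steps(tree)))


# Root with two children; the first child has two leaf children.
example = [[[], []], []]
path = dyck_path(example)
print(path)  # [0, 1, 2, 1, 2, 1, 0, 1, 0]
```

A tree with n edges yields a path of 2n steps that never drops below zero and returns to zero, which is what makes it a Dyck path; here the four edges give eight steps.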