Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2017;112(520):1733-1743.
doi: 10.1080/01621459.2016.1240081. Epub 2017 Aug 7.

STATISTICAL TESTS FOR LARGE TREE-STRUCTURED DATA

Affiliations

STATISTICAL TESTS FOR LARGE TREE-STRUCTURED DATA

Karthik Bharath et al. J Am Stat Assoc. 2017.

Abstract

We develop a general statistical framework for the analysis and inference of large tree-structured data, with a focus on developing asymptotic goodness-of-fit tests. We first propose a consistent statistical model for binary trees, from which we develop a class of invariant tests. Using the model for binary trees, we then construct tests for general trees by using the distributional properties of the Continuum Random Tree, which arises as the invariant limit for a broad class of models for tree-structured data based on conditioned Galton-Watson processes. The test statistics for the goodness-of-fit tests are simple to compute and are asymptotically distributed as χ 2 and F random variables. We illustrate our methods on an important application of detecting tumour heterogeneity in brain cancer. We use a novel approach with tree-based representations of magnetic resonance images and employ the developed tests to ascertain tumor heterogeneity between two groups of patients.

Keywords: Conditioned Galton-Watson trees; Consistent statistical models; Dyck path; Goodness-of-fit tests.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Left: Binary tree with n = 10 vertices, 9 edges, 5 leaves (v1, v2, v3, v4, v5) with a candidate 5-dimensional distribution f5. Middle: Identical binary tree with labels of vertices permuted; it is required that the distribution is still f5. Right: Resulting binary tree when leaves v4 and v5 are removed from the tree on the left; it is required that the distribution becomes the 3-dimensional f3, and is invariant to labeling.
Figure 2
Figure 2
A tree on the left and its L-tree on the right corresponding to vertices {v1, v2, v3, v4}. The L-tree contains the root, the branch points b1, b2 and the set of vertices {v1, v2, v3, v4}. All the vertices chosen are leaves.
Figure 3
Figure 3
(i), (ii), (iii): Construction of binary tree τ(3) with 3 leaves and 2×3-1= 5 edges, 6 vertices (including root) from nonhomogeneous Poisson process model with arrival times t1, t2 and t3. (iv): A binary tree isomorphic to the tree in (iii) where a+b+e=t1, c=t3t2 and d=t2t1.
Figure 4
Figure 4
A tree with root at the bottom and its corresponding Dyck path. The x axis ranges from 0 to twice the sum of lengths of the edges; the Dyck path is constructed by traversing the tree in a depth-first manner at unit speed.
Figure 5
Figure 5
Preprocessing steps starting from a MR image to the computation of total path length of an L-tree constructed from the resulting binary tree.
Figure 6
Figure 6
T2-weighted Brain MR images of segmented tumors (outlined in black) and dendrograms for a patient who experiences (left) a long survival time of 57.8 months and (right) a short survival time of 0.723 months. Dendrograms were constructed from pixels obtained from only the segmented regions.

References

    1. Adams R, Ghahramani Z, Jordan MI. Tree-structured stick breaking for hierarchical data. NIPS. 2010
    1. Aho A, Sagiv Y, Szymanski TG, Ullman JD. Inferring a Tree from Lowest Common Ancestors with an Application to the Optimization of Relational Expressions. SIAM Journal of Computation. 1981;10:405–421.
    1. Aldous D. The Continuum Random Tree I. Annals of Probability. 1991a;19:1–28.
    1. Aldous D. Asymptotic Fringe Distributions for General Families of Random Trees. Annals of Applied Probability. 1991b;1:228–266.
    1. Aldous D. The Continuum Random Tree III. Annals of Probability. 1993;21:248–289.

LinkOut - more resources