Tree-Values: Selective Inference for Regression Trees
- PMID: 38481523
- PMCID: PMC10933572
Abstract
We consider conducting inference on the output of the Classification and Regression Tree (CART) (Breiman et al., 1984) algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees, such as Type 1 error rate control and nominal coverage. Thus, we propose a selective inference framework for conducting inference on a fitted CART tree. In a nutshell, we condition on the fact that the tree was estimated from the data. We propose a test for the difference in the mean response between a pair of terminal nodes that controls the selective Type 1 error rate, and a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage. Efficient algorithms for computing the necessary conditioning sets are provided. We apply these methods in simulation and to a dataset involving the association between portion control interventions and caloric intake.
Keywords: CART; regression trees; hypothesis testing; post-selection inference; selective inference.
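The idea of conditioning on selection can be made concrete with a small sketch of a selective p-value computed as a normal tail probability restricted to a conditioning set. This sketch rests on assumptions that are not stated in the abstract. It takes the usual selective-inference setup Y ~ N(μ, σ²I) with σ known. It takes the statistic to be the difference in sample means between two terminal nodes, ν^T y with ν = 1_A/|A| − 1_B/|B|. It also assumes the conditioning set S has already been computed as a union of disjoint intervals. Computing S for a fitted CART tree is the paper's contribution and is not reproduced here. The function names (`diff_in_means`, `selective_p_value`) and the example interval set are hypothetical.

```python
"""Minimal sketch of a selective (truncated-normal) p-value.

Assumptions (not taken from the source): Y ~ N(mu, sigma^2 I) with sigma known,
the statistic is a difference in sample means between two terminal nodes, and the
conditioning set S has already been computed as a list of disjoint intervals.
"""
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def diff_in_means(y_a: Sequence[float], y_b: Sequence[float],
                  sigma: float) -> Tuple[float, float]:
    """Return (nu^T y, sigma * ||nu||) for the contrast nu = 1_A/|A| - 1_B/|B|."""
    stat = sum(y_a) / len(y_a) - sum(y_b) / len(y_b)
    sd = sigma * math.sqrt(1.0 / len(y_a) + 1.0 / len(y_b))
    return stat, sd


def _mass(dist: NormalDist, lo: float, hi: float) -> float:
    """P(lo <= Z <= hi) for a mean-zero normal Z; empty intervals get mass 0."""
    if hi <= lo:
        return 0.0
    if lo >= 0.0:
        # Entirely in the upper half-line: reflect to the lower tail so we subtract
        # two small CDF values instead of two values close to 1.
        return dist.cdf(-lo) - dist.cdf(-hi)
    return dist.cdf(hi) - dist.cdf(lo)


def selective_p_value(stat: float, sd: float,
                      conditioning_set: List[Interval]) -> float:
    """Two-sided P(|Z| >= |stat| given Z in S) for Z ~ N(0, sd^2) under H0.

    With S = [(-inf, inf)] this reduces to the naive (unconditional) p-value.
    """
    dist = NormalDist(0.0, sd)
    t = abs(stat)
    numer = denom = 0.0
    for lo, hi in conditioning_set:
        denom += _mass(dist, lo, hi)
        # Parts of [lo, hi] at least as extreme as the observed statistic.
        numer += _mass(dist, lo, min(hi, -t))
        numer += _mass(dist, max(lo, t), hi)
    if denom <= 0.0:
        raise ValueError("conditioning set has (numerically) zero probability")
    return numer / denom


if __name__ == "__main__":
    everything = [(-math.inf, math.inf)]
    # Hypothetical selection event: the contrast is only tested when |nu^T y| >= 2.
    selected = [(-math.inf, -2.0), (2.0, math.inf)]
    naive = selective_p_value(2.5, 1.0, everything)
    selective = selective_p_value(2.5, 1.0, selected)
    print(f"naive p = {naive:.4f}, selective p = {selective:.3f}")
    # prints "naive p = 0.0124, selective p = 0.273"
```

In this toy example, the same observed statistic gives a naive p-value of about 0.012 but a selective p-value of about 0.27 once the hypothetical selection event is taken into account. This is the kind of gap that lets a naive test exceed its nominal Type 1 error rate. Under the same assumptions, a confidence interval for a single node's mean could be obtained by inverting a version of this test whose truncated normal is centered at each candidate mean. That step is omitted here.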
References
- Bhattacharya PK. Some aspects of change-point analysis. Lecture Notes-Monograph Series, 28–56, 1994.
- Bourgon R. Overview of the intervals package. R vignette, 2009. URL https://cran.r-project.org/web/packages/intervals/vignettes/intervals_ov....
- Breiman L, Friedman J, Stone CJ, Olshen RA. Classification and regression trees. CRC Press, 1984.
- Chen S, Bien J. Valid inference corrected for outlier removal. Journal of Computational and Graphical Statistics, 29(2):323–334, 2020.