Hypotheses on a tree: new error rates and testing strategies

Marina Bogomolov¹, Christine B Peterson², Yoav Benjamini³, Chiara Sabatti⁴

Affiliations

¹ The William Davidson Faculty of Industrial Engineering and Management, Technion-Israel Institute of Technology, Technion City, Haifa 3200003, Israel.
² Department of Biostatistics, Division of Basic Science Research, The University of Texas, MD Anderson Cancer Center, Houston, Texas 77030, U.S.A.
³ Department of Statistics and Operations Research, Tel-Aviv University, P.O. Box 39040, Tel-Aviv 6997801, Israel.
⁴ Department of Statistics, Stanford University, 50 Governor's Lane, Stanford, California 94305, U.S.A.

PMID: 36825068
PMCID: PMC9945647
DOI: 10.1093/biomet/asaa086

Marina Bogomolov et al. Biometrika. 2021 Sep.

Biometrika. 2021 Sep;108(3):575-590.

doi: 10.1093/biomet/asaa086. Epub 2020 Oct 14.

Abstract

We introduce a multiple testing procedure that controls global error rates at multiple levels of resolution. Conceptually, we frame this problem as the selection of hypotheses that are organized hierarchically in a tree structure. We describe a fast algorithm and prove that it controls relevant error rates given certain assumptions on the dependence between the p-values. Through simulations, we demonstrate that the proposed procedure provides the desired guarantees under a range of dependency structures and that it has the potential to gain power over alternative methods. Finally, we apply the method to studies on the genetic regulation of gene expression across multiple tissues and on the relation between the gut microbiome and colorectal cancer.

Keywords: False discovery rate; Hierarchical testing; Multiple testing; Selective inference.

Figures

**Fig. 1.**
Hierarchical structure of hypotheses in a four-level tree. Circles represent true null hypotheses, while squares denote false nulls. Children of the same parent constitute a family of hypotheses. To give an example of the sequential order of testing, nodes corresponding to tested hypotheses are unfilled, while grey nodes indicate hypotheses that are not tested. A red border distinguishes rejected hypotheses. Tested families are enclosed within dashed borders, with some labelled as $ℱ_{i}^{j}$ to illustrate the notation.

**Fig. 2.**
Illustration of the bottom-up calculation of the proposed error rate for level 4, sfdr⁴, using the same configuration of hypotheses as in Fig. 1. The error measure $𝒠_{j}(4)$ is defined only for rejected hypotheses and is shown as the red number in the node corresponding to each rejection. Hypotheses without red borders are not rejected and so receive no error measure. For a rejected node at the level of interest, which is level 4 in this illustration, the error measure is 1 for an incorrect rejection and 0 otherwise. For a rejected node at a higher level, the error measure is the average of the error measures assigned to its rejected children if it has one or more rejected child hypotheses, and is 0 otherwise.
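
The bottom-up rule in the Fig. 2 caption can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation: the `Node` class, the `error_measure` function and the toy tree are hypothetical, the tree is not the configuration drawn in Fig. 1, and levels are assumed to be numbered from 1 at the top down to the level of interest.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical container for one hypothesis in the tree."""
    rejected: bool
    true_null: bool = False  # only consulted at the level of interest
    children: List["Node"] = field(default_factory=list)


def error_measure(node: Node, level: int, target_level: int) -> Optional[float]:
    """Error measure of `node`, following the rule in the Fig. 2 caption.

    Returns None for a hypothesis that is not rejected, since the
    measure is defined only for rejections.
    """
    if not node.rejected:
        return None
    if level == target_level:
        # 1 for an incorrect rejection (a true null was rejected), else 0.
        return 1.0 if node.true_null else 0.0
    # Higher level: average over the children that were rejected.
    values = [error_measure(c, level + 1, target_level) for c in node.children]
    values = [v for v in values if v is not None]
    if not values:
        return 0.0  # rejected node with no rejected children
    return sum(values) / len(values)


# Toy three-level tree (assumed data), with level 3 as the level of interest.
parent_a = Node(rejected=True, children=[
    Node(rejected=True, true_null=True),    # incorrect rejection -> 1
    Node(rejected=True, true_null=False),   # correct rejection   -> 0
    Node(rejected=False, true_null=True),   # not rejected        -> no measure
])
parent_b = Node(rejected=True, children=[
    Node(rejected=False),                   # no rejected children -> parent gets 0
])
root = Node(rejected=True, children=[parent_a, parent_b])

measure_a = error_measure(parent_a, level=2, target_level=3)
measure_b = error_measure(parent_b, level=2, target_level=3)
measure_root = error_measure(root, level=1, target_level=3)
print(measure_a, measure_b, measure_root)
```

In this toy tree the two level-2 parents receive 0.5 and 0.0, and the top node receives their average, 0.25. Returning `None` for hypotheses that are not rejected keeps them out of every average, mirroring the caption's statement that such nodes receive no error measure.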

**Fig. 3.**
Results for the example. Each point corresponds to the average of 1000 realizations. Dashed horizontal lines indicate the target values for the error rates. The methods under comparison are the Benjamini–Hochberg procedure (orange diamonds), the Benjamini–Bogomolov method (red squares), the nonhierarchical version of the p-filter (pink circles), the hierarchical version of the p-filter (purple circles) and TreeBH (blue triangles).

**Fig. 4.**
Taxonomic tree of selections obtained using the TreeBH procedure. Additional discoveries of TreeBH that were not found with the Benjamini–Hochberg procedure are marked in red.

References

1. Benjamini Y & Bogomolov M (2014). Selective inference on multiple families of hypotheses. J. R. Statist. Soc. B 76, 297–318.
2. Benjamini Y & Heller R (2007). False discovery rates for spatial signals. J. Am. Statist. Assoc. 102, 1272–81.
3. Benjamini Y & Hochberg Y (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Statist. Soc. B 57, 289–300.
4. Benjamini Y & Yekutieli D (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29, 1165–88.
5. Brzyski D, Peterson CB, Sobczyk P, Candes EJ, Bogdan M & Sabatti C (2017). Controlling the rate of GWAS false discoveries. Genetics 205, 61–75.

Grants and funding

R01 MH101782/MH/NIMH NIH HHS/United States
