Using controls to limit false discovery in the era of big data
- PMID: 30217148
- PMCID: PMC6137876
- DOI: 10.1186/s12859-018-2356-2
Abstract
Background: Procedures for controlling the false discovery rate (FDR) are widely applied as a solution to the multiple comparisons problem of high-dimensional statistics. Current FDR-controlling procedures require accurately calculated p-values and rely on extrapolation into the unknown and unobserved tails of the null distribution. Both of these intermediate steps are challenging and can compromise the reliability of the results.
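The abstract does not name a specific p-value-based procedure. For contrast, here is a minimal sketch of the classic Benjamini-Hochberg step-up procedure, the best-known FDR-controlling method that takes p-values as input. The function name `benjamini_hochberg` is illustrative. The sketch assumes valid p-values have already been computed, which is exactly the intermediate step the Background identifies as fragile.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up rule.

    Assumes the p-values are valid (uniform under the null), which the
    procedure itself cannot check.
    """
    m = len(pvals)
    # Hypothesis indices ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    # Step-up: find the LARGEST rank whose p-value is <= rank * alpha / m.
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    # Reject every hypothesis up to and including that rank.
    return sorted(order[:k])
```

Because of the step-up rule, `benjamini_hochberg([0.03, 0.04])` rejects both hypotheses at `alpha=0.05`, even though the smallest p-value alone fails its own cut-off of 0.025. Every decision depends on how accurate the p-values are in the extreme tail.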
Results: We present a general method for controlling the FDR that capitalizes on the large amount of control data often found in big data studies to avoid these frequently problematic intermediate steps. The method utilizes control data to empirically construct the distribution of the test statistic under the null hypothesis and directly compares this distribution to the empirical distribution of the test data. By not relying on p-values, our control data-based empirical FDR procedure more closely follows the foundational principles of the scientific method: that inference is drawn by comparing test data to control data. The method is demonstrated through application to a problem in structural genomics.
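The idea of comparing test data directly against an empirically constructed null can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the authors' published procedure. It assumes that control statistics are draws from the null distribution of the same test statistic, that larger statistics indicate stronger evidence, and, conservatively, that every test hypothesis could be null. For a candidate threshold t, the expected number of false discoveries is estimated as the fraction of controls at or above t multiplied by the number of test statistics. Dividing that by the number of test statistics at or above t gives an estimated FDR. The function names are hypothetical.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def _count_at_least(sorted_vals: Sequence[float], t: float) -> int:
    """Number of values >= t in an ascending-sorted sequence."""
    return len(sorted_vals) - bisect_left(sorted_vals, t)


def _fdr_sorted(t: float, test_s: Sequence[float], ctrl_s: Sequence[float]) -> float:
    """Estimated FDR at threshold t; both inputs must be sorted ascending."""
    if not ctrl_s:
        raise ValueError("need at least one control statistic")
    n_disc = _count_at_least(test_s, t)  # test statistics called discoveries
    if n_disc == 0:
        return 0.0
    n_ctrl_hits = _count_at_least(ctrl_s, t)  # empirical null tail count
    # (n_ctrl_hits / n_control) * n_test = estimated false discoveries
    # (assumption: null proportion taken as 1), divided by discoveries.
    # Computed as one integer ratio to avoid accumulated rounding.
    est = (n_ctrl_hits * len(test_s)) / (len(ctrl_s) * n_disc)
    return min(1.0, est)


def empirical_fdr(t: float, test: Sequence[float], control: Sequence[float]) -> float:
    """Estimated FDR when every test statistic >= t is called a discovery."""
    return _fdr_sorted(t, sorted(test), sorted(control))


def empirical_threshold(
    test: Sequence[float], control: Sequence[float], alpha: float = 0.05
) -> Optional[float]:
    """Lowest observed test statistic whose estimated FDR is <= alpha."""
    test_s, ctrl_s = sorted(test), sorted(control)
    for t in sorted(set(test_s)):  # candidate thresholds, ascending
        if _fdr_sorted(t, test_s, ctrl_s) <= alpha:
            return t
    return None  # no observed threshold reaches the target FDR
```

Scanning thresholds from low to high and stopping at the first one that meets `alpha` mirrors step-up logic. Among the thresholds whose estimated FDR is acceptable, it keeps the one with the most discoveries. No p-value or parametric tail model appears anywhere. The null tail is read directly off the controls, so with n controls the smallest nonzero tail estimate is 1/n. This is why a large pool of control data matters.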
Conclusions: The method described here provides a general statistical framework for controlling the FDR that is specifically tailored for the big data setting. By relying on empirically constructed distributions and control data, it forgoes potentially problematic modeling steps and extrapolation into the unknown tails of the null distribution. This procedure is broadly applicable insofar as controlled experiments or internal negative controls are available, as is increasingly common in the big data setting.
Keywords: Big data; False discovery rate (FDR); High dimensional inference; Hypothesis testing.
Conflict of interest statement
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Similar articles
- A new estimation of protein-level false discovery rate. BMC Genomics. 2018 Aug 13;19(Suppl 6):567. doi: 10.1186/s12864-018-4923-3. PMID: 30367581. Free PMC article.
- Resampling-based empirical Bayes multiple testing procedures for controlling generalized tail probability and expected value error rates: focus on the false discovery rate and simulation study. Biom J. 2008 Oct;50(5):716-44. doi: 10.1002/bimj.200710473. PMID: 18932138. Free PMC article.
- Statistical detection of EEG synchrony using empirical bayesian inference. PLoS One. 2015 Mar 30;10(3):e0121795. doi: 10.1371/journal.pone.0121795. eCollection 2015. PMID: 25822617. Free PMC article.
- Analysis of multilocus models of association. Genet Epidemiol. 2003 Jul;25(1):36-47. doi: 10.1002/gepi.10237. PMID: 12813725. Review.
- Multiple comparisons: philosophies and illustrations. Am J Physiol Regul Integr Comp Physiol. 2000 Jul;279(1):R1-8. doi: 10.1152/ajpregu.2000.279.1.R1. PMID: 10896857. Review.
Cited by
- Systematic review and meta-analysis of the association between ABCA7 common variants and Alzheimer's disease in non-Hispanic White and Asian cohorts. Front Aging Neurosci. 2024 Oct 17;16:1406573. doi: 10.3389/fnagi.2024.1406573. eCollection 2024. PMID: 39484364. Free PMC article.
- F. prausnitzii potentially modulates the association between citrus intake and depression. Microbiome. 2024 Nov 14;12(1):237. doi: 10.1186/s40168-024-01961-3. PMID: 39543781. Free PMC article.