Biol Psychiatry Cogn Neurosci Neuroimaging. 2020 Mar;5(3):261-263.
doi: 10.1016/j.bpsc.2019.09.003. Epub 2019 Sep 16.

Double Dipping in Machine Learning: Problems and Solutions

Tali M Ball et al. Biol Psychiatry Cogn Neurosci Neuroimaging. 2020 Mar.
No abstract available

Figures

Figure 1.
Strategies for detecting double dipping. (A) Results of a random data test. The test used a dataset of entirely random numbers representing a varying number of "predictor variables" (first column) and a random binary "outcome" evenly distributed across 136 "subjects." Because the data are random noise, model performance should be ≤50% and should not improve dramatically as the number of random predictors increases. This is what the fair model with all variables shows (second column). However, a 2-step random forest procedure that includes double dipping to select a subset of variables (third column) yields high accuracy on fully random data, especially with a large number of predictors (final column). (B) Results of a permutation test on a random forest analysis procedure that included double dipping. The red line indicates the expected average accuracy on permuted outcome data if no double dipping were present (the outcome base rate). The blue line indicates the average accuracy on permuted data using the double-dipped analysis procedure. The green line indicates the observed accuracy of the double-dipped analysis with real data. The black line indicates the range of accuracy corresponding to 2-tailed p < .05.
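The two checks in Figure 1 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code. The random forest is swapped for a simple nearest-centroid classifier. The variable-selection step is swapped for a hypothetical filter that keeps the k predictors with the largest class-mean difference. Both swaps keep the example standard-library only.

The 136 subjects come from the caption. The 1,000 predictors, k = 10, 5 folds, the seeds, and all function names are illustrative choices. The three `selection` modes work as follows:

- `"full_data"` reproduces the double dip: variables are chosen using every subject, including those later held out for testing.
- `"inside_folds"` repeats the selection within each training fold only.
- `"none"` is the fair model with all variables.

`permutation_null` reruns the entire procedure on shuffled outcomes. Rerunning the whole procedure is what lets a permutation test expose double dipping.

```python
import random
import statistics


def make_random_data(n_subjects=136, n_predictors=1000, seed=0):
    """Pure-noise predictors plus a balanced, random binary outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_predictors)] for _ in range(n_subjects)]
    y = [i % 2 for i in range(n_subjects)]  # 50/50 split between the two classes
    rng.shuffle(y)
    return X, y


def select_features(X, y, k):
    """Hypothetical filter: indices of the k predictors with the largest |class-mean difference|."""
    scores = []
    for j in range(len(X[0])):
        m1 = statistics.fmean(x[j] for x, t in zip(X, y) if t == 1)
        m0 = statistics.fmean(x[j] for x, t in zip(X, y) if t == 0)
        scores.append((abs(m1 - m0), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def fit_centroids(X, y, feats):
    """Per-class mean vector over the chosen predictors (nearest-centroid classifier)."""
    return {
        label: [statistics.fmean(x[j] for x, t in zip(X, y) if t == label) for j in feats]
        for label in (0, 1)
    }


def predict(centroids, x, feats):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(feats, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cv_accuracy(X, y, selection="none", k=10, n_folds=5):
    """Cross-validated accuracy.

    selection="none":         fair model using every predictor
    selection="inside_folds": fair, predictors chosen from each training fold only
    selection="full_data":    double dipping, predictors chosen from ALL subjects,
                              including the ones later used for testing
    """
    if selection not in ("none", "inside_folds", "full_data"):
        raise ValueError(f"unknown selection mode: {selection}")
    n = len(y)
    all_feats = list(range(len(X[0])))
    if selection == "full_data":
        leaked_feats = select_features(X, y, k)  # test subjects influence this choice
    correct = 0
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        test = [i for i in range(n) if i % n_folds == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        if selection == "none":
            feats = all_feats
        elif selection == "inside_folds":
            feats = select_features(X_tr, y_tr, k)
        else:
            feats = leaked_feats
        centroids = fit_centroids(X_tr, y_tr, feats)
        correct += sum(predict(centroids, X[i], feats) == y[i] for i in test)
    return correct / n


def permutation_null(X, y, selection, n_perm=20, seed=1, **kwargs):
    """Rerun the WHOLE analysis procedure on shuffled outcomes to build a null distribution."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        null.append(cv_accuracy(X, y_perm, selection, **kwargs))
    return null


def permutation_p_value(observed, null):
    """One-sided p: share of permuted runs at least as accurate as observed (+1 correction)."""
    return (1 + sum(a >= observed for a in null)) / (1 + len(null))


if __name__ == "__main__":
    X, y = make_random_data()
    base_rate = max(y.count(0), y.count(1)) / len(y)
    for mode in ("none", "inside_folds", "full_data"):
        print(mode, round(cv_accuracy(X, y, mode), 3))
    null = permutation_null(X, y, "full_data", n_perm=10)
    print("base rate:", base_rate)
    print("mean permuted accuracy (double dipped):", round(statistics.fmean(null), 3))
```

On pure noise, the two fair variants should hover around the 50% base rate. The double-dipped variant typically scores well above it. Its permutation null is shifted upward in the same way, corresponding to the blue line sitting above the red line in panel B. Exact numbers depend on the seed and the assumed settings.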
