Mach Learn. 2018;107(12):1895-1922.
doi: 10.1007/s10994-018-5714-4. Epub 2018 May 9.

Bootstrapping the out-of-sample predictions for efficient and accurate cross-validation


Ioannis Tsamardinos et al. Mach Learn. 2018.

Abstract

Cross-Validation (CV), and out-of-sample performance-estimation protocols in general, are often employed both for (a) selecting the optimal combination of algorithms and values of hyper-parameters (called a configuration) for producing the final predictive model, and (b) estimating the predictive performance of the final model. However, the cross-validated performance of the best configuration is optimistically biased. We present an efficient bootstrap method that corrects for the bias, called Bootstrap Bias Corrected CV (BBC-CV). BBC-CV's main idea is to bootstrap the whole process of selecting the best-performing configuration on the out-of-sample predictions of each configuration, without additional training of models. In comparison to the alternatives, namely the nested cross-validation (Varma and Simon in BMC Bioinform 7(1):91, 2006) and a method by Tibshirani and Tibshirani (Ann Appl Stat 822-829, 2009), BBC-CV is computationally more efficient, has smaller variance and bias, and is applicable to any metric of performance (accuracy, AUC, concordance index, mean squared error). Subsequently, we employ again the idea of bootstrapping the out-of-sample predictions to speed up the CV process. Specifically, using a bootstrap-based statistical criterion we stop training of models on new folds of inferior (with high probability) configurations. We name the method Bootstrap Bias Corrected with Dropping CV (BBCD-CV) that is both efficient and provides accurate performance estimates.
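The core idea of BBC-CV, as described above, can be illustrated with a short sketch: pool the out-of-sample predictions of every configuration, repeatedly bootstrap the samples, select the best configuration on each bootstrap sample, and score that configuration on the out-of-bag samples. This is a minimal illustration assuming classification accuracy as the metric; the function name, array layout, and defaults are assumptions, not the authors' reference implementation:

```python
import numpy as np

def bbc_cv(oos_preds, y, n_boot=1000, rng_seed=0):
    """Bootstrap Bias Corrected CV (illustrative sketch).

    oos_preds: (n_samples, n_configs) array of pooled out-of-sample
               predictions from CV, one column per configuration.
    y:         (n_samples,) array of true labels.
    Returns a bias-corrected accuracy estimate for the winning
    configuration, without training any additional models.
    """
    rng = np.random.default_rng(rng_seed)
    n = len(y)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)              # bootstrap (in-bag) rows
        oob = np.setdiff1d(np.arange(n), idx)    # out-of-bag rows
        if oob.size == 0:
            continue
        # Select the best configuration on the bootstrap sample ...
        in_bag_acc = (oos_preds[idx] == y[idx, None]).mean(axis=0)
        best = int(np.argmax(in_bag_acc))
        # ... and score that configuration on the held-out rows.
        estimates.append((oos_preds[oob, best] == y[oob]).mean())
    return float(np.mean(estimates))
```

Because the selection step is re-run inside every bootstrap iteration, the over-optimism of picking the winner is reflected in the out-of-bag scores, which is what corrects the bias of the plain cross-validated estimate.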

Keywords: Bias correction; Cross-validation; Hyper-parameter optimization; Performance estimation.
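The dropping criterion of BBCD-CV described in the abstract (stop training a configuration on further folds once it is, with high probability, inferior to the best one) can likewise be sketched by bootstrapping the out-of-sample predictions accumulated over the folds seen so far. The function name, accuracy metric, and probability threshold below are illustrative assumptions:

```python
import numpy as np

def should_drop(oos_preds_so_far, y_so_far, config,
                n_boot=500, threshold=0.99, rng_seed=0):
    """Return True if `config` is worse than the best configuration with
    estimated probability >= threshold, judged by bootstrapping the
    out-of-sample predictions gathered over the folds computed so far.

    oos_preds_so_far: (n_seen, n_configs) predictions on folds done so far.
    y_so_far:         (n_seen,) corresponding true labels.
    """
    rng = np.random.default_rng(rng_seed)
    n = len(y_so_far)
    worse = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # bootstrap the seen samples
        acc = (oos_preds_so_far[idx] == y_so_far[idx, None]).mean(axis=0)
        if acc[config] < acc.max():  # strictly beaten on this resample
            worse += 1
    return worse / n_boot >= threshold
```

A configuration flagged by such a test is simply not trained on the remaining folds, which is where the speed-up over plain CV with tuning comes from.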


Figures

Fig. 1
Average (over 500 repeats) estimated bias of the accuracy estimates of CVT, TT, NCV, BBC-CV and BBCD-CV. The average true classification accuracy of all configurations is 60%. CVT over-estimates performance in all settings. TT's behaviour varies for sample sizes N < 500 and is conservative for N = 500. NCV provides almost unbiased estimates of performance, while BBC-CV is more conservative, with a difference in bias of 0.013 points of accuracy on average. BBCD-CV is on par with NCV
Fig. 2
Average estimated bias (over 20 sub-datasets for each original dataset) of the CVT, TT, NCV, BBC-CV and BBCD-CV estimates of performance. CVT is optimistically biased for sample sizes N ≤ 100. TT's bias varies with sample size and dataset, and it is mainly over-conservative for N ≤ 80. NCV and BBC-CV both have low bias, though results vary with dataset. BBCD-CV has, on average, greater bias than BBC-CV for N ≤ 100 and identical bias for N = 500
Fig. 3
Relative average true performance of the models returned by BBCD-CV and CVT. For N ≤ 100 the loss in performance varies greatly with dataset; for N = 500, however, there is negligible to no loss in performance. If N is fairly large, BBCD-CV accelerates the CVT procedure without sacrificing the quality of the resulting model or the accuracy of its performance estimate
Fig. 4
The speed-up of BBCD-CV over CVT for sample size N = 500, computed as the ratio of the number of models trained by CVT to the number trained by BBCD-CV. Typically, BBCD-CV achieves a speed-up of 2-5x, and up to 10x for the gisette dataset. Overall, BBCD-CV yields a significant speed boost without sacrificing model quality or the accuracy of performance estimation
Fig. 5
Relative average true performance of BBC-CV10 versus BBC-CV (left), and of BBC-CV10 versus NCV (right). Multiple repeats increase the performance of the returned models while maintaining the accuracy of the performance estimates. If computational time is not a limitation, BBC-CV10 is preferable to NCV
Fig. 6
Coverage of the {50%, 55%, ..., 95%, 99%} CIs returned by BBC-CV, BBCD-CV, and BBC-CV10, defined as the ratio of the estimated CIs that contain the corresponding true performances of the produced models. The CIs are mainly conservative and become more accurate with increasing sample size and multiple repeats
Fig. 7
Average width (over all 20 sub-datasets) of the CIs with increasing number of repeats (BBC-CVX, X = 1, ..., 10), for each dataset. CIs shrink with increasing sample size and number of repeats

