The importance of knowing when to stop. A sequential stopping rule for component-wise gradient boosting
- PMID: 22344292
- DOI: 10.3414/ME11-02-0030
Abstract
Objectives: Component-wise boosting algorithms have evolved into a popular estimation scheme in biomedical regression settings. The number of iterations is the most important tuning parameter for optimizing their performance. To date, no fully automated strategy for determining the optimal stopping iteration of boosting algorithms has been proposed.
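The following is a minimal sketch that makes the role of the iteration number concrete. It implements a plain component-wise L2 boosting loop for a squared-error loss with simple linear base-learners. It rests on assumptions not taken from the paper: centred covariates, a fixed step length `nu = 0.1`, simulated data, and the hypothetical helper name `componentwise_l2_boost`. Because each iteration updates only the single best-fitting covariate, `mstop` controls both how many variables enter the model and how strongly their coefficients are shrunk.

```python
import numpy as np


def componentwise_l2_boost(X, y, mstop, nu=0.1):
    """Component-wise L2 boosting with simple linear base-learners (sketch).

    Covariates are centred and the fit starts from the mean of y. In each
    iteration every covariate is fitted separately to the current residuals
    (the negative gradient of the squared-error loss), and only the
    best-fitting one is updated by a step of length nu.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)            # centred covariates (assumption)
    sq_norms = (Xc ** 2).sum(axis=0)
    intercept = y.mean()               # offset: intercept-only model
    beta = np.zeros(X.shape[1])
    fit = np.full(y.shape[0], intercept)
    for _ in range(mstop):
        u = y - fit                    # negative gradient of 0.5 * (y - f)^2
        coefs = Xc.T @ u / sq_norms    # least-squares slope of u on each covariate
        gain = coefs ** 2 * sq_norms   # RSS reduction achieved by each base-learner
        j = int(np.argmax(gain))       # best-fitting component
        beta[j] += nu * coefs[j]
        fit += nu * coefs[j] * Xc[:, j]
    return intercept, beta


# Usage on simulated data: only covariates 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 2 * X[:, 0] - X[:, 3] + rng.normal(scale=0.5, size=100)
for m in (10, 100, 1000):
    _, beta = componentwise_l2_boost(X, y, mstop=m)
    print(m, np.count_nonzero(beta), np.round(beta[[0, 3]], 2))
```

With few iterations most coefficients stay exactly zero. As `mstop` grows, the fit moves toward the unpenalised least-squares solution (when there are fewer covariates than observations). This is why stopping too late can overfit and stopping too early can underfit.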
Methods: We propose a fully data-driven sequential stopping rule for boosting algorithms. It combines resampling methods with a modified version of an earlier stopping approach that depends on AIC-based information criteria. The new "subsampling after AIC" stopping rule is applied to component-wise gradient boosting algorithms.
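An AIC-based stopping rule for boosting needs a degrees-of-freedom value for every iteration. One common choice for L2 boosting takes this value as the trace of the boosting hat matrix. It then evaluates a corrected AIC, AICc(m) = log(RSS_m / n) + (1 + df_m / n) / (1 - (df_m + 2) / n), and stops at the iteration that minimises it. The sketch below illustrates only this kind of AIC criterion, on which the Methods paragraph builds. It is not the paper's "subsampling after AIC" rule, and it does not reproduce that rule's resampling step or its modifications. The function name `l2boost_aicc`, the step length `nu = 0.1` and the simulated data are assumptions made for illustration.

```python
import numpy as np


def l2boost_aicc(X, y, mmax, nu=0.1):
    """Corrected AIC along a component-wise L2 boosting path (sketch).

    The boosting hat matrix is updated as B_m = B_{m-1} + nu * H_j (I - B_{m-1}),
    where H_j projects onto the selected centred covariate j and B_0 maps y to
    its mean. df_m = trace(B_m). Returns AICc for m = 0, ..., mmax.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    Xc = X - X.mean(axis=0)
    sq_norms = (Xc ** 2).sum(axis=0)
    B = np.full((n, n), 1.0 / n)       # B_0: intercept-only fit
    aicc = np.empty(mmax + 1)
    for m in range(mmax + 1):
        fit = B @ y
        rss = float(((y - fit) ** 2).sum())
        df = float(np.trace(B))
        aicc[m] = np.log(rss / n) + (1 + df / n) / (1 - (df + 2) / n)
        if m == mmax:
            break
        u = y - fit                                      # current residuals
        j = int(np.argmax((Xc.T @ u) ** 2 / sq_norms))   # best base-learner
        xj = Xc[:, j]
        # Rank-one form of nu * H_j @ (I - B), avoiding a full n x n x n product.
        B += nu * np.outer(xj, xj - xj @ B) / sq_norms[j]
    return aicc


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
y = 2 * X[:, 0] - X[:, 3] + rng.normal(scale=0.5, size=100)
aicc = l2boost_aicc(X, y, mmax=300)
m_stop = int(np.argmin(aicc))          # iteration with the smallest AICc
print(m_stop, round(float(aicc[m_stop]), 3))
```

The sketch stores the full n x n hat matrix, so it needs O(n^2) memory and is practical only for moderate sample sizes. It also assumes the squared-error loss, for which the boosting hat matrix is well defined.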
Results: The newly developed sequential stopping rule outperformed earlier approaches when applied to both simulated and real data. Specifically, it improved on purely AIC-based methods when used for the microarray-based prediction of the recurrence of metastases in stage II colon cancer patients.
Conclusions: The proposed sequential stopping rule for boosting algorithms can help identify the optimal stopping iteration while the algorithm is still being fitted, at least for the most common loss functions.
Similar articles
- Prediction-based variable selection for component-wise gradient boosting. Int J Biostat. 2023 Nov 27;20(1):293-314. doi: 10.1515/ijb-2023-0052. eCollection 2024 May 1. PMID: 38000054
- Randomized boosting with multivariable base-learners for high-dimensional variable selection and prediction. BMC Bioinformatics. 2021 Sep 16;22(1):441. doi: 10.1186/s12859-021-04340-z. PMID: 34530737
- Extending statistical boosting. An overview of recent methodological developments. Methods Inf Med. 2014;53(6):428-35. doi: 10.3414/ME13-01-0123. Epub 2014 Aug 12. PMID: 25112429 (Review)
- Robust statistical boosting with quantile-based adaptive loss functions. Int J Biostat. 2022 Aug 10;19(1):111-129. doi: 10.1515/ijb-2021-0127. eCollection 2023 May 1. PMID: 35950232
- Applying various algorithms for species distribution modelling. Integr Zool. 2013 Jun;8(2):124-35. doi: 10.1111/1749-4877.12000. PMID: 23731809 (Review)
Cited by
- Boosted Multivariate Trees for Longitudinal Data. Mach Learn. 2017 Feb;106(2):277-305. doi: 10.1007/s10994-016-5597-1. Epub 2016 Nov 4. PMID: 29249866
- Controlling false discoveries in high-dimensional situations: boosting with stability selection. BMC Bioinformatics. 2015 May 6;16:144. doi: 10.1186/s12859-015-0575-3. PMID: 25943565
- Using phenotypic distribution models to predict livestock performance. Sci Rep. 2019 Oct 25;9(1):15371. doi: 10.1038/s41598-019-51910-6. PMID: 31653937
- Estimating patients' risk for postoperative delirium from preoperative routine data - Trial design of the PRe-Operative prediction of postoperative DElirium by appropriate SCreening (PROPDESC) study - A monocentre prospective observational trial. Contemp Clin Trials Commun. 2019 Dec 4;17:100501. doi: 10.1016/j.conctc.2019.100501. eCollection 2020 Mar. PMID: 31890984
- Boosting the discriminatory power of sparse survival models via optimization of the concordance index and stability selection. BMC Bioinformatics. 2016 Jul 22;17:288. doi: 10.1186/s12859-016-1149-8. PMID: 27444890