Robust statistical boosting with quantile-based adaptive loss functions
- PMID: 35950232
- DOI: 10.1515/ijb-2021-0127
Abstract
We combine robust loss functions with statistical boosting algorithms in an adaptive way to perform variable selection and predictive modelling for potentially high-dimensional biomedical data. To achieve robustness against outliers in the outcome variable (vertical outliers), we consider different composite robust loss functions together with base-learners for linear regression. For composite loss functions, such as the Huber loss and the Bisquare loss, a threshold parameter has to be specified that controls the robustness. In the context of boosting algorithms, we propose an approach that adapts the threshold parameter of composite robust losses in each iteration to the current sizes of residuals, based on a fixed quantile level. We compared the performance of our approach with that of classical M-regression, boosting with standard loss functions, and the lasso with respect to prediction accuracy and variable selection in different simulated settings: the adaptive Huber and Bisquare losses led to a better performance when the outcome contained outliers or was affected by specific types of corruption. For non-corrupted data, our approach yielded a similar performance to boosting with the efficient L2 loss or the lasso. Also in the analysis of skewed KRT19 protein expression data based on gene expression measurements from human cancer cell lines (NCI-60 cell line panel), boosting with the new adaptive loss functions performed favourably compared to standard loss functions or competing robust approaches regarding prediction accuracy and resulted in very sparse models.
Keywords: Bisquare loss; Huber loss; gradient boosting; robust regression.
© 2022 Walter de Gruyter GmbH, Berlin/Boston.
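The abstract describes the quantile-based adaptation only in words. As a minimal sketch of the idea (assuming a Python/NumPy reimplementation rather than the authors' code, with illustrative choices for the quantile level `tau`, the step length `nu`, the number of iterations, and the toy data), component-wise boosting with an adaptive Huber loss could look like this:

```python
# Sketch: component-wise gradient boosting with an adaptive Huber loss.
# The quantile level `tau`, step length `nu`, iteration count and toy data are
# illustrative assumptions, not the settings used in the paper.
import numpy as np


def huber_gradient(residuals, k):
    """Negative gradient of the Huber loss: r where |r| <= k, k*sign(r) otherwise."""
    return np.where(np.abs(residuals) <= k, residuals, k * np.sign(residuals))


def adaptive_huber_boost(X, y, n_iter=200, nu=0.1, tau=0.8):
    """Component-wise boosting with simple linear base-learners.

    In each iteration the Huber threshold k is re-set to the tau-quantile of the
    current absolute residuals, so the loss adapts to the residual scale.
    """
    n, p = X.shape
    coef = np.zeros(p)
    intercept = y.mean()          # simple offset for the sketch (a median would be more robust)
    fit = np.full(n, intercept)

    for _ in range(n_iter):
        residuals = y - fit
        k = np.quantile(np.abs(residuals), tau)   # adaptive threshold
        u = huber_gradient(residuals, k)          # pseudo-residuals (negative gradient)

        # Fit each univariate least-squares base-learner to u, keep the best-fitting one.
        best_j, best_beta, best_rss = 0, 0.0, np.inf
        for j in range(p):
            xj = X[:, j]
            beta = xj @ u / (xj @ xj)
            rss = np.sum((u - beta * xj) ** 2)
            if rss < best_rss:
                best_j, best_beta, best_rss = j, beta, rss

        # Update only the selected component (this yields implicit variable selection).
        coef[best_j] += nu * best_beta
        fit += nu * best_beta * X[:, best_j]

    return intercept, coef


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 100, 20
    X = rng.standard_normal((n, p))
    y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(n)
    y[:5] += 20                                  # vertical outliers in the outcome
    intercept, coef = adaptive_huber_boost(X, y)
    print(np.round(coef, 2))                     # the two informative coefficients should dominate
```

Re-estimating k as a quantile of the current absolute residuals is the adaptation described in the abstract; with `tau` close to 1 the Huber loss behaves increasingly like the non-robust L2 loss, while smaller values downweight large residuals more aggressively.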
Similar articles
- Randomized boosting with multivariable base-learners for high-dimensional variable selection and prediction. BMC Bioinformatics. 2021 Sep 16;22(1):441. doi: 10.1186/s12859-021-04340-z. PMID: 34530737. Free PMC article.
- Robust loss functions for boosting. Neural Comput. 2007 Aug;19(8):2183-244. doi: 10.1162/neco.2007.19.8.2183. PMID: 17571942.
- The importance of knowing when to stop. A sequential stopping rule for component-wise gradient boosting. Methods Inf Med. 2012;51(2):178-86. doi: 10.3414/ME11-02-0030. Epub 2012 Feb 20. PMID: 22344292.
- An Update on Statistical Boosting in Biomedicine. Comput Math Methods Med. 2017;2017:6083072. doi: 10.1155/2017/6083072. Epub 2017 Aug 2. PMID: 28831290. Free PMC article. Review.
- Extending statistical boosting. An overview of recent methodological developments. Methods Inf Med. 2014;53(6):428-35. doi: 10.3414/ME13-01-0123. Epub 2014 Aug 12. PMID: 25112429. Review.