Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2022 Aug 10;19(1):111-129.
doi: 10.1515/ijb-2021-0127. eCollection 2023 May 1.

Robust statistical boosting with quantile-based adaptive loss functions

Affiliations
Free article

Robust statistical boosting with quantile-based adaptive loss functions

Jan Speller et al. Int J Biostat. .
Free article

Abstract

We combine robust loss functions with statistical boosting algorithms in an adaptive way to perform variable selection and predictive modelling for potentially high-dimensional biomedical data. To achieve robustness against outliers in the outcome variable (vertical outliers), we consider different composite robust loss functions together with base-learners for linear regression. For composite loss functions, such as the Huber loss and the Bisquare loss, a threshold parameter has to be specified that controls the robustness. In the context of boosting algorithms, we propose an approach that adapts the threshold parameter of composite robust losses in each iteration to the current sizes of residuals, based on a fixed quantile level. We compared the performance of our approach to classical M-regression, boosting with standard loss functions or the lasso regarding prediction accuracy and variable selection in different simulated settings: the adaptive Huber and Bisquare losses led to a better performance when the outcome contained outliers or was affected by specific types of corruption. For non-corrupted data, our approach yielded a similar performance to boosting with the efficient L 2 loss or the lasso. Also in the analysis of skewed KRT19 protein expression data based on gene expression measurements from human cancer cell lines (NCI-60 cell line panel), boosting with the new adaptive loss functions performed favourably compared to standard loss functions or competing robust approaches regarding prediction accuracy and resulted in very sparse models.

Keywords: Bisquare loss; Huber loss; gradient boosting; robust regression.

PubMed Disclaimer

Similar articles

References

    1. Barrios, EB. Robustness, data analysis, and statistical modeling: the first 50 years and beyond. Commun Stat Appl Methods 2015;22:543–56. https://doi.org/10.5351/csam.2015.22.6.543 . - DOI
    1. Huber, PJ. Robust statistics . New York: John Wiley & Sons; 1981.
    1. Maronna, RA, Martin, RD, Yohai, VJ, Salibián-Barrera, M. Robust statistics: Theory and methods (with R) , 2nd ed. New York: John Wiley & Sons; 2019.
    1. Susanti, Y, Pratiwi, H, Sulistijowati, S, Liana, T. M estimation, S estimation, and MM estimation in robust regression. Int J Pure Appl Math 2014;91:349–60. https://doi.org/10.12732/ijpam.v91i3.7 . - DOI
    1. Fan, J, Lv, J. A selective overview of variable selection in high dimensional feature space. Stat Sin 2010;20:101–48.

LinkOut - more resources