Prediction-based variable selection for component-wise gradient boosting
- PMID: 38000054
- DOI: 10.1515/ijb-2023-0052
Abstract
Model-based component-wise gradient boosting is a popular tool for data-driven variable selection. To improve its prediction and selection qualities even further, several modifications of the original algorithm have been developed that mainly focus on different stopping criteria, leaving the actual variable selection mechanism untouched. We investigate different prediction-based mechanisms for the variable selection step in model-based component-wise gradient boosting. These approaches include Akaike's Information Criterion (AIC) as well as a selection rule relying on the component-wise test error computed via cross-validation. We implemented the AIC and cross-validation routines for Generalized Linear Models and evaluated them regarding their variable selection properties and predictive performance. An extensive simulation study revealed improved selection properties, and the prediction error could be lowered in a real-world application with age-standardized COVID-19 incidence rates.
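The cross-validation-based selection rule described in the abstract can be illustrated with a minimal sketch: in component-wise L2 gradient boosting, each iteration normally picks the covariate whose univariate base-learner best fits the current residuals, whereas here the covariate is chosen by the lowest k-fold cross-validated test error of that univariate fit. This is an assumption-laden illustration of the general idea, not the authors' implementation (names such as `cv_componentwise_boosting` are hypothetical).

```python
import numpy as np

def cv_componentwise_boosting(X, y, n_iter=100, nu=0.1, k=5, seed=0):
    """Sketch of component-wise L2 gradient boosting in which the
    base-learner (a single covariate) is selected by its k-fold
    cross-validated test error on the current residuals, rather than
    by its in-sample residual fit. Assumes roughly centred X."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    folds = rng.integers(0, k, size=n)   # fixed random fold assignment
    coef = np.zeros(p)
    intercept = y.mean()
    fit = np.full(n, intercept)
    for _ in range(n_iter):
        u = y - fit                      # negative gradient for L2 loss
        cv_err = np.empty(p)
        beta_full = np.empty(p)
        for j in range(p):
            xj = X[:, j]
            # full-data least-squares slope of the univariate base-learner
            beta_full[j] = xj @ u / (xj @ xj)
            err = 0.0
            for f in range(k):
                tr, te = folds != f, folds == f
                b = xj[tr] @ u[tr] / (xj[tr] @ xj[tr])
                err += np.sum((u[te] - b * xj[te]) ** 2)
            cv_err[j] = err / n
        # prediction-based selection: lowest component-wise CV test error
        j_star = int(np.argmin(cv_err))
        coef[j_star] += nu * beta_full[j_star]
        fit += nu * beta_full[j_star] * X[:, j_star]
    return intercept, coef
```

With a small step length `nu` and early stopping, only covariates that repeatedly win the CV comparison accumulate non-zero coefficients, which is the sparsity mechanism the paper evaluates against the usual residual-fit selection.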
Keywords: gradient boosting; high-dimensional data; prediction analysis; sparse models; variable selection.
© 2023 Walter de Gruyter GmbH, Berlin/Boston.
Similar articles
- The importance of knowing when to stop: a sequential stopping rule for component-wise gradient boosting. Methods Inf Med. 2012;51(2):178–86. doi: 10.3414/ME11-02-0030. PMID: 22344292
- Randomized boosting with multivariable base-learners for high-dimensional variable selection and prediction. BMC Bioinformatics. 2021;22(1):441. doi: 10.1186/s12859-021-04340-z. PMID: 34530737. Free PMC article.
- Deselection of base-learners for statistical boosting, with an application to distributional regression. Stat Methods Med Res. 2022;31(2):207–224. doi: 10.1177/09622802211051088. PMID: 34882438
- Extending statistical boosting: an overview of recent methodological developments. Methods Inf Med. 2014;53(6):428–35. doi: 10.3414/ME13-01-0123. PMID: 25112429. Review.
- An Update on Statistical Boosting in Biomedicine. Comput Math Methods Med. 2017;2017:6083072. doi: 10.1155/2017/6083072. PMID: 28831290. Free PMC article. Review.