Entropy (Basel). 2022 May 13;24(5):687. doi: 10.3390/e24050687.

Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection

Afek Ilay Adler et al. Entropy (Basel). 2022.

Abstract

Gradient Boosting Machines (GBM) are among the go-to algorithms for tabular data, producing state-of-the-art results in many prediction tasks. Despite its popularity, the GBM framework suffers from a fundamental flaw in its base learners: most implementations utilize decision trees that are typically biased towards categorical variables with large cardinalities. The effect of this bias has been extensively studied over the years, mostly in terms of predictive performance. In this work, we extend the scope and study the effect of biased base learners on GBM feature importance (FI) measures. We demonstrate that although these implementations achieve highly competitive predictive performance, they still, surprisingly, suffer from bias in FI. By utilizing cross-validated (CV) unbiased base learners, we fix this flaw at a relatively low computational cost. We demonstrate the suggested framework in a variety of synthetic and real-world setups, showing a significant improvement in all GBM FI measures while maintaining approximately the same level of prediction accuracy.

Keywords: classification and regression trees; feature importance; gradient boosting; tree-based methods.
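The cardinality bias the abstract describes is straightforward to reproduce. Below is a minimal sketch, not the authors' implementation: a scikit-learn GBM is fitted on purely uninformative, integer-encoded categorical features, and its gain-based importances concentrate on the highest-cardinality feature. The feature names, sample size, and hyperparameters are illustrative assumptions.

# Minimal sketch of the cardinality bias (not the authors' code).
# All three features are pure noise; they differ only in cardinality.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 5000
# Integer-encoded categorical features with 2, 10, and 100 levels.
X = np.column_stack([rng.integers(0, k, size=n) for k in (2, 10, 100)])
y = rng.integers(0, 2, size=n)  # labels independent of every feature

gbm = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                 random_state=0).fit(X, y)
# Gain-based (impurity) FI concentrates on the 100-level feature,
# which offers the most candidate split points to overfit.
print(dict(zip(["card_2", "card_10", "card_100"],
               gbm.feature_importances_.round(3))))

This mirrors, in spirit, the null-case experiment shown in Figure 1 below.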


Conflict of interest statement

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Figures

Figure 1
Scaled Gain FI (left) and PFI (right) for the null case experiment, where all displayed features are uninformative.
Figure 2
Scaled Gain FI (left) and PFI (right) for the power case experiment, where only X1 is informative. Since X1 is informative, it is on a different scale and is therefore omitted to improve visualization. Its mean values are 0.186, 0.181, 0.166, and 0.145 for Vanilla GBM, LGBM, CatBoost, and CVB, respectively.
Figure 3
Scaled FI on the Amazon data-set, across 30 folds. Variables are ordered by their cardinality, from left to right, in descending order.
Figure 4
Error (log-loss) on the Amazon data-set, across 30 folds, with and without the Resource variable.
Figure 5
Scaled FI on the Criteo CTR data-set, across 30 folds. Variables are ordered by their cardinality, from left to right, in descending order. For visualization, FI results for low-cardinality features with small FI values are grouped.
Figure 6
Error (log-loss) on the Criteo data-set, across 30 folds, with and without the following variables: Device_ip, Device_id, Device_model.
Figure A1
SHAP FI for the null case.
Figure A2
SHAP FI for the power case.
Figure A3
FI on the Amazon and Criteo CTR data-sets, including SHAP FI.
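Figures 1 and 2 report permutation feature importance (PFI) alongside Gain FI. As a hedged sketch of how PFI is commonly computed, here is scikit-learn's permutation_importance applied on a held-out split of the same null-case data; this protocol is an assumption of the sketch and may differ from the paper's exact setup.

# Sketch of permutation FI on held-out data (assumed protocol,
# not necessarily the paper's exact setup).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(0, k, size=5000) for k in (2, 10, 100)])
y = rng.integers(0, 2, size=5000)  # null case: labels independent of X
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
# PFI = mean drop in score when one feature's values are shuffled.
pfi = permutation_importance(gbm, X_te, y_te, n_repeats=30, random_state=0)
print(pfi.importances_mean.round(4))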
