Practical guide to SHAP analysis: Explaining supervised machine learning model predictions in drug development

Ana Victoria Ponce-Bobadilla¹, Vanessa Schmitt¹, Corinna S Maier¹, Sven Mensing¹, Sven Stodtmann¹

PMID: 39463176
PMCID: PMC11513550
DOI: 10.1111/cts.70056

Review

Ana Victoria Ponce-Bobadilla et al. Clin Transl Sci. 2024 Nov;17(11):e70056. doi: 10.1111/cts.70056.

Affiliation

¹ AbbVie Deutschland GmbH & Co. KG, Ludwigshafen, Germany.

Abstract

Despite increasing interest in using Artificial Intelligence (AI) and Machine Learning (ML) models for drug development, interpreting their predictions remains a challenge, which limits their impact on clinical decisions. We address this issue by providing a practical guide to SHapley Additive exPlanations (SHAP), a popular feature-based interpretability method that can be applied to supervised ML models to gain a deeper understanding of their predictions, thereby enhancing their transparency and trustworthiness. This tutorial focuses on the application of SHAP analysis to standard black-box ML models for regression and classification problems. We provide an overview of the various visualization plots and their interpretation and of the available software for implementing SHAP, and we highlight best practices and special considerations for binary endpoints and time-series models. To enhance the reader's understanding of the method, we also apply it to inherently explainable regression models. Finally, we discuss the method's limitations and ongoing advances aimed at addressing its current drawbacks.
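SHAP explains a single prediction f(x) as a base value (the average model output over a background dataset) plus one additive contribution φ_i per feature, and these contributions are Shapley values from cooperative game theory. The minimal Python sketch below computes them exactly by enumerating every feature coalition, assuming the interventional value function in which features outside a coalition are filled in from each background row. The function names `coalition_value` and `exact_shap` and the toy model are hypothetical and for illustration only; the cost grows as 2^M in the number of features M, which is why practical tools such as the `shap` package rely on model-specific or approximate algorithms instead.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set, Tuple

Model = Callable[[Sequence[float]], float]


def coalition_value(model: Model, x: Sequence[float],
                    background: Sequence[Sequence[float]], coalition: Set[int]) -> float:
    """v(S): average model output when the features in S are fixed to the values
    in x and all other features are taken from each background row in turn."""
    total = 0.0
    for row in background:
        z = [x[j] if j in coalition else row[j] for j in range(len(x))]
        total += model(z)
    return total / len(background)


def exact_shap(model: Model, x: Sequence[float],
               background: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """Exact Shapley values by enumerating every coalition (O(2^M) evaluations)."""
    m = len(x)
    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):  # coalition sizes 0..m-1 drawn from the other features
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi[i] += weight * gain
    base_value = coalition_value(model, x, background, set())  # mean output over background
    return base_value, phi


def toy_model(z: Sequence[float]) -> float:
    # Hypothetical model: additive in z[0], interaction between z[1] and z[2].
    return 2.0 * z[0] + z[1] * z[2]


background = [[0, 0, 0], [1, 2, 1], [2, 1, 3]]
base, phi = exact_shap(toy_model, [3, 2, 2], background)
# Local accuracy: base + sum(phi) equals toy_model([3, 2, 2]) up to rounding.
# phi is approximately [4.0, 1.5, 0.833].
print(base, phi, base + sum(phi), toy_model([3, 2, 2]))
```

The additive feature z[0] receives exactly 2·(x_0 − mean of z[0]), while the credit for the interaction term is split between z[1] and z[2] according to the Shapley weights.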

Conflict of interest statement

All authors are employees of AbbVie and may hold AbbVie stock.

Figures

**FIGURE 1**
Standard supervised ML workflow.

**FIGURE 2**
Different visualization plots of SHAP values from an XGBoost model when predicting blood pressure: (a) Bar plot; (b) Beeswarm plot; (c) A scatter plot for the feature age colored by each subject's BMI; (d) Waterfall plot for an example subject.
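The bar, beeswarm, scatter, and waterfall plots in Figure 2 can all be read as views of one matrix of SHAP values with one row per subject and one column per feature. As a sketch under that assumption: the bar plot ranks features by mean absolute SHAP value, the beeswarm and scatter plots display the columns directly, and the waterfall plot walks a single subject from the base value to its prediction. The helper names and numbers below are hypothetical toy values, not results from the article.

```python
from typing import List, Sequence, Tuple


def mean_abs_shap(shap_matrix: Sequence[Sequence[float]],
                  feature_names: Sequence[str]) -> List[Tuple[str, float]]:
    """Global importance as summarized in a SHAP bar plot: mean |phi| per
    feature, sorted from most to least important."""
    n = len(shap_matrix)
    scores = [(name, sum(abs(row[j]) for row in shap_matrix) / n)
              for j, name in enumerate(feature_names)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def waterfall_steps(base_value: float, shap_row: Sequence[float],
                    feature_names: Sequence[str]) -> List[Tuple[str, float, float]]:
    """Running totals for one subject: start at the base value and add each
    feature's SHAP value (largest |phi| first in this sketch). The final
    running total equals the model's prediction for that subject."""
    order = sorted(range(len(shap_row)), key=lambda j: abs(shap_row[j]), reverse=True)
    steps = []
    running = base_value
    for j in order:
        running += shap_row[j]
        steps.append((feature_names[j], shap_row[j], running))
    return steps


# Toy SHAP values (rows = subjects, columns = features); illustrative numbers only.
shap_values = [[4.0, -1.0], [-2.0, 0.5], [3.0, 1.5]]
names = ["age", "BMI"]
print(mean_abs_shap(shap_values, names))               # [('age', 3.0), ('BMI', 1.0)]
print(waterfall_steps(120.0, shap_values[0], names))   # [('age', 4.0, 124.0), ('BMI', -1.0, 123.0)]
```

The order in which contributions are accumulated does not change the final total; it only affects how the intermediate bars are drawn.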

**FIGURE 3**
Scatter plot of SHAP values for the feature age, faceted by train‐test status when using cross‐validation. Trends for the different folds are shown as separate lines, one per fold.
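Under cross-validation, each fold produces its own model and therefore its own SHAP values, both for the subjects that model was trained on and for the held-out subjects, which is the situation Figure 3 depicts. The sketch below illustrates this bookkeeping under simplifying assumptions: it swaps the XGBoost model for a single-feature linear regression, so that with the fold's training rows as background the SHAP value has the closed form φ = b·(x − mean of training x), and it uses a simple interleaved fold assignment. The function names, fold scheme, and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cv_shap(xs: Sequence[float], ys: Sequence[float],
            k: int) -> List[Tuple[int, str, float, float]]:
    """Return (fold, status, x, phi) for every subject under every fold's model.

    Each fold's model is fit on its training rows only, and its SHAP values use
    those training rows as background: phi = b * (x - mean(train x)).
    """
    n = len(xs)
    records = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # interleaved split (illustrative assumption)
        train_idx = [i for i in range(n) if i not in test_idx]
        train_x = [xs[i] for i in train_idx]
        _, slope = fit_line(train_x, [ys[i] for i in train_idx])
        background_mean = sum(train_x) / len(train_x)
        for i in range(n):
            status = "test" if i in test_idx else "train"
            records.append((fold, status, xs[i], slope * (xs[i] - background_mean)))
    return records


ages = [20, 30, 40, 50, 60, 70]        # hypothetical subjects
bp = [118, 124, 131, 134, 141, 146]    # hypothetical blood pressure values
for fold, status, age, phi in cv_shap(ages, bp, k=3):
    if status == "test":
        print(fold, age, round(phi, 2))
```

Plotting phi against age with one line per fold, faceted by status, reproduces the layout of such a figure; differences between the lines reflect fold-to-fold variation in the fitted model.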

**FIGURE 4**
Visualization plots of SHAP values derived from an XGBoost model for a classification problem, explaining the predicted probabilities (a,c,e) and the predicted log‐odds (b,d,f).
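For a classifier, SHAP values can explain either the predicted log-odds or the predicted probability, as contrasted in Figure 4. The sketch below assumes a hypothetical two-feature logistic model and reuses a brute-force interventional Shapley computation. It shows that each explanation is additive on its own scale, that on the log-odds scale the values reduce to w_i·(x_i − mean x_i), and that the probability-scale base value (the mean predicted probability) is not the sigmoid of the log-odds base value, so the two explanations are not simple transforms of each other. The function names, coefficients, and data are illustrative assumptions.

```python
from itertools import combinations
from math import exp, factorial
from typing import Callable, List, Sequence, Tuple


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + exp(-t))


def make_logistic(weights: Sequence[float], intercept: float):
    """Return (log_odds, prob) functions for a hypothetical logistic model."""
    def log_odds(z: Sequence[float]) -> float:
        return intercept + sum(w * v for w, v in zip(weights, z))

    def prob(z: Sequence[float]) -> float:
        return sigmoid(log_odds(z))
    return log_odds, prob


def exact_shap(model: Callable[[Sequence[float]], float], x: Sequence[float],
               background: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """Exact interventional Shapley values by enumerating all coalitions."""
    m = len(x)

    def v(s: set) -> float:
        rows = ([x[j] if j in s else row[j] for j in range(m)] for row in background)
        return sum(model(z) for z in rows) / len(background)

    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        total = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for sub in combinations(others, size):
                total += weight * (v(set(sub) | {i}) - v(set(sub)))
        phi.append(total)
    return v(set()), phi


log_odds, prob = make_logistic(weights=[1.0, -2.0], intercept=0.5)
background = [[0, 0], [1, 1], [2, 0]]
x = [2, 1]
base_lo, phi_lo = exact_shap(log_odds, x, background)
base_p, phi_p = exact_shap(prob, x, background)
print(base_lo + sum(phi_lo), log_odds(x))  # additive on the log-odds scale
print(base_p + sum(phi_p), prob(x))        # additive on the probability scale
print(base_p, sigmoid(base_lo))            # the two base values differ
```

Because the sigmoid is nonlinear, a feature's contribution on the probability scale depends on where the other features place the subject on the curve, whereas on the log-odds scale it does not for this model.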

**FIGURE 5**
(a) Example time‐course of drug concentration simulated with the PK model. (b–f) Visualization plots of the SHAP values explaining the predicted individual clearances: (b) Bar plot; (c) Beeswarm plot; (d–f) Scatter plots of the SHAP values for the concentrations at different times.

**FIGURE 6**
Visualization plots of SHAP values for different ML regression models: (a,c,e) Bar plots; (b,d,f) Beeswarm plots.
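The abstract mentions applying SHAP to inherently explainable regression models. For an ordinary linear model f(x) = b0 + Σ w_j·x_j with features treated as independent, the SHAP value has the closed form φ_ij = w_j·(x_ij − mean_j), so a feature's bar-plot importance depends on both its coefficient and the spread of its values, not on the coefficient alone. The minimal sketch below uses hypothetical coefficients and data to show a feature with the larger coefficient ranking lower because its values vary on a narrow scale; the function names are illustrative.

```python
from typing import List, Sequence


def linear_shap(weights: Sequence[float],
                X: Sequence[Sequence[float]]) -> List[List[float]]:
    """SHAP values of f(x) = b0 + sum_j w_j * x_j using X itself as background:
    phi_ij = w_j * (x_ij - mean_j). Exact for a linear model when features are
    treated as independent (interventional formulation)."""
    n, m = len(X), len(weights)
    means = [sum(row[j] for row in X) / n for j in range(m)]
    return [[weights[j] * (row[j] - means[j]) for j in range(m)] for row in X]


def mean_abs(shap_matrix: Sequence[Sequence[float]]) -> List[float]:
    """Bar-plot importance: mean |phi| for each feature (column)."""
    n = len(shap_matrix)
    return [sum(abs(row[j]) for row in shap_matrix) / n
            for j in range(len(shap_matrix[0]))]


# Hypothetical fitted model: feature 0 in years (e.g., age), feature 1 on a narrow scale.
weights = [0.5, 10.0]
X = [[30.0, 0.0], [50.0, 0.1], [70.0, 0.2]]
phi = linear_shap(weights, X)
print(mean_abs(phi))  # feature 0 ranks first even though |w_1| > |w_0|
```

The intercept b0 does not appear in the SHAP values: it cancels against the base value, which is the mean prediction over the background data.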
