Brain. 2021 Jul 28;144(6):1738-1750.

doi: 10.1093/brain/awab108.

Towards realizing the vision of precision medicine: AI based prediction of clinical drug response

Johann de Jong¹, Ioana Cutcutache², Matthew Page², Sami Elmoufti³, Cynthia Dilley⁴, Holger Fröhlich¹,⁵,⁶, Martin Armstrong⁷

Affiliations

¹ Data and Translational Sciences, UCB Biosciences GmbH, 40789 Monheim am Rhein, Germany.
² Data and Translational Sciences, UCB Pharma, Slough SL1 3WE, UK.
³ Late Development Statistics, UCB Biosciences Inc., Raleigh, NC 27617, USA.
⁴ Head of Asset Strategy, UCB Inc., Smyrna, GA 30080, USA.
⁵ Fraunhofer Institute for Scientific Computing and Algorithms (SCAI), Business Area Bioinformatics, 53757 Sankt Augustin, Germany.
⁶ Bonn-Aachen International Center for IT, University of Bonn, 53115 Bonn, Germany.
⁷ Data and Translational Sciences, UCB Pharma, 1420 Braine l'Alleud, Belgium.

PMID: 33734308
PMCID: PMC8320273
DOI: 10.1093/brain/awab108

Johann de Jong et al. Brain. 2021.

Abstract

Accurate and individualized prediction of response to therapies is central to precision medicine. However, because clinical drug response is generally complex and multifaceted, realizing this vision is highly challenging and requires integrating different data types from the same individual into one prediction model. We used the anti-epileptic drug brivaracetam as a case study and combined a hybrid data/knowledge-driven feature extraction with machine learning to systematically integrate clinical and genetic data from a clinical discovery dataset (n = 235 patients). We constructed a model that successfully predicts clinical drug response [area under the curve (AUC) = 0.76] and showed that, even with a limited sample size, integrating high-dimensional genetic data with clinical data can inform drug response prediction. After further validation on data collected from an independently conducted clinical study (AUC = 0.75), we extensively explored our model to gain insights into the determinants of drug response and identified various clinical and genetic characteristics predisposing to poor response. Finally, we assessed the potential impact of our model on clinical trial design and demonstrated that, by enriching for probable responders, significant reductions in clinical study sizes may be achieved. To our knowledge, our model is the first retrospectively validated machine learning model linking drug mechanism of action and the genetic, clinical and demographic background of epilepsy patients to clinical drug response. Hence, it provides a blueprint for how machine learning-based multimodal data integration can act as a driver in achieving the goals of precision medicine in fields such as neurology.

Keywords: artificial intelligence; clinical study design; epilepsy; pharmacogenetics; precision medicine.

Figures

**Figure 1**
**Analysis pipeline and gene set definition.** (A) Combining hybrid data/knowledge-driven feature extraction with advanced ML to systematically integrate clinical and genetic data for predicting brivaracetam response. (B) Defining literature-derived gene sets that relate to (1) epilepsy disease aetiology and (2) brivaracetam’s mechanism of action.
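Figures 1 and 3 refer to gene-set-level genetic features, such as the mutational load in gene set GO:0051011. The exact feature definition is not given on this page, so the minimal Python sketch below simply assumes that "mutational load" means the number of qualifying variants per patient falling in any gene of a set. The function name `gene_set_load`, the input layout and the example gene sets are hypothetical and only illustrate how per-gene variant calls could be rolled up into literature-derived gene sets.

```python
from collections import Counter


def gene_set_load(patients, variants, gene_sets):
    """Per-patient variant counts aggregated over each gene set.

    Assumption (hypothetical definition): load = number of variants in any gene of the set.
    patients:  iterable of patient IDs (patients without variants get zero loads)
    variants:  iterable of (patient_id, gene_symbol) pairs, one per qualifying variant
    gene_sets: dict mapping gene-set name -> set of gene symbols
    """
    per_gene = {p: Counter() for p in patients}
    for patient, gene in variants:
        if patient in per_gene:  # ignore variants from patients not in the cohort
            per_gene[patient][gene] += 1
    # A Counter returns 0 for genes without variants, so sums are always defined.
    return {
        p: {name: sum(counts[g] for g in genes) for name, genes in gene_sets.items()}
        for p, counts in per_gene.items()
    }


# Hypothetical gene sets and variant calls, not study data.
example_sets = {"mechanism_set": {"SV2A", "GENE_X"}, "other_set": {"GENE_Y"}}
example_variants = [("p1", "SV2A"), ("p1", "GENE_X"), ("p1", "GENE_Y"), ("p2", "GENE_Y")]
print(gene_set_load(["p1", "p2", "p3"], example_variants, example_sets))
```

Counting variants is only one plausible aggregation; weighting by predicted variant impact or using a binary "any variant" indicator would be equally simple swaps.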

**Figure 2**
**Model performance and external validation.** (A) Performance of several ML approaches to systematically integrate the genetic and clinical data modalities for predicting response to brivaracetam, as estimated using repeated cross-validation with 20 repeats (each dot represents a repeat). Performance is shown as the area under the ROC curve (AUC). An AUC of 0.5 corresponds to chance level, and an AUC of 1 is the best achievable performance. Statistical significance was assessed by a paired Wilcoxon test between the best and the second-best model. (B) ROC of our best model, a gradient-boosted decision trees classifier trained jointly on all data modalities. Light grey: the individual ROCs for the 20 repeats; red: the average ROC across the 20 repeats, with the associated empirical 95% CI based on the 20 repeats. The ROC depicts the trade-off between the specificity and sensitivity of a classifier as the diagnostic cut-off for the probability of response is varied. Choosing to treat only patients with a high predicted probability of response results in high specificity (most treated patients will benefit from the treatment) but low sensitivity (many patients who could potentially have benefited from treatment are not treated). Conversely, treating even patients with a low predicted response probability results in high sensitivity (most patients who could potentially benefit are treated) but low specificity (many treated patients do not benefit from the treatment). The AUC is a measure of overall prediction performance. (C) ROC of our best model on the independent validation dataset.
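The ROC and AUC described in Figure 2 can be computed directly from predicted response probabilities. The plain-Python sketch below, with hypothetical function names and toy data, lowers the cut-off from the highest score downwards, records the true positive rate (sensitivity) against the false positive rate (1 minus specificity), and integrates with the trapezoidal rule. It cross-checks the result against the rank-based definition of AUC: the probability that a randomly chosen responder scores higher than a randomly chosen non-responder, with ties counted as one half.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs obtained by lowering the
    cut-off on the predicted response probability from the highest score down.
    Tied scores are processed together, which yields a diagonal ROC segment."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        cutoff = ranked[i][0]
        # Everyone scoring >= cutoff is now "treated" (predicted responder).
        while i < len(ranked) and ranked[i][0] == cutoff:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        # sensitivity = tp / n_pos; 1 - specificity = fp / n_neg
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(scores, labels):
    """AUC as P(score of a responder > score of a non-responder), ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities and responder labels, not study data.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(auc_trapezoid(roc_points(scores, labels)), auc_rank(scores, labels))
```

In a repeated cross-validation setting such as the 20 repeats used here, one would compute one AUC per repeat from the held-out predictions; this sketch does not reproduce the authors' pipeline.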

**Figure 3**
**Determinants of drug response probability.** (A) Performance of gradient-boosted trees classifiers on the individual data modalities, compared with the integrated model. Statistical significance was assessed by a paired Wilcoxon test between the clinical-only model and the integrated model. (B) All patient features with non-zero average absolute SHAP values. Error bars represent the empirical 95% CIs based on 20 cross-validation repeats. (C) SHAP dependence plots for four selected variables. *Top* to *bottom*, *left* to *right*: prior use of levetiracetam, extra-temporal focus localization, mutational load in gene set GO:0051011 and structural variants overlapping *SV2A*. Each circle represents a single patient (note that circles are often superimposed). (D) Univariate associations of selected variables with brivaracetam response. *Top* to *bottom*, *left* to *right*: prior use of levetiracetam, extra-temporal focus localization, mutational load in gene set GO:0051011 and structural variants overlapping *SV2A*. Statistical significance was assessed by a Wilcoxon test (for GO:0051011) or a Fisher's exact test (for the other three features).
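Figure 3D uses Fisher's exact test for the binary features, for example prior levetiracetam use versus response. As an illustration only, the sketch below computes a two-sided p-value for a 2 × 2 table by summing the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one, which is a common two-sided convention. The function name and the table layout (rows: feature present/absent; columns: responder/non-responder) are assumptions, not the authors' code.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: rows = feature present / absent,
    columns = responder / non-responder.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given fixed row and column margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum tables at least as extreme as observed; the small relative
    # tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Classic "lady tasting tea" table, not study data.
print(fisher_exact_two_sided(3, 1, 1, 3))
```

For the classic table [[3, 1], [1, 3]] this gives 34/70, about 0.486; in practice a tested library routine would normally be preferred over a hand-rolled version.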

**Figure 4**
**Application to clinical trial design.** (A) The trade-off between positive rate (number of patients included in a trial) and positive predictive value (fraction of responders in a trial) as a function of the classifier threshold. 95% CIs were determined using the 20 cross-validation repeats. (B) Using the classifier to enrich for responders in confirmatory studies: minimum required sample size in a confirmatory trial (at 90% statistical power and a 0.05 significance level) as a function of the classifier threshold. CIs were determined by computing sample sizes for the CIs in A. (C) Training the model on increasing sample sizes in the inner cross-validation loop while recording performance on a set of left-out samples of fixed size (n = 24) in the outer cross-validation loop. Each dot represents the performance of a model trained on a different, randomly subsampled number of patients (n). 95% CIs were determined using the 20 cross-validation repeats. (D) Extrapolation of performance to larger n using robust linear regression (blue), with a 95% prediction interval.
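Figure 4A and 4B relate the classifier threshold to the fraction of patients enrolled, the responder fraction among them (PPV) and the resulting trial size. The caption does not state which sample-size formula was used, so the sketch below swaps in the standard normal-approximation formula for comparing two proportions (two-sided significance level 0.05, 90% power) and assumes, hypothetically, that enrichment raises the response rate in the treated arm to the PPV while the control-arm rate stays fixed. All names, scores and rates are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def positive_rate_and_ppv(scores, labels, threshold):
    """Fraction of patients at or above the threshold (positive rate) and the
    fraction of responders among them (positive predictive value)."""
    selected = [y for s, y in zip(scores, labels) if s >= threshold]
    if not selected:
        return 0.0, float("nan")  # nobody enrolled: PPV is undefined
    return len(selected) / len(scores), sum(selected) / len(selected)


def n_per_arm(p_treat, p_control, alpha=0.05, power=0.90):
    """Per-arm sample size for a two-sided test of two proportions
    (normal approximation, pooled variance under H0, no continuity correction)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    p_bar = (p_treat + p_control) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_treat * (1 - p_treat) + p_control * (1 - p_control))
    ) ** 2
    return ceil(numerator / (p_treat - p_control) ** 2)


# Hypothetical scores and responder labels, not study data.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
rate, ppv = positive_rate_and_ppv(scores, labels, threshold=0.65)
# Assumption: enrichment lifts the treated-arm response rate to the PPV,
# while the control-arm rate (0.3 here, hypothetical) is unchanged.
print(rate, ppv, n_per_arm(ppv, 0.3))
```

Under these assumptions, raising the treated-arm response rate from 0.5 to 0.6 against a control rate of 0.3 lowers the per-arm size from roughly 124 to roughly 56. A real design would also have to account for the patients screened out below the threshold.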

Comment in

One step closer towards personalized epilepsy management.
Chen Z, Anderson A, Ge Z, Kwan P. Brain. 2021 Jul 28;144(6):1624-1626. doi: 10.1093/brain/awab199. PMID: 34061164.
