Front Pharmacol. 2022 Sep 27:13:975855.

doi: 10.3389/fphar.2022.975855. eCollection 2022.

An interpretable stacking ensemble learning framework based on multi-dimensional data for real-time prediction of drug concentration: The example of olanzapine

Xiuqing Zhu^{1,2}, Jinqing Hu^{1,2}, Tao Xiao^{1,3}, Shanqing Huang^{1,2}, Yuguan Wen^{1,2}, Dewei Shang^{1,2}

Affiliations

¹ Department of Pharmacy, The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, China.
² Guangdong Engineering Technology Research Center for Translational Medicine of Mental Disorders, Guangzhou, China.
³ Department of Clinical Research, Guangdong Second Provincial General Hospital, Guangzhou, China.

PMID: 36238557
PMCID: PMC9552071
DOI: 10.3389/fphar.2022.975855

Xiuqing Zhu et al. Front Pharmacol. 2022.

Abstract

Background and Aim: Therapeutic drug monitoring (TDM) has evolved over the years as an important tool for personalized medicine. Nevertheless, some limitations are associated with traditional TDM. Emerging data-driven model forecasting [e.g., through machine learning (ML)-based approaches] has been used for individualized therapy. This study proposes an interpretable stacking-based ML framework to predict concentrations in real time after olanzapine (OLZ) treatment.
Methods: The TDM-OLZ dataset, consisting of 2,142 OLZ measurements and 472 features, was formed by collecting electronic health records during the TDM of 927 patients who had received OLZ treatment. We compared the performance of ML algorithms by using 10-fold cross-validation and the mean absolute error (MAE). The optimal subset of features was analyzed by a random forest-based sequential forward feature selection method in the context of the top five heterogeneous regressors as base models to develop a stacked ensemble regressor, which was then optimized via the grid search method. Its predictions were explained by using local interpretable model-agnostic explanations (LIME) and partial dependence plots (PDPs).
Results: A state-of-the-art stacking ensemble learning framework that integrates optimized extra trees, XGBoost, random forest, bagging, and gradient-boosting regressors was developed for nine selected features [i.e., daily dose (OLZ), gender_male, age, valproic acid_yes, ALT, K, BW, MONO#, and time of blood sampling after first administration]. It outperformed other base regressors that were considered, with an MAE of 0.064, R-square value of 0.5355, mean squared error of 0.0089, mean relative error of 13%, and ideal rate (the percentages of predicted TDM within ±30% of actual TDM) of 63.40%. Predictions at the individual level were illustrated by LIME plots, whereas the global interpretation of associations between features and outcomes was illustrated by PDPs.
Conclusion: This study highlights the feasibility of the real-time estimation of drug concentrations by using stacking-based ML strategies without losing interpretability, thus facilitating model-informed precision dosing.
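
The stacking step described in the Methods can be illustrated with a minimal sketch, assuming scikit-learn is available. This is not the authors' code: the data are synthetic, the hyperparameters are arbitrary, XGBoost is left out so that the example depends on scikit-learn only, and the linear meta-learner is an assumption (the abstract does not name one).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    BaggingRegressor,
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in with nine features; NOT the TDM-OLZ dataset.
X, y = make_regression(n_samples=200, n_features=9, noise=10.0, random_state=0)

# Four of the five base regressor families named in the abstract.
# XGBoost is omitted here (assumption made to avoid an extra dependency).
base_models = [
    ("extra_trees", ExtraTreesRegressor(n_estimators=20, random_state=0)),
    ("random_forest", RandomForestRegressor(n_estimators=20, random_state=0)),
    ("bagging", BaggingRegressor(n_estimators=20, random_state=0)),
    ("gradient_boosting", GradientBoostingRegressor(n_estimators=30, random_state=0)),
]

# The meta-learner is fit on out-of-fold predictions of the base models.
# LinearRegression is a hypothetical choice for illustration.
stack = StackingRegressor(
    estimators=base_models,
    final_estimator=LinearRegression(),
    cv=5,  # plain 5-fold split inside the stack (not stratified)
)

# Score the whole stack with 10-fold cross-validation and the MAE.
scores = cross_val_score(stack, X, y, cv=10, scoring="neg_mean_absolute_error")
cv_mae = -scores.mean()
print(f"10-fold CV MAE on synthetic data: {cv_mae:.3f}")
```

The printed MAE refers to the synthetic target only and is not comparable to the values reported in the Results.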

Keywords: drug concentration; electronic health record; interpretability; machine learning; model-informed precision dosing; olanzapine; stacking; therapeutic drug monitoring.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The reviewer XM declared a shared parent affiliation with the authors at the time of review.

Figures

**FIGURE 2**
Proposed architecture of stacking with stratified five-fold cross-validation.

**FIGURE 3**
Frequency histograms **(A)** and Q–Q plots **(B)** of C (OLZ) and the log-transformed C (OLZ). **(C)** Chart of the matrix of missing data for 54 features, with fewer than 50% missing values in the original dataset.

**FIGURE 4**
**(A)** Evolution of prediction errors for various compositions of feature subsets selected by the random forest-based sequential forward feature selection strategy. The corresponding 95% CIs of the MAE obtained by 10-fold cross-validation are represented by the colored areas. **(B)** Relative feature importance of the top 10 features.

**FIGURE 5**
Heatmap of the Pearson correlations between the log-transformed C (OLZ) and the finally selected features.

**FIGURE 6**
Comparison of the prediction performance of our models on the validation cohorts under different conditions in terms of the MAE, R², MSE, MRE, and IR.

**FIGURE 7**
Residuals plots: Plot of residuals versus the predicted values **(A)**, and normal plot of the residuals **(B)**.

**FIGURE 8**
Assessing the forecasting performance of the proposed stacking model in terms of different ranges of C (OLZ) on the validation cohort: Histograms of various metrics in the context of the low and intermediate-to-high ranges **(A)**, and a scatterplot of the relative error (RE)% versus the observed C (OLZ) in the intermediate-to-high range **(B)**, where the red dotted lines denote the MRE, the colored areas denote the ±30% (green color) and ±50% (yellow color) ranges of the RE, and the dots labeled by sample ID 1174 and ID 1570 represent the maximum RE of prediction and the maximum observed C (OLZ), respectively. Interpretation of the results of prediction of samples ID 1174 **(C)** and ID 1570 **(D)** by the LIME algorithm using different random_state values. The four views for each sample, from left to right, show the predicted values of the explanation and the stacking models, the feature coefficients (the orange and blue colors depict positive and negative relationships, respectively), the feature values in this sample, and the local explanation plot of these features.

**FIGURE 9**
**(A)** One-way PDPs for features included in the stacking model. **(B)** Two-way PDPs of the interactions between the daily dose (OLZ) and other features.

**FIGURE 10**
Illustration of a general framework of the self-learning and optimization processes of the ML model for a more precise, individualized dose of OLZ.
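
The error metrics named in the captions of Figures 6 and 8 (MAE, MSE, MRE, and IR) can be sketched in plain Python as follows. The exact formulas are assumptions, since this page does not define them: the relative error is taken here as |predicted − observed| / |observed|, the MRE as its mean, and the ideal rate as the share of predictions whose relative error is at most 30%. The function name and the sample values are hypothetical.

```python
from typing import Dict, Sequence


def regression_metrics(
    observed: Sequence[float],
    predicted: Sequence[float],
    tolerance: float = 0.30,
) -> Dict[str, float]:
    """Return MAE, MSE, MRE (%), and ideal rate (%) for paired values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    # Assumption: relative error is measured against the observed value.
    rel_errors = [abs(e) / abs(o) for e, o in zip(errors, observed)]
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "mse": sum(e * e for e in errors) / n,
        "mre_pct": 100.0 * sum(rel_errors) / n,
        # Share of predictions within +/- tolerance of the observed value.
        "ideal_rate_pct": 100.0 * sum(r <= tolerance for r in rel_errors) / n,
    }


# Made-up values for illustration only; not data from the study.
observed = [10.0, 20.0, 40.0, 50.0]
predicted = [12.0, 18.0, 60.0, 50.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

With these made-up values, three of the four predictions fall within ±30% of the observed value, so the ideal rate is 75%.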
