Front Pharmacol. 2022 Sep 27;13:975855.
doi: 10.3389/fphar.2022.975855. eCollection 2022.

An interpretable stacking ensemble learning framework based on multi-dimensional data for real-time prediction of drug concentration: The example of olanzapine

Xiuqing Zhu et al. Front Pharmacol.

Abstract

Background and Aim: Therapeutic drug monitoring (TDM) has evolved over the years as an important tool for personalized medicine. Nevertheless, some limitations are associated with traditional TDM. Emerging data-driven model forecasting [e.g., through machine learning (ML)-based approaches] has been used for individualized therapy. This study proposes an interpretable stacking-based ML framework to predict concentrations in real time after olanzapine (OLZ) treatment.

Methods: The TDM-OLZ dataset, consisting of 2,142 OLZ measurements and 472 features, was formed by collecting electronic health records during the TDM of 927 patients who had received OLZ treatment. We compared the performance of ML algorithms by using 10-fold cross-validation and the mean absolute error (MAE). The optimal subset of features was identified by a random forest-based sequential forward feature selection method, with the top five heterogeneous regressors serving as base models for a stacked ensemble regressor, which was then optimized via grid search. Its predictions were explained by using local interpretable model-agnostic explanations (LIME) and partial dependence plots (PDPs).

Results: A state-of-the-art stacking ensemble learning framework that integrates optimized extra trees, XGBoost, random forest, bagging, and gradient-boosting regressors was developed for nine selected features [i.e., daily dose (OLZ), gender_male, age, valproic acid_yes, ALT, K, BW, MONO#, and time of blood sampling after first administration]. It outperformed the individual base regressors that were considered, with an MAE of 0.064, an R² value of 0.5355, a mean squared error of 0.0089, a mean relative error of 13%, and an ideal rate (the percentage of predicted TDM values within ±30% of the actual TDM values) of 63.40%. Predictions at the individual level were illustrated by LIME plots, whereas the global interpretation of associations between features and outcomes was illustrated by PDPs.

Conclusion: This study highlights the feasibility of the real-time estimation of drug concentrations by using stacking-based ML strategies without losing interpretability, thus facilitating model-informed precision dosing.
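
The evaluation metrics named in the Results (MAE, mean squared error, R², mean relative error, and ideal rate) can be illustrated with a minimal sketch in plain Python. This is not the authors' code. The function name `regression_metrics`, the definition of relative error as |predicted − observed| / |observed|, the inclusive ±30% boundary, and the toy numbers are all assumptions made for illustration; the abstract does not state the scale on which the reported values were computed.

```python
from statistics import fmean


def regression_metrics(observed, predicted, tolerance=0.30):
    """Return MAE, MSE, R2, mean relative error, and ideal rate.

    Assumptions (not taken from the paper): relative error is
    |predicted - observed| / |observed|, and a prediction counts toward
    the ideal rate when its relative error is <= tolerance.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    if any(o == 0 for o in observed):
        raise ValueError("relative error is undefined for an observed value of 0")

    errors = [p - o for o, p in zip(observed, predicted)]
    abs_errors = [abs(e) for e in errors]
    sq_errors = [e * e for e in errors]
    rel_errors = [ae / abs(o) for ae, o in zip(abs_errors, observed)]

    # R2 = 1 - SS_res / SS_tot, using the mean of the observed values.
    mean_obs = fmean(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")

    return {
        "mae": fmean(abs_errors),
        "mse": fmean(sq_errors),
        "r2": 1.0 - sum(sq_errors) / ss_tot,
        "mre": fmean(rel_errors),
        # Share of predictions whose relative error is within the tolerance.
        "ideal_rate": sum(r <= tolerance for r in rel_errors) / len(rel_errors),
    }


if __name__ == "__main__":
    # Toy values for illustration only; they are not data from the study.
    observed = [10.0, 20.0, 40.0, 50.0]
    predicted = [12.0, 19.0, 25.0, 52.0]
    for name, value in regression_metrics(observed, predicted).items():
        print(f"{name}: {value:.4f}")
```

In this toy example, three of the four predictions fall within ±30% of the observed value, so the ideal rate is 0.75 even though one prediction misses by 37.5%. The ideal rate counts how many predictions fall within the band, whereas the MAE and mean relative error average over all errors.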

Keywords: drug concentration; electronic health record; interpretability; machine learning; model-informed precision dosing; olanzapine; stacking; therapeutic drug monitoring.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The reviewer XM declared a shared parent affiliation with the authors at the time of review.

Figures

FIGURE 1
Flowchart of this study.
FIGURE 2
Proposed architecture of stacking with stratified five-fold cross-validation.
FIGURE 3
Frequency histograms (A) and Q–Q plots (B) of C (OLZ) and the log-transformed C (OLZ). (C) Chart of the matrix of missing data for 54 features, with fewer than 50% missing values in the original dataset.
FIGURE 4
(A) Evolution of prediction errors for various compositions of feature subsets selected by the random forest-based sequential forward feature selection strategy. The corresponding 95% CIs of the MAE obtained by 10-fold cross-validation are represented by the colored areas. (B) Relative feature importance of the top 10 features.
FIGURE 5
Heatmap of the Pearson correlations between the log-transformed C (OLZ) and the finally selected features.
FIGURE 6
Comparison of the prediction performance of our models on the validation cohorts under different conditions in terms of the MAE, R², MSE, MRE, and IR.
FIGURE 7
Residual plots: Plot of residuals versus the predicted values (A), and normal plot of the residuals (B).
FIGURE 8
Assessing the forecasting performance of the proposed stacking model in terms of different ranges of C (OLZ) on the validation cohort: Histograms of various metrics in the context of the low and intermediate-to-high ranges (A), and a scatterplot of the relative error (RE)% versus the observed C (OLZ) in the intermediate-to-high range (B), where the red dotted lines denote the MRE, the colored areas denote the ±30% (green) and ±50% (yellow) ranges of the RE, and the dots labeled by sample ID 1174 and ID 1570 represent the maximum RE of prediction and the maximum observed C (OLZ), respectively. Interpretation of the results of prediction of samples ID 1174 (C) and ID 1570 (D) by the LIME algorithm using different random_state values. The four views for each sample, from left to right, show the predicted values of the explanation and the stacking models, the feature coefficients (the orange and blue colors depict positive and negative relationships, respectively), the feature values in this sample, and the local explanation plot of these features.
FIGURE 9
(A) One-way PDPs for features included in the stacking model. (B) Two-way PDPs of the interactions between the daily dose (OLZ) and other features.
FIGURE 10
Illustration of a general framework of the self-learning and optimization processes of the ML model for a more precise, individualized dose of OLZ.
