This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include in PMC and PubMed preprints that result from NIH-funded research.

[Preprint]. 2025 Jul 10:rs.3.rs-6999821.

doi: 10.21203/rs.3.rs-6999821/v1.

Capturing Unanticipated Drug Toxicities Using an Ensemble Machine Learning Approach

Nicole Zatorski¹, Avner Schlessinger²

Affiliations

¹ Duke University Hospital.
² Icahn School of Medicine at Mount Sinai.

Nicole Zatorski et al. Res Sq. 2025.

PMID: 40671799
PMCID: PMC12265155
DOI: 10.21203/rs.3.rs-6999821/v1

Abstract

Despite rigorous safety evaluations during development, numerous drugs have been withdrawn from the market because of serious toxicities. Here we investigate the features of drugs with these unanticipated toxicities and apply a machine learning approach to predict whether a drug is likely to be withdrawn because of intolerable side effects, without the need for human trial data. Our best-performing classifier was an ensemble predictor trained on protein targets, protein structure features, chemical fingerprints, and chemical features. It achieved 92% accuracy and a Matthews correlation coefficient (MCC) of 0.845 in 10-fold cross-validation with held-out test sets. Analysis of the features predictive of unanticipated toxicity revealed both known factors, such as inhibition of cytochrome P450, and previously uninvestigated factors, including inhibition of bile salt export pumps. This predictor and the accompanying feature analysis pave the way for a larger role for computational methods in screening candidate compounds during drug development.
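Both headline metrics in the abstract can be derived from a binary confusion matrix. The minimal Python sketch below assumes withdrawn drugs are labeled 1 and not-withdrawn drugs 0; the function names are illustrative, and this is not the authors' evaluation code.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN), treating label 1 ("withdrawn") as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of drugs whose predicted label matches the true label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient in [-1, 1]; 0.0 if any marginal total is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For example, 3 true positives, 5 true negatives, 1 false positive, and 1 false negative give an accuracy of 0.8 and an MCC of 14/24 ≈ 0.583. Because MCC uses all four cells of the confusion matrix, it is more informative than accuracy when withdrawn drugs are a minority class. The abstract does not say whether the cross-validated scores were averaged over folds or pooled.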


Conflict of interest statement

COMPETING INTERESTS: The authors have no conflicts of interest to declare.

Figures

**Figure 1. Causes of drug withdrawals listed by organ system of impact.**
Three broad toxicity types (cardiovascular, hepatic, and neuropsychiatric) account for the majority of drug withdrawals. Data were compiled from the ChEMBL database (Gaulton et al., 2017). A single drug can have more than one potential cause for withdrawal.

**Figure 2. Indications of withdrawn drugs compared with indications of approved drugs.**
Indications are defined by the first letter (Level 1) of the Anatomical Therapeutic Chemical (ATC) code. The horizontal axis shows the percentage of compounds in each category (withdrawn or not withdrawn) that carry a given first-letter ATC designation. Pairwise comparisons of ATC indication frequencies using Fisher's exact test show that the following indications differ between withdrawn and not withdrawn drugs: anti-infective, antineoplastic/immune, antiparasitic, hormones, musculoskeletal, and nervous system. A single asterisk (*) marks p < 0.05, and a double asterisk (**) marks a p-value below the Bonferroni-corrected significance level.
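The per-indication comparison amounts to one 2x2 Fisher's exact test per ATC first letter, followed by a Bonferroni threshold of alpha divided by the number of letters tested. The stdlib-only sketch below is illustrative, not the authors' pipeline: the function names and the counts dictionary layout are hypothetical, and the two-sided p-value sums the hypergeometric probabilities of every table (with fixed margins) that is no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: withdrawn / not withdrawn drugs.
    Columns: has the ATC level-1 letter / does not.
    """
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return comb(r1, x) * comb(r2, c1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum every table that is no more likely than the observed one (small float tolerance).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


def flag_atc_differences(counts, n_withdrawn, n_not_withdrawn, alpha=0.05):
    """counts: {atc_letter: (withdrawn_with_letter, not_withdrawn_with_letter)}.

    Returns {atc_letter: (p_value, marker)} using the caption's convention:
    "*" for p < alpha, "**" for p < alpha / number_of_letters (Bonferroni).
    """
    threshold = alpha / len(counts)
    results = {}
    for letter, (w, nw) in counts.items():
        p = fisher_exact_two_sided(w, n_withdrawn - w, nw, n_not_withdrawn - nw)
        marker = "**" if p < threshold else ("*" if p < alpha else "")
        results[letter] = (p, marker)
    return results
```

In practice, `scipy.stats.fisher_exact([[a, b], [c, d]])` computes the same two-sided test. For the table [[6, 2], [1, 4]], both give p = 176/1716 ≈ 0.103.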

**Figure 3. Chemical features of withdrawn drugs compared with not withdrawn drugs.**
For withdrawn and not withdrawn drugs, we calculated the mean of each continuous variable and, for each categorical variable, the frequency count normalized by the total number of drugs in that group. For each chemical feature that differed significantly between the groups, the plot shows log2 of the ratio of the withdrawn-drug value to the not-withdrawn-drug value. Features with positive values are larger in withdrawn drugs than in not withdrawn drugs. All features shown have p < 0.05. Feature names correspond to the field names used by the ChEMBL chembl_webresource_client. Features overrepresented in withdrawn drugs include lipophilicity measures (cx_logd, which corresponds to logD, and alogp and cx_logp, which correspond to logP) and the number of aromatic rings. Features underrepresented in withdrawn drugs include the number of hydrogen bond acceptors and donors (hba, hbd, hba_lipinski, hbd_lipinski), molecular weight (mw_freebase, mw_monoisotopic, full_mwt), polar surface area (psa), prodrug status, and routes of administration (topical, parenteral, and oral).
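The plotted quantity can be sketched in a few lines of Python. The sketch assumes each feature is given as a list of per-drug values for each group, with 0/1 flags for categorical features such as prodrug status, so that the mean of the flags equals the normalized frequency described in the caption. The function names are hypothetical, and the significance filtering (p < 0.05) is not shown.

```python
import math
from statistics import mean


def feature_log2_ratio(withdrawn, not_withdrawn):
    """log2(mean in withdrawn / mean in not withdrawn).

    Works for continuous features (e.g. aromatic ring counts) and for 0/1
    categorical flags (the mean of the flags is the normalized frequency).
    Returns None when either mean is not positive, because the ratio is then
    undefined; this can happen for features such as logD, which may be negative.
    """
    m_w, m_n = mean(withdrawn), mean(not_withdrawn)
    if m_w <= 0 or m_n <= 0:
        return None
    return math.log2(m_w / m_n)


def summarize(features_withdrawn, features_not_withdrawn):
    """Map each (hypothetical) feature name to its log2 ratio; positive means larger in withdrawn drugs."""
    return {
        name: feature_log2_ratio(features_withdrawn[name], features_not_withdrawn[name])
        for name in features_withdrawn
    }
```

For example, a mean of 3 aromatic rings in withdrawn drugs against 1.5 in not withdrawn drugs gives +1.0, and a prodrug frequency of 0.25 against 0.5 gives -1.0.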

**Figure 4. Performance of models predicting drug withdrawal on a representative test set.**
The top-performing classifier for each data set is shown: a multilayer perceptron for drug target features, an RBF sampler for drug targets, an extra trees classifier stacked with a decision tree for drug chemical features, a random forest for drug fingerprints, and a k-nearest neighbor classifier for the ensemble. A. Model architecture of the withdrawn-drug predictor. Each data set was divided into training, test, and validation sets. The training set was used to select and tune hyperparameters for the separate models, and the test set was used to evaluate their performance. Predictions from these models were combined to form a new data set that served as training data for the meta-predictor. The ensemble predictor was trained on the pooled predictions of the individual predictors over the combined training and test sets and was evaluated on the validation set. B. Receiver operating characteristic (ROC) curves showing the true positive rate versus the false positive rate for each model based on a different input data set, and for the overall ensemble predictor. C. Precision-recall curves for the same models. By both evaluation measures, the ensemble predictor performed best of all models.
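The meta-predictor step in panel A is a form of stacking. Each drug becomes a row of withdrawal probabilities, one per base model, and a k-nearest neighbor classifier is trained on those rows. The stdlib-only sketch below assumes the base-model probabilities have already been computed. The model names, `k=3`, and Euclidean distance are illustrative assumptions, not settings reported by the authors.

```python
import math
from collections import Counter


def build_meta_features(base_predictions):
    """Stack per-model withdrawal probabilities into one row per drug.

    base_predictions maps a (hypothetical) model name, e.g. "targets" or
    "fingerprints", to a list of probabilities aligned by drug index.
    """
    names = sorted(base_predictions)  # fixed column order across splits
    n_drugs = len(base_predictions[names[0]])
    return [[base_predictions[name][i] for name in names] for i in range(n_drugs)]


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k meta-training rows nearest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def meta_predict(train_X, train_y, eval_X, k=3):
    """Apply the k-NN meta-learner to every held-out row (e.g. the validation split)."""
    return [knn_predict(train_X, train_y, row, k) for row in eval_X]
```

Following the caption, the meta-training rows would come from base-model predictions on the combined training and test splits, and `meta_predict` would be evaluated on the validation split. scikit-learn's `StackingClassifier` with `final_estimator=KNeighborsClassifier()` is a library alternative, but it builds meta-features with internal cross-validation (`cv`) rather than fixed splits.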

**Figure 5. Prediction of withdrawal of drugs in clinical trials.**
The predictor was trained on 10 different balanced random subsets of the existing withdrawn and not withdrawn drug data. These independently trained models were then applied to drugs currently in clinical trials to predict the future withdrawal status of these candidate compounds. The figure shows the structures of compounds in clinical trials that the predictor flags as likely to be withdrawn in all 10 of 10 prediction iterations. These include three compounds with serious known toxicities and toxic targets.
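The 10-of-10 consensus rule can be sketched as follows. The caption does not say how the subsets were balanced, so this sketch assumes random undersampling of the larger class. `train_fn` is a hypothetical callable that takes positive and negative examples and returns a model mapping a candidate to 0 or 1. The sketch shows only the resampling and voting logic, not the authors' code.

```python
import random


def balanced_subsets(withdrawn, not_withdrawn, n_subsets=10, seed=0):
    """Yield (positives, negatives) pairs of equal size by random undersampling (assumed scheme)."""
    rng = random.Random(seed)
    size = min(len(withdrawn), len(not_withdrawn))
    for _ in range(n_subsets):
        yield rng.sample(withdrawn, size), rng.sample(not_withdrawn, size)


def consensus_predict(train_fn, withdrawn, not_withdrawn, candidates, n_subsets=10, seed=0):
    """Train one model per balanced subset and count withdrawal votes per candidate.

    Returns (flagged, votes): flagged holds candidates predicted as withdrawn
    by every model (e.g. 10 of 10); votes maps each candidate to its vote count.
    """
    models = [
        train_fn(pos, neg)
        for pos, neg in balanced_subsets(withdrawn, not_withdrawn, n_subsets, seed)
    ]
    votes = {c: sum(model(c) for model in models) for c in candidates}
    flagged = [c for c, v in votes.items() if v == n_subsets]
    return flagged, votes
```

Requiring unanimity across all resampled models trades sensitivity for confidence. A candidate is flagged only if its predicted withdrawal holds regardless of which not-withdrawn drugs were sampled.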


