Comput Toxicol. 2018 May:6:55-63.
doi: 10.1016/j.comtox.2017.05.001. Epub 2017 May 13.

Performance of Machine Learning Algorithms for Qualitative and Quantitative Prediction Drug Blockade of hERG1 channel

Soren Wacker et al. Comput Toxicol. 2018 May.

Abstract

Drug-induced abnormal heart rhythm known as Torsades de Pointes (TdP) is a potentially lethal ventricular tachycardia found in many patients. Even newly released anti-arrhythmic drugs, such as ivabradine, whose primary target is the HCN channel, block the hERG potassium current in an overlapping concentration interval. Promiscuous drug block of the hERG channel may lead to perturbation of the action potential duration (APD) and TdP, especially when combined with polypharmacy and/or electrolyte disturbances. The example of the novel anti-arrhythmic ivabradine illustrates a clinically important and ongoing deficit in drug design and warrants better screening methods. There is an urgent need to develop new approaches for rapid and accurate assessment of how drugs with complex interactions and multiple subcellular targets can predispose to or protect from drug-induced TdP. One unexpected outcome of the compulsory hERG screening implemented in the USA and the European Union is the accumulation of large datasets of IC50 values for various molecules entering the market. These abundant data now make it possible to construct predictive machine-learning (ML) models. Novel ML algorithms and techniques promise accuracy in determining IC50 values of hERG blockade that is comparable to or surpasses that of earlier QSAR or molecular modeling techniques. To test the performance of modern ML techniques, we have developed a computational platform integrating various workflows for quantitative structure-activity relationship (QSAR) models using data from the ChEMBL database. To establish the predictive power of ML-based algorithms, we computed IC50 values for a large dataset of hERG-blocking and non-blocking drugs and compared them with measurements from an automated patch clamp system, an industry gold standard in studies of cardiotoxicity. The optimal protocol, with high sensitivity and predictive power, is based on the novel eXtreme gradient boosting (XGBoost) algorithm.
The ML platform with XGBoost displays excellent performance, with a coefficient of determination of up to R2 ~0.8 for pIC50 values in evaluation datasets, surpassing other metrics and approaches available in the literature. Ultimately, the ML-based platform developed in our work is a scalable framework with automation potential to interact with other developing technologies in the cardiotoxicity field, including high-throughput electrophysiology measurements delivering large datasets of profiled drugs, and rapid synthesis and drug development via progress in synthetic biology.
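The abstract reports performance as a coefficient of determination on pIC50 values. The following minimal sketch shows how those two quantities are conventionally computed. It assumes the standard definitions pIC50 = -log10(IC50 in mol/L) and R2 = 1 - SS_res/SS_tot; the function names are hypothetical and are not taken from the authors' platform.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Under these definitions, a compound with an IC50 of 1000 nM (1 µM) has a pIC50 of 6, and a model that always predicts the mean observed value scores R2 = 0.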

Keywords: Drug Discovery; Drug-Induced Cardiotoxicity; Gradient-Boosting; Lead Optimization; Machine-Learning; Quantitative Structure Activity Relationship; hERG1 channel.

Figures

Figure 1
Machine Learning Platform Flowchart. Only compounds in the training set were used for feature selection and model tuning. The compounds in the test sets were held out for the final evaluation of model performance. Model selection was based on 10-fold cross-validation over the compounds in the training set. The final model was trained on the complete training set.
Figure 2
A) Approximated density of pIC50 values in the Training and Test sets. B) Approximated density of the maximum similarities to compounds in the training set for all test sets. For the training set the similarity to the next most similar compound is shown. The curves are scaled so that the area under the curve is 1.
Figure 3
A) Q2 values from 10-fold cross-validation for the three feature sets FS1 (●), FS2 (▲), FS3 (■). B) Boxplots summarizing Q2 values. C–F) mean Q2 projected on the C) fraction of columns used to build each decision tree, D) the learning rate eta, E) the maximal depth of each tree and F) the size of the sample used for each tree. G) Training (■) and validation (●) RMSE over the size of the training set for different numbers of trees using the final parameters. Here fractions (between 0.2 and 1) of the training set were used to analyse the dependency of the predictive power on the size of the training set.
Figure 4
Model performance for all test sets combined. A) Correlation to experimental data. B) ROC curve using the same class criteria. C) Error over the distance to the training set for each compound. Color codes illustrate the MST values (blue: MST > 0.5, red: MST < 0.5). Model performance for the combined test sets (D) and dependence on different similarity thresholds (E and F). G) Location of range violations. The number of range violations is indicated by the color and size of the spheres. Cross symbols indicate compounds with no range violations. H) and I) Model performance for compounds within the recommended applicability domain.
Figure 5
The number of times features have been used in the model: A) the top 20 features and B) the following features. Molecular descriptors (blue) and pharmacophore features (black). Only scores of the top 150 features (of ~400) are shown.
Figure 6
Comparison of the final model with reference models. Only the range between -1 and 1 is shown. The value for the Lasso model for test set 4 (Test4) was -1.22. The dotted line marks R2 = 0.6.
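The captions of Figures 1 and 3 describe model selection by 10-fold cross-validation on the training set, scored by Q2. A minimal sketch of that validation loop follows. It assumes the common definition Q2 = 1 - PRESS/TSS, which the captions do not spell out, and it uses a one-variable least-squares line as a toy stand-in for the XGBoost regressor so that it runs without external libraries. All function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (toy stand-in, not XGBoost)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, mean_y - a * mean_x


def cross_validated_q2(xs, ys, k=10, seed=0):
    """Q2 = 1 - PRESS/TSS, where every prediction comes from a model
    that did not see that sample during fitting."""
    predictions = [0.0] * len(xs)
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            predictions[i] = a * xs[i] + b
    mean_y = sum(ys) / len(ys)
    press = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    tss = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / tss
```

In a setup like the one the captions describe, this loop would run only on training-set compounds, once per candidate hyperparameter setting, with the test sets left untouched until the final evaluation.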

