Computational methods for prediction of in vitro effects of new chemical structures

Priyanka Banerjee¹, Vishal B Siramshetty², Malgorzata N Drwal³, Robert Preissner⁴

Affiliations

¹ Structural Bioinformatics Group, Institute for Physiology, Charité - University Medicine Berlin, Berlin, Germany; Graduate School of Computational Systems Biology, Humboldt University of Berlin, Berlin, Germany.
² Structural Bioinformatics Group, Experimental and Clinical Research Center (ECRC), Charité - University Medicine Berlin, Berlin, Germany; BB3R - Berlin Brandenburg 3R Graduate School, Free University of Berlin, Berlin, Germany.
³ Structural Bioinformatics Group, Institute for Physiology, Charité - University Medicine Berlin, Berlin, Germany; Laboratoire d'innovation thérapeutique, Université de Strasbourg, Illkirch, France.
⁴ Structural Bioinformatics Group, Institute for Physiology, Charité - University Medicine Berlin, Berlin, Germany; Structural Bioinformatics Group, Experimental and Clinical Research Center (ECRC), Charité - University Medicine Berlin, Berlin, Germany; BB3R - Berlin Brandenburg 3R Graduate School, Free University of Berlin, Berlin, Germany.

PMID: 28316649
PMCID: PMC5043617
DOI: 10.1186/s13321-016-0162-2

Priyanka Banerjee et al. J Cheminform. 2016 Sep 29;8:51. doi: 10.1186/s13321-016-0162-2. eCollection 2016.

Abstract

Background: With a constant increase in the number of new chemicals synthesized every year, it has become important to employ reliable and fast in silico screening methods to predict their safety and activity profiles. In recent years, in silico prediction methods have received considerable attention as a means to reduce animal experiments for the evaluation of various toxicological endpoints, in line with the 3R principle of replace, reduce and refine. Computational approaches proposed for predicting compound toxicity range from quantitative structure-activity relationship (QSAR) modeling to molecular similarity-based methods and machine learning. Within the "Toxicology in the 21st Century" (Tox21) screening initiative, a crowd-sourcing platform was established for the development and validation of computational models that predict the interference of chemical compounds with nuclear receptor and stress response pathways, based on a training set of more than 10,000 compounds tested in high-throughput screening assays.

Results: Here, we present the results of various molecular similarity-based and machine learning-based methods on an independent evaluation set of 647 compounds provided by the Tox21 Data Challenge 2014. The Random Forest (RF) approach based on MACCS molecular fingerprints and a subset of 13 molecular descriptors, selected by statistical and literature analysis, performed best in terms of the area under the receiver operating characteristic curve (AUC). We further compared the individual and combined performance of the different methods. In retrospect, we also discuss why an ensemble approach combining a similarity search method with the Random Forest algorithm outperformed the individual methods, and explain the intrinsic limitations of those individual methods.
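
The similarity search component can be pictured as a k-nearest-neighbour (kNN) classifier over binary fingerprints, like the 3NN, 5NN and 7NN models compared in Figs. 2 and 3. The minimal Python sketch below assumes that each fingerprint is stored as the set of its on-bit indices, uses the Tanimoto coefficient as the similarity measure, and scores a query compound by the fraction of actives among its k most similar training compounds. The toy fingerprints and the helper names `tanimoto` and `knn_score` are hypothetical; this is an illustration of the technique, not the authors' implementation.

```python
from typing import FrozenSet, List, Tuple

# A fingerprint is modelled here as the set of "on" bit indices of a binary
# fingerprint such as MACCS keys (illustrative assumption; real bit vectors
# would be generated with a cheminformatics toolkit).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: size of the intersection divided by size of the union."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: treated as dissimilar by convention
    return len(a & b) / union


def knn_score(query: Fingerprint,
              training: List[Tuple[Fingerprint, int]],
              k: int = 5) -> float:
    """Fraction of actives (label 1) among the k training compounds most similar to query."""
    if k < 1 or not training:
        raise ValueError("need k >= 1 and a non-empty training set")
    # Rank training compounds by similarity to the query, most similar first
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    neighbours = ranked[:k]
    return sum(label for _, label in neighbours) / len(neighbours)


# Hypothetical toy training set: (fingerprint, activity label); values are made up.
training = [
    (frozenset({1, 2, 3, 4}), 1),
    (frozenset({1, 2, 3}), 1),
    (frozenset({2, 3, 4, 5}), 1),
    (frozenset({7, 8, 9}), 0),
    (frozenset({8, 9, 10}), 0),
    (frozenset({9, 10, 11}), 0),
]
query = frozenset({1, 2, 3, 5})
print(knn_score(query, training, k=3))  # 1.0
print(knn_score(query, training, k=5))  # 0.6
```

The choice of k trades locality for stability: here the three nearest neighbours are all active, while k = 5 also pulls in two dissimilar inactives and lowers the score.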

Conclusions: Our results suggest that, although the prediction methods were optimized individually for each modelled target, an ensemble of similarity-based and machine learning approaches shows promising performance, indicating its broad applicability in toxicity prediction.
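
The abstract does not state how the similarity and Random Forest outputs are combined in the ensemble. As an illustrative assumption only, the Python sketch below averages the two per-compound scores and evaluates each score list by ROC AUC, computed as the probability that a randomly chosen active is ranked above a randomly chosen inactive (ties count as one half). All labels and scores are made up, and `roc_auc` and `average_ensemble` are hypothetical helpers; in practice the forest scores would come from a trained RF model.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random active outscores a random inactive (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive compound")
    wins = 0.0
    # Compare every active with every inactive (O(n^2), fine for a sketch)
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_ensemble(scores_a: Sequence[float], scores_b: Sequence[float],
                     weight: float = 0.5) -> List[float]:
    """Weighted mean of two models' per-compound scores (weight applies to scores_a)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(scores_a, scores_b)]


# Made-up labels and scores for six compounds (1 = active, 0 = inactive).
labels = [1, 1, 0, 0, 1, 0]
similarity_scores = [0.8, 0.4, 0.6, 0.2, 0.6, 0.0]  # e.g. a 5NN active fraction
forest_scores = [0.7, 0.9, 0.3, 0.5, 0.4, 0.1]      # e.g. an RF class probability

print(round(roc_auc(labels, similarity_scores), 3))  # 0.833
print(round(roc_auc(labels, forest_scores), 3))      # 0.889
print(roc_auc(labels, average_ensemble(similarity_scores, forest_scores)))  # 1.0
```

On these toy scores the two models make different ranking errors, so their average separates actives from inactives perfectly. This only illustrates why combining complementary models can help; it says nothing about the size of the gain on Tox21 data.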

Keywords: Machine learning; Molecular fingerprints; Similarity searching; Tox21 challenge; Toxicity prediction.

Figures

**Fig. 1**
Workflow of the methodology involved in the classification process. Schematic representation of the methodology: data points, feature selection, model development (machine learning and similarity search methods) and validation, implemented in the study

**Fig. 2**
Cross-validation performance results of classifiers. Plot of the 13-fold cross-validation results, in terms of AUC, for the three targets (AhR, ER-LBD and HSE), comparing the best-performing models (3NN, 5NN, 7NN, RF, NB and PNN) [28]

**Fig. 3**
External validation performance results of classifiers. Plot of the external validation results, in terms of AUC, for the three targets (AhR, ER-LBD and HSE), comparing the best-performing models (3NN, 5NN, 7NN, RF, NB, PNN and Ensemble (5NN + RF)) with our previous work [28] and the Tox21 challenge winners for the respective targets

**Fig. 4**
Analysis of the chemical space covered by the descriptors used to classify actives in the external set for the ER-LBD target. The figure shows the different actives present in the ER-LBD external set. Compounds highlighted in *pink* (MACCS) and *green* (ECFP4) are predicted by the RF models, and those in *blue* (ECFP4) and *red* (MACCS) by the NB models. The respective prediction scores for each classifier are shown in Table 2

**Fig. 5**
Two-dimensional structures of actives and inactives in the training set for the ER-LBD target. A selection of training set compounds that are active (1) or inactive (0) against ER-LBD
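
Fig. 2 reports cross-validation AUCs, but the caption does not describe how the folds were built. The following Python sketch therefore assumes stratified k-fold partitioning: actives and inactives are shuffled separately with a fixed seed and dealt round-robin into k folds, so every fold keeps roughly the class ratio of the training set. Each fold then serves once as the test set while a model is trained on the remaining k - 1 folds. The function names and toy labels are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def stratified_k_fold(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split compound indices into k folds that approximately preserve the class ratio."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)  # fixed seed for reproducible folds
    folds: List[List[int]] = [[] for _ in range(k)]
    position = 0
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal this class round-robin, continuing where the previous class stopped
        for idx in members:
            folds[position % k].append(idx)
            position += 1
    return folds


def train_test_splits(labels: Sequence[int], k: int,
                      seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) once per fold."""
    folds = stratified_k_fold(labels, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


labels = [1] * 13 + [0] * 39  # toy data: 13 actives, 39 inactives
folds = stratified_k_fold(labels, k=13, seed=42)
print([len(fold) for fold in folds])                     # thirteen folds of 4 compounds
print([sum(labels[i] for i in fold) for fold in folds])  # one active per fold
train_idx, test_idx = next(train_test_splits(labels, k=13, seed=42))
print(len(train_idx), len(test_idx))  # 48 4
```

With 13 actives, 39 inactives and k = 13 (the fold count given in the Fig. 2 caption), each fold receives exactly one active and three inactives, so every held-out fold can yield a defined AUC.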

References

1. Schmid EF, Smith DA. Keynote review: is declining innovation in the pharmaceutical industry a myth? Drug Discov Today. 2005;10:1031–1039. doi: 10.1016/S1359-6446(05)03524-5.
2. Swinney DC, Anthony J. How were new medicines discovered? Nat Rev Drug Discov. 2011;10:507–519. doi: 10.1038/nrd3480.
3. Maziasz T, Kadambi VJ, Silverman L, Fedyk E, Alden CL. Predictive toxicology approaches for small molecule oncology drugs. Toxicol Pathol. 2010;38:148–164. doi: 10.1177/0192623309356448.
4. Wang Y, Xing J, Xu Y, Zhou N, Peng J, Xiong Z, Liu X, Luo X, Luo C, Chen K, Zheng M, Jiang H. In silico ADME/T modelling for rational drug design. Q Rev Biophys. 2015;48:488–515. doi: 10.1017/S0033583515000190.
5. Vedani A, Smiesko M. In silico toxicology in drug discovery: concepts based on three-dimensional models. Altern Lab Anim ATLA. 2009;37:477–496.
