J Cheminform. 2016 Sep 29:8:51.
doi: 10.1186/s13321-016-0162-2. eCollection 2016.

Computational methods for prediction of in vitro effects of new chemical structures

Priyanka Banerjee et al. J Cheminform.

Abstract

Background: With a constant increase in the number of new chemicals synthesized every year, it becomes important to employ the most reliable and fast in silico screening methods to predict their safety and activity profiles. In recent years, in silico prediction methods have received great attention in an attempt to reduce animal experiments for the evaluation of various toxicological endpoints, complementing the theme of replace, reduce and refine. Various computational approaches have been proposed for the prediction of compound toxicity, ranging from quantitative structure-activity relationship modeling to molecular similarity-based methods and machine learning. Within the "Toxicology in the 21st Century" screening initiative, a crowd-sourcing platform was established for the development and validation of computational models to predict the interference of chemical compounds with nuclear receptor and stress response pathways, based on a training set containing more than 10,000 compounds tested in high-throughput screening assays.

Results: Here, we present the results of various molecular similarity-based and machine-learning-based methods on an independent evaluation set containing 647 compounds, as provided by the Tox21 Data Challenge 2014. The Random Forest approach based on MACCS molecular fingerprints and a subset of 13 molecular descriptors, selected based on statistical and literature analysis, performed best in terms of the area under the receiver operating characteristic curve (AUC). Further, we compared the individual and combined performance of different methods. In retrospect, we also discuss the reasons behind the superior performance of an ensemble approach, combining a similarity search method with the Random Forest algorithm, compared to individual methods, while explaining the intrinsic limitations of the latter.
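
The ingredients named in the Results (similarity search, AUC-based evaluation, and an ensemble of two scorers) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes fingerprints are represented as sets of on-bit indices (toy values, not real MACCS keys), that the similarity search is a k-nearest-neighbour vote using the Tanimoto coefficient, and that the ensemble is a plain average of two scores; the abstract does not state the combination rule. All function names and the `rf_like_scores` values are hypothetical stand-ins, the latter for the class probabilities a Random Forest would output.

```python
from typing import List, Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_score(query: Set[int], train_fps: Sequence[Set[int]],
              train_labels: Sequence[int], k: int = 5) -> float:
    """Fraction of actives among the k training compounds most similar to query."""
    ranked = sorted(zip(train_fps, train_labels),
                    key=lambda pair: tanimoto(query, pair[0]),
                    reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random active outscores a random inactive.

    Ties between an active and an inactive count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble(scores_a: Sequence[float], scores_b: Sequence[float]) -> List[float]:
    """Assumed combination rule: unweighted mean of the two scores."""
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


# Toy data (hypothetical): 1 = active, 0 = inactive.
train_fps = [{1, 2, 3}, {1, 2, 4}, {2, 3, 5}, {7, 8, 9}, {7, 9, 10}, {8, 9, 11}]
train_labels = [1, 1, 1, 0, 0, 0]
test_fps = [{1, 2, 5}, {7, 8, 10}, {1, 3, 9}, {8, 10, 11}]
test_labels = [1, 0, 1, 0]

knn_scores = [knn_score(fp, train_fps, train_labels, k=3) for fp in test_fps]
rf_like_scores = [0.9, 0.4, 0.3, 0.2]  # made-up stand-in for Random Forest output
combined = ensemble(knn_scores, rf_like_scores)

for name, scores in [("kNN", knn_scores), ("RF-like", rf_like_scores),
                     ("ensemble", combined)]:
    print(name, roc_auc(test_labels, scores))
```

In practice the fingerprints would come from a cheminformatics toolkit and the second scorer from a trained classifier; the sketch only shows how per-compound scores from two methods can be merged and compared on the same AUC metric.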

Conclusions: Our results suggest that, although prediction methods were optimized individually for each modelled target, an ensemble of similarity and machine-learning approaches provides promising performance, indicating its broad applicability in toxicity prediction.

Keywords: Machine learning; Molecular fingerprints; Similarity searching; Tox21 challenge; Toxicity prediction.

Figures

Fig. 1
Workflow of the methodology involved in the classification process. Schematic representation of the methodology: data points, feature selection, model development (machine learning and similarity search methods) and validation, implemented in the study
Fig. 2
Cross-validation performance results of classifiers. Plot representing the 13-fold cross-validation results, in terms of AUC, for the three targets (AhR, ER-LBD and HSE) comparing different best performing models (3NN, 5NN, 7NN, RF, NB, and PNN) [28]
Fig. 3
External validation performance results of classifiers. Plot representing the external validation results, in terms of AUC, for the three targets (AhR, ER-LBD and HSE) comparing different best performing models (3NN, 5NN, 7NN, RF, NB, PNN, Ensemble (5NN + RF)) with our previous work [28] and Tox21 challenge winners for respective targets
Fig. 4
Analysis of chemical space used by descriptors for classification of actives in external sets for ER-LBD target. The above figure shows the different actives present in the external set of ER-LBD. The compounds highlighted in pink (MACCS), green (ECFP4) are predicted by RF model and blue (ECFP4), red (MACCS) are predicted by NB models. The respective prediction scores for each classifier are shown in Table 2
Fig. 5
Two-dimensional structures of actives and inactives in the training set for ER-LBD target. A set of training set compounds which are active (1) and inactive (0) against ER-LBD

References

    1. Schmid EF, Smith DA. Keynote review: is declining innovation in the pharmaceutical industry a myth? Drug Discov Today. 2005;10:1031–1039. doi: 10.1016/S1359-6446(05)03524-5.
    2. Swinney DC, Anthony J. How were new medicines discovered? Nat Rev Drug Discov. 2011;10:507–519. doi: 10.1038/nrd3480.
    3. Maziasz T, Kadambi VJ, Silverman L, Fedyk E, Alden CL. Predictive toxicology approaches for small molecule oncology drugs. Toxicol Pathol. 2010;38:148–164. doi: 10.1177/0192623309356448.
    4. Wang Y, Xing J, Xu Y, Zhou N, Peng J, Xiong Z, Liu X, Luo X, Luo C, Chen K, Zheng M, Jiang H. In silico ADME/T modelling for rational drug design. Q Rev Biophys. 2015;48:488–515. doi: 10.1017/S0033583515000190.
    5. Vedani A, Smiesko M. In silico toxicology in drug discovery—concepts based on three-dimensional models. Altern Lab Anim ATLA. 2009;37:477–496.