Chem Res Toxicol. 2017 Nov 20;30(11):2046-2059.

doi: 10.1021/acs.chemrestox.7b00084. Epub 2017 Oct 9.

Predicting Organ Toxicity Using in Vitro Bioactivity Data and Chemical Structure

Jie Liu¹ ², Grace Patlewicz³, Antony J Williams³, Russell S Thomas³, Imran Shah³

Affiliations

¹ Department of Information Science, University of Arkansas at Little Rock, Arkansas 72204, United States.
² Oak Ridge Institute for Science Education, National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States.
³ National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States.

PMID: 28768096
PMCID: PMC6172960
DOI: 10.1021/acs.chemrestox.7b00084

Jie Liu et al. Chem Res Toxicol. 2017.

Abstract

Animal testing alone cannot practically evaluate the health hazard posed by tens of thousands of environmental chemicals. Computational approaches making use of high-throughput experimental data may provide more efficient means to predict chemical toxicity. Here, we use a supervised machine learning strategy to systematically investigate the relative importance of study type, machine learning algorithm, and type of descriptor on predicting in vivo repeat-dose toxicity at the organ level. A total of 985 compounds were represented using chemical structural descriptors, ToxPrint chemotype descriptors, and bioactivity descriptors from ToxCast in vitro high-throughput screening assays. Using ToxRefDB, a total of 35 target organ outcomes were identified that contained at least 100 chemicals (50 positive and 50 negative). Supervised machine learning was performed using Naïve Bayes, k-nearest neighbor, random forest, classification and regression trees, and support vector classification approaches. Model performance was assessed based on F1 scores using 5-fold cross-validation with balanced bootstrap replicates. Fixed effects modeling showed that the variance in F1 scores was explained mostly by target organ outcome, followed by descriptor type, machine learning algorithm, and interactions between these three factors. Combinations of bioactivity and chemical structure or chemotype descriptors were the most predictive. Model performance improved with more chemicals (up to a maximum of 24%), and these gains were correlated (ρ = 0.92) with the number of chemicals. Overall, the results demonstrate that a combination of bioactivity and chemical descriptors can accurately predict a range of target organ toxicity outcomes in repeat-dose studies, but specific experimental and methodologic improvements may increase predictivity.
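The evaluation scheme named in the abstract (F1 scores from 5-fold cross-validation with balanced bootstrap replicates) can be illustrated with a minimal sketch. This is not the authors' code. The abstract does not say how the replicates and folds are combined, so the sketch assumes that only the training folds are resampled to equal class counts. All function names, including the `fit` callback, are hypothetical.

```python
import random

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the score is undefined."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and yield (train, test) index lists for each of k folds."""
    order = list(range(n))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def balanced_bootstrap(indices, labels, rng):
    """Resample with replacement so positives and negatives have equal counts."""
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    n = min(len(pos), len(neg))
    return rng.choices(pos, k=n) + rng.choices(neg, k=n)

def cross_validated_f1(labels, fit, k=5, seed=0):
    """Return one F1 score per fold.

    `fit(train_idx)` is a hypothetical callback that trains any classifier on
    the given chemical indices and returns a function predict(idx) -> 0 or 1.
    """
    rng = random.Random(seed)
    scores = []
    for train, test in k_fold_indices(len(labels), k, rng):
        # Assumption: balance only the training portion of each fold.
        predict = fit(balanced_bootstrap(train, labels, rng))
        y_pred = [predict(i) for i in test]
        scores.append(f1_score([labels[i] for i in test], y_pred))
    return scores
```

Resampling only the training folds keeps duplicated chemicals out of the held-out fold; a different ordering of the two steps would change that property.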

Figures

**Figure 1.**
Distribution of positive and negative chemicals across the *in vivo* guideline toxicity testing studies and target organs. From left to right these bar graphs show the number of positive (pos, red) and negative (neg, green) chemicals for chronic (CHR), subchronic (SUB), multigenerational (MGR) and developmental (DEV) studies. The target organs are labeled on the ordinate and the number of chemicals on the abscissa. The negative chemicals are missing for guideline studies where the evaluation of the specific target organ effect was not compulsory.

**Figure 2.**
Relationship between F1 score and number of descriptors for the best performing classification models and illustrative examples of minimal datasets. Each graph shows the cross-validation F1 score (ordinate) against the number of descriptors (abscissa) for predicting a toxicity outcome; the title gives the outcome (denoted as study:target-organ), the descriptor type, and the classification method. The mean F1 score and its standard deviation are shown in blue and gray, respectively. The number of descriptors and F1 score for the best classifier are marked on the abscissa and ordinate, respectively, by vertical and horizontal red lines.

**Figure 3.**
Summary of performance for target organ outcomes for select minimum datasets by classification algorithms and descriptors. The visualization shows the predictive performance for illustrative examples of target organ outcomes in rows (denoted as study:target organ) using eight machine learning algorithms (columns): naïve Bayes (NB), k-nearest neighbor classification (KNN0 and KNN1), classification and regression trees (CART0 and CART1), and support vector classifiers (SVCL0 and SVCR0). The predictive performance is compared across five different descriptors including: chemical (chm), chemotype (ct), *in vitro* bioactivity (bio), a combination of *in vitro* bioactivity and chemical (bc), and a combination of *in vitro* bioactivity and chemotype (ct). The performance of a classification method for predicting an outcome using a descriptor type was measured using specificity (green), F1 score (red) and sensitivity (blue), which are visualized as vertical glyphs. The center, top, and bottom of the glyphs correspond to the mean ±1 SD. In all, the performance results for 40 classification methods (eight machine learning algorithms and five descriptor types) are visualized for each target organ toxicity. The grey horizontal bars on each graph signify the best mean F1 score ±1 SD (across all 5-fold cross-validation trials). The best performing classification model and descriptor set for each target organ outcome are denoted with a vertical red line.
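The three metrics drawn as glyphs in Figure 3 follow standard definitions, and the sketch below shows how a mean ±1 SD span could be derived from per-fold values. It is an illustration only: the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev

def fold_metrics(y_true, y_pred):
    """Sensitivity, specificity, and F1 for one cross-validation fold."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
        "f1": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0,
    }

def glyph_spans(per_fold):
    """Map each metric to (mean - SD, mean, mean + SD) across folds."""
    spans = {}
    for name in per_fold[0]:
        values = [m[name] for m in per_fold]
        center = mean(values)
        # Assumption: sample SD; a single fold has no spread to report.
        sd = stdev(values) if len(values) > 1 else 0.0
        spans[name] = (center - sd, center, center + sd)
    return spans
```

Each returned tuple corresponds to the bottom, center, and top of one glyph.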

**Figure 4.**
Summary of frequently used bioactivity descriptors in chronic target organ toxicity prediction models. The visualization shows a heat map in which the rows correspond to chronic target organ toxicities, columns correspond to the fifty most frequently used bioactivity descriptors, and values represent row standardized frequencies of occurrence of descriptors (column) in predictive models of target organ toxicities (row). The colors signify the row standardized frequencies for the bioactivity descriptors where positive values are red, negative values are blue and the level of saturation is directly related to magnitude. The row dendrogram shows the cosine similarity between the frequency of bioactivity descriptors and target organ toxicity outcomes, respectively, by average linkage clustering.

**Figure 5.**
Summary of frequently used chemotype descriptors in chronic target organ toxicity prediction models. The visualization shows a heat map in which the rows correspond to chronic target organ toxicities, columns correspond to the fifty most frequently used chemotype descriptors, and values represent row standardized frequencies of occurrence of descriptors (column) in predictive models of target organ toxicities (row). The colors signify the row standardized frequencies for the chemotype descriptors where positive values are red, negative values are blue and the level of saturation is directly related to magnitude. The row dendrogram shows the cosine similarity between the frequency of chemotype descriptors and target organ toxicity outcomes, respectively, by average linkage clustering.
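The heat maps in Figures 4 and 5 rest on two operations, row standardization and cosine similarity, which the following sketch illustrates. It assumes that "row standardized" means a per-row z-score (consistent with the positive and negative values in the captions) and that the population standard deviation is used; neither detail is stated in the captions. The function names are hypothetical, and the average linkage clustering step is not shown.

```python
import math
from statistics import mean, pstdev

def row_standardize(matrix):
    """Convert each row of descriptor frequencies to z-scores."""
    standardized = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        # A constant row has zero spread, so every entry maps to 0.0.
        standardized.append([(x - mu) / sd if sd else 0.0 for x in row])
    return standardized

def cosine_similarity(u, v):
    """Cosine of the angle between two frequency profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

In this reading, each row is one target organ outcome, and pairwise cosine similarities between rows would supply the input to the dendrogram.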

Cited by

ChemBioSim: Enhancing Conformal Prediction of In Vivo Toxicity by Use of Predicted Bioactivities.
Garcia de Lomana M, Morger A, Norinder U, Buesen R, Landsiedel R, Volkamer A, Kirchmair J, Mathea M. J Chem Inf Model. 2021 Jul 26;61(7):3255-3272. doi: 10.1021/acs.jcim.1c00451. Epub 2021 Jun 21. PMID: 34153183.
Advances in computational methods along the exposure to toxicological response paradigm.
El-Masri H, Paul Friedman K, Isaacs K, Wetmore BA. Toxicol Appl Pharmacol. 2022 Sep 1;450:116141. doi: 10.1016/j.taap.2022.116141. Epub 2022 Jun 29. PMID: 35777528.
Machine Learning Models for Predicting Liver Toxicity.
Liu J, Guo W, Sakkiah S, Ji Z, Yavas G, Zou W, Chen M, Tong W, Patterson TA, Hong H. Methods Mol Biol. 2022;2425:393-415. doi: 10.1007/978-1-0716-1960-5_15. PMID: 35188640.
Accurate clinical toxicity prediction using multi-task deep neural nets and contrastive molecular explanations.
Sharma B, Chenthamarakshan V, Dhurandhar A, Pereira S, Hendler JA, Dordick JS, Das P. Sci Rep. 2023 Mar 25;13(1):4908. doi: 10.1038/s41598-023-31169-8. PMID: 36966203.
KnowTox: pipeline and case study for confident prediction of potential toxic effects of compounds in early phases of development.
Morger A, Mathea M, Achenbach JH, Wolf A, Buesen R, Schleifer KJ, Landsiedel R, Volkamer A. J Cheminform. 2020 Apr 14;12(1):24. doi: 10.1186/s13321-020-00422-x. PMID: 33431007.

Grants and funding

EPA999999/ImEPA/Intramural EPA/United States
