Environ Health Perspect. 2020 Feb;128(2):27002.

doi: 10.1289/EHP5580. Epub 2020 Feb 7.

CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity

Kamel Mansouri¹,²,³, Nicole Kleinstreuer⁴, Ahmed M Abdelaziz⁵, Domenico Alberga⁶, Vinicius M Alves⁷,⁸, Patrik L Andersson⁹, Carolina H Andrade⁷, Fang Bai¹⁰, Ilya Balabin¹¹, Davide Ballabio¹², Emilio Benfenati¹³, Barun Bhhatarai¹⁴, Scott Boyer¹⁵, Jingwen Chen¹⁶, Viviana Consonni¹², Sherif Farag⁸, Denis Fourches¹⁷, Alfonso T García-Sosa¹⁸, Paola Gramatica¹⁴, Francesca Grisoni¹², Chris M Grulke¹, Huixiao Hong¹⁹, Dragos Horvath²⁰, Xin Hu²¹, Ruili Huang²¹, Nina Jeliazkova²², Jiazhong Li¹⁰, Xuehua Li¹⁶, Huanxiang Liu¹⁰, Serena Manganelli¹³, Giuseppe F Mangiatordi⁶, Uko Maran¹⁸, Gilles Marcou²⁰, Todd Martin²³, Eugene Muratov⁸, Dac-Trung Nguyen²¹, Orazio Nicolotti⁶, Nikolai G Nikolov²⁴, Ulf Norinder¹⁵, Ester Papa¹⁴, Michel Petitjean²⁵, Geven Piir¹⁸, Pavel Pogodin²⁶, Vladimir Poroikov²⁶, Xianliang Qiao¹⁶, Ann M Richard¹, Alessandra Roncaglioni¹³, Patricia Ruiz²⁷, Chetan Rupakheti²³,²⁸, Sugunadevi Sakkiah¹⁹, Alessandro Sangion¹⁴, Karl-Werner Schramm⁵, Chandrabose Selvaraj¹⁹, Imran Shah¹, Sulev Sild¹⁸, Lixia Sun²⁹, Olivier Taboureau²⁵, Yun Tang²⁹, Igor V Tetko³⁰,³¹, Roberto Todeschini¹², Weida Tong¹⁹, Daniela Trisciuzzi⁶, Alexander Tropsha⁸, George Van Den Driessche¹⁷, Alexandre Varnek²⁰, Zhongyu Wang¹⁶, Eva B Wedebye²⁴, Antony J Williams¹, Hongbin Xie¹⁶, Alexey V Zakharov²¹, Ziye Zheng⁹, Richard S Judson¹

Affiliations

¹ National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency (U.S. EPA), Research Triangle Park, North Carolina, USA.
² ScitoVation LLC, Research Triangle Park, North Carolina, USA.
³ Integrated Laboratory Systems, Inc., Morrisville, North Carolina, USA.
⁴ National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM), National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
⁵ Technische Universität München, Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt, Department für Biowissenschaftliche Grundlagen, Weihenstephaner Steig 23, 85350 Freising, Germany.
⁶ Department of Pharmacy-Drug Sciences, University of Bari, Bari, Italy.
⁷ Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Brazil.
⁸ Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.
⁹ Chemistry Department, Umeå University, Umeå, Sweden.
¹⁰ School of Pharmacy, Lanzhou University, China.
¹¹ Information Systems & Global Solutions (IS&GS), Lockheed Martin, USA.
¹² Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy.
¹³ Istituto di Ricerche Farmacologiche "Mario Negri", IRCCS, Milan, Italy.
¹⁴ QSAR Research Unit in Environmental Chemistry and Ecotoxicology, Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy.
¹⁵ Swedish Toxicology Sciences Research Center, Karolinska Institutet, Södertälje, Sweden.
¹⁶ School of Environmental Science and Technology, Dalian University of Technology, Dalian, China.
¹⁷ Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA.
¹⁸ Institute of Chemistry, University of Tartu, Tartu, Estonia.
¹⁹ Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, USA.
²⁰ Laboratoire de Chémoinformatique-UMR7140, University of Strasbourg/CNRS, Strasbourg, France.
²¹ National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA.
²² IdeaConsult, Ltd., Sofia, Bulgaria.
²³ National Risk Management Research Laboratory, U.S. EPA, Cincinnati, Ohio, USA.
²⁴ Division of Risk Assessment and Nutrition, National Food Institute, Technical University of Denmark, Copenhagen, Denmark.
²⁵ Computational Modeling of Protein-Ligand Interactions (CMPLI)-INSERM UMR 8251, INSERM ERL U1133, Functional and Adaptative Biology (BFA), Université de Paris, Paris, France.
²⁶ Institute of Biomedical Chemistry IBMC, 10 Building 8, Pogodinskaya st., Moscow 119121, Russia.
²⁷ Computational Toxicology and Methods Development Laboratory, Division of Toxicology and Human Health Sciences, Agency for Toxic Substances and Disease Registry, Centers for Disease Control and Prevention, Atlanta, Georgia, USA.
²⁸ Department of Biochemistry and Molecular Biophysics, University of Chicago, Chicago, Illinois, USA.
²⁹ Department of Pharmaceutical Sciences, School of Pharmacy, East China University of Science and Technology, Shanghai, China.
³⁰ BIGCHEM GmbH, Neuherberg, Germany.
³¹ Helmholtz Zentrum Muenchen - German Research Center for Environmental Health (GmbH), Neuherberg, Germany.

PMID: 32074470
PMCID: PMC7064318
DOI: 10.1289/EHP5580

Kamel Mansouri et al. Environ Health Perspect. 2020 Feb.

Abstract

Background: Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones and alter synthesis, transport, or metabolic pathways. The prospect of EDCs causing adverse health effects in humans and wildlife has led to the development of scientific and regulatory approaches for evaluating bioactivity. This need is being addressed using high-throughput screening (HTS) in vitro approaches and computational modeling.

Objectives: In support of the Endocrine Disruptor Screening Program, the U.S. Environmental Protection Agency (EPA) led two worldwide consortiums to virtually screen chemicals for their potential estrogenic and androgenic activities. Here, we describe the Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA) efforts, which follow the steps of the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP).

Methods: The CoMPARA list of screened chemicals built on CERAPP's list of 32,464 chemicals to include additional chemicals of interest, as well as simulated ToxCast™ metabolites, totaling 55,450 chemical structures. Computational toxicology scientists from 25 international groups contributed 91 predictive models for binding, agonist, and antagonist activity predictions. Models were underpinned by a common training set of 1,746 chemicals compiled from a combined data set of 11 ToxCast™/Tox21 HTS in vitro assays.

Results: The resulting models were evaluated using curated literature data extracted from different sources. To overcome the limitations of single-model approaches, CoMPARA predictions were combined into consensus models that provided an average predictive accuracy of approximately 80% for the evaluation set.
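The abstract does not state how the individual model predictions were combined. As a minimal sketch, assuming a simple unweighted majority vote over binary (active/inactive) calls, the consensus label for one chemical and its concordance (presumably the fraction of covering models that agree with the consensus, the quantity plotted in Figures 5 and 8) could be computed as below. The function name `consensus_call` and the tie-breaking rule are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_call(predictions: Sequence[Optional[int]]) -> tuple[int, float]:
    """Combine one chemical's per-model calls into (consensus label, concordance).

    Each entry is 1 (active), 0 (inactive), or None when the chemical falls
    outside that model's coverage and the model makes no prediction.
    """
    votes = [p for p in predictions if p is not None]
    if not votes:
        raise ValueError("no model covers this chemical")
    counts = Counter(votes)
    # Unweighted majority vote; a tie is resolved as inactive (an assumption).
    label = 1 if counts[1] > counts[0] else 0
    # Concordance: share of covering models that agree with the consensus label.
    concordance = counts[label] / len(votes)
    return label, concordance


# Five models, one of which does not cover the chemical.
label, concordance = consensus_call([1, 1, 0, None, 1])
print(label, concordance)  # 1 0.75
```

Because the consensus is the majority label, concordance under this definition never falls below 0.5. Weighting each model by its evaluation score would be a natural variant; whether CoMPARA used such weights is not stated in this section.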

Discussion: The strengths and limitations of the consensus predictions were discussed with example chemicals; then, the models were implemented into the free and open-source OPERA application to enable screening of new chemicals with a defined applicability domain and accuracy assessment. This implementation was used to screen the entire EPA DSSTox database of ~875,000 chemicals, and their predicted AR activities have been made available on the EPA CompTox Chemicals Dashboard and the National Toxicology Program's Integrated Chemical Environment. https://doi.org/10.1289/EHP5580.
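OPERA's actual applicability-domain criteria are not described in this abstract. To illustrate the general idea of a "defined applicability domain", the sketch below uses one common, generic technique: a k-nearest-neighbour distance check in descriptor space. The function names, the Euclidean metric, the toy 2-D descriptor space, and the 95th-percentile cutoff are all assumptions for illustration, not OPERA's implementation.

```python
import math
from typing import Sequence

Point = Sequence[float]


def mean_knn_distance(x: Point, refs: Sequence[Point], k: int) -> float:
    """Average Euclidean distance from x to its k nearest reference points."""
    dists = sorted(math.dist(x, r) for r in refs)
    k_eff = min(k, len(dists))
    return sum(dists[:k_eff]) / k_eff


def knn_domain_threshold(train: Sequence[Point], k: int = 3, quantile: float = 0.95) -> float:
    """Cutoff = the given quantile of each training point's mean kNN distance
    to the other training points (needs at least two training points)."""
    scores = sorted(
        mean_knn_distance(x, [t for j, t in enumerate(train) if j != i], k)
        for i, x in enumerate(train)
    )
    idx = max(0, min(len(scores) - 1, math.ceil(quantile * len(scores)) - 1))
    return scores[idx]


def in_domain(query: Point, train: Sequence[Point], threshold: float, k: int = 3) -> bool:
    """A query is inside the domain if it is no farther from the training set
    than most training compounds are from each other."""
    return mean_knn_distance(query, train, k) <= threshold


train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # toy 2-D descriptor space
cutoff = knn_domain_threshold(train, k=1)
print(cutoff, in_domain((0.5, 0.5), train, cutoff, k=1), in_domain((5.0, 5.0), train, cutoff, k=1))
# 1.0 True False
```

Deriving the cutoff from the training set's own neighbour distances keeps the threshold on the same scale as the descriptors, so it adapts to how densely the training chemicals cover descriptor space.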


Figures

**Figure 1.** Workflow of the project defining the major steps and the different data sets used for training, evaluation, and prediction.
Image description: a five-stage curation funnel (remove inorganics and mixtures; clean salts and counterions; normalize tautomers; remove duplicates; final inspection) that outputs QSAR-ready structures. Arrows connect the ToxCast™ in vitro assay data and AR pathway model to the first stage, the lists of regulatory interest (EDSP, Canadian DSL, ToxCast, EU EINECS, Tox21, ToxCast metabolites, CPCat and ACToR, DSSTox) to the third stage, and the PubChem literature data (11k chemicals, 80k experimental values) to the last stage. Modeling (training set, n = 1,662), prediction (prediction set, n = 55,450), and evaluation and consensus modeling (evaluation set, n = 4,839) are linked to the first, third, and fifth stages, respectively; modeling and prediction both lead to the international consortium.

**Figure 2.** Scores of the categorical binding (black), agonist (white) and antagonist (gray) models based on the evaluation set and the scoring Equation 1.
Image description: bar graph of scores (y-axis, 0 to 1) for each group's models (x-axis): ATSDR_IRFMN_1; ATSDR_IRFMN_2; ATSDR_IRFMN_3; DTU; ECUST; EPA_NCCT_1; EPA_NCCT_2; EPA_NCCT_3; EPA_NRMRL_1; EPA_NRMRL_2; FDA_HHS; IMC_1; IMC_2; IDEA; INS_LA; CMPLI; LM; IRFMN; NCATS_1; NCATS_2; NCSTATE; SWETOX_1; SWETOX_2; TARTU_1; TARTU_2; TUM; UFG; UMEA; UNC; UNIBARI; UNIMIB_1; UNIMIB_2; UNISTRA; and VCCLAB.

**Figure 3.** Scores of the continuous binding (black), agonist (white) and antagonist models based on the evaluation set and the scoring Equation 1 (see Supplemental Material 1 for the groups’ abbreviations).
Image description: scores (y-axis, 0 to 0.8) of the continuous models from CMPLI, LM, TUM, UNISTRA, and VCCLAB (x-axis).

**Figure 4.** Histogram showing the distribution of the number of binding (black), agonist (white) and antagonist (gray) models covering the prediction set (minimum of 11 models for agonist and antagonist and 20 for binding).
Image description: number of predicted chemical structures (y-axis, 0 to 3 × 10⁴) versus number of models (x-axis, 10 to 35).

**Figure 5.** Histogram showing the distribution of the concordance of the binding (black), agonist (white) and antagonist (gray) single models.
Image description: number of predicted chemical structures (y-axis, 0 to 3.5 × 10⁴) versus concordance (x-axis, 0.5 to 1).

**Figure 6.** Histogram showing the distribution of the concordance between the binding models over the active predictions.
Image description: number of predicted chemical structures (y-axis, 0 to 3 × 10⁴) versus prediction concordance for actives (x-axis, 0 to 1).

**Figure 7.** Histogram showing the coverage and S-score of the single binding models in comparison with the consensus binding predictions for the full CoMPARA set.
Image description: number of chemicals covered (left y-axis, 0 to 55,000) and score (right y-axis, 0 to 1) for each binding model (x-axis): ATSDR_IRFMN_1; ATSDR_IRFMN_3; ECUST; EPA_NCCT_2; EPA_NRMRL_1; FDA_HHS; IBMC_2; INS_LA; LM; NCATS_1; NCSTATE; SWETOX_2; TARTU_2; UFG; UNC; UNIMIB_1; and UNISTRA.

**Figure 8.** Box plot showing the correlation between concordance and accuracy of prediction for the evaluation set chemicals. The box represents the interquartile range. The lower and upper box boundaries represent the 25th and 75th percentiles, respectively. The horizontal line splitting the box represents the median value. The lower and upper whiskers represent the minimum and maximum values, respectively. Outliers are represented by the + symbol.
Image description: concordance in prediction (y-axis, 0.5 to 1) for accurately and inaccurately classified chemicals (x-axis).

**Figure 9.** Box plot showing the correlation between concordance and potency for the active binders of the evaluation set chemicals. The box represents the interquartile range. The lower and upper box boundaries represent the 25th and 75th percentiles, respectively. The horizontal line splitting the box represents the median value. The lower and upper whiskers represent the minimum and maximum values, respectively. Outliers are represented by the + symbol.
Image description: concordance in prediction (y-axis, 0.5 to 1) for active binders grouped by potency: very weak, weak, moderate, and strong (x-axis).

**Figure 10.** Selected descriptors for the binding (. symbol), agonist (* symbol), and antagonist (x symbol) models and corresponding balanced accuracy (BA) calculated in five-fold cross-validation in forward selection based on the genetic algorithm (GA) ranking. The ranked descriptors are not overlapping for the three modalities.
Image description: BA in five-fold CV (y-axis, 0.5 to 0.95) versus the number of ranked descriptors for AR activity (x-axis, 0 to 70). Labeled points: binding, 23 descriptors (BA = 0.9399); antagonist, 15 descriptors (BA = 0.9434); agonist, 10 descriptors (BA = 0.9579).
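Figure 10 describes descriptor selection by forward selection over a genetic-algorithm ranking, scored by balanced accuracy (BA) in five-fold cross-validation. Neither the GA ranking nor the classifier is given in this section, so the sketch below assumes a ranking is already available and plugs in a hypothetical toy 1-nearest-neighbour classifier (`one_nn`). The function names and the unshuffled, unstratified fold split are illustrative choices, not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence

FitPredict = Callable[[list, list, list], list]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """BA = (sensitivity + specificity) / 2 for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def kfold_indices(n: int, k: int = 5) -> List[List[int]]:
    """Split 0..n-1 into k contiguous, nearly equal folds (no shuffling, for brevity)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_balanced_accuracy(X: list, y: list, fit_predict: FitPredict, k: int = 5) -> float:
    """Collect out-of-fold predictions over k folds, then score them with one BA."""
    y_pred = [0] * len(y)
    for test_idx in kfold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        for i, p in zip(test_idx, preds):
            y_pred[i] = p
    return balanced_accuracy(y, y_pred)


def forward_select(X: list, y: list, ranked: Sequence[int], fit_predict: FitPredict, k: int = 5):
    """Add descriptors one at a time in rank order; keep the prefix with the best CV BA."""
    best_n, best_ba = 0, -1.0
    for n in range(1, len(ranked) + 1):
        cols = ranked[:n]
        ba = cv_balanced_accuracy([[row[c] for c in cols] for row in X], y, fit_predict, k)
        if ba > best_ba:  # strict '>' keeps the smaller subset on ties
            best_n, best_ba = n, ba
    return list(ranked[:best_n]), best_ba


def one_nn(X_train: list, y_train: list, X_test: list) -> list:
    """Toy stand-in classifier: label of the nearest training point (1-NN)."""
    return [y_train[min(range(len(X_train)), key=lambda j: math.dist(x, X_train[j]))]
            for x in X_test]


# Descriptor 0 separates the classes; descriptor 1 is noise.
X = [[0.0, 5], [0.1, 3], [0.2, 9], [0.3, 1], [0.4, 7],
     [1.0, 2], [1.1, 8], [1.2, 4], [1.3, 6], [1.4, 0]]
y = [0] * 5 + [1] * 5
print(forward_select(X, y, ranked=[0, 1], fit_predict=one_nn))  # ([0], 1.0)
```

BA averages the true-positive and true-negative rates, so it is not inflated by the class imbalance typical of AR activity data, where inactives far outnumber actives.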


Grants and funding

HHSN273201500010C/ES/NIEHS NIH HHS/United States
