Molecules. 2017 Apr 23;22(4):675.
doi: 10.3390/molecules22040675.

High-Performance Prediction of Human Estrogen Receptor Agonists Based on Chemical Structures

Yuki Asako et al. Molecules. 2017.

Abstract

Many agonists for the estrogen receptor are known to disrupt endocrine functioning. We have developed a computational model that predicts agonists for the estrogen receptor ligand-binding domain in an assay system. Our model was entered into the Tox21 Data Challenge 2014, a computational toxicology competition organized by the National Center for Advancing Translational Sciences. This competition aims to find high-performance predictive models for various adverse-outcome pathways, including the estrogen receptor. Our predictive model, which is based on the random forest method, delivered the best performance in its competition category. In the current study, the predictive performance of the random forest models was improved by strictly adjusting the hyperparameters to avoid overfitting. The random forest models were optimized from 4000 descriptors simultaneously applied to 10,000 activity assay results for the estrogen receptor ligand-binding domain, which have been measured and compiled by Tox21. Owing to the correlation between our model's and the challenge's results, we consider that our model currently possesses the highest predictive power on agonist activity of the estrogen receptor ligand-binding domain. Furthermore, analysis of the optimized model revealed some important features of the agonists, such as the number of hydroxyl groups in the molecules.

Keywords: QSAR prediction model; Tox21 data challenge 2014; estrogen receptor; machine learning; random forest.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Scheme of the model construction.
Figure 2. Charged and uncharged forms. 100 random forest (RF) models were constructed for the charged, uncharged, and both forms of each descriptor. All models were used to predict the activities of the estrogen receptor ligand-binding domain for the compounds in the final evaluation set. 100 ROC_AUC values were plotted for each group. Green lines denote the averages and their 95% confidence intervals.
Figure 3. Number of descriptors. 100 RF models were constructed for both numbers of descriptors. All models were used to predict the activities of the estrogen receptor ligand-binding domain for the compounds in the final evaluation set. 100 ROC_AUC values were plotted for each group. Green lines denote the averages and their 95% confidence intervals.
Figure 4. Relationship between ROC_AUC values in models constructed from the test set (50%) and the final evaluation set. Each point denotes the performance of one model. This figure is taken from [9].
Figure 5. Effects of the hyperparameter Number of Terms on the RF modeling. 190 RF models were constructed in each group, and all models were then used to predict the activities of the estrogen receptor ligand-binding domain for the compounds in the final evaluation set. Plotted are the ROC_AUC values for the final evaluation set in each group. Green lines denote the averages and their 95% confidence intervals.
Figure 6. Effects of the hyperparameter Maximum Splits per Tree on the RF modeling. ROC_AUC values of the training set (50%) and the final evaluation set are plotted as closed and open circles, respectively. Large values of Maximum Splits per Tree led to model overfitting. The predictive ability was optimized for Maximum Splits per Tree = 6.
Figure 7. ROC curves for predicting ER-LBD-activating compounds with the newly proposed model (left) and the best model of the Tox21 Data Challenge 2014. ROC-AUCs and hyperparameter values of the models are also shown.
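
The figure captions report model quality as ROC_AUC values and summarize groups of models by their average and 95% confidence interval. The following is a minimal sketch of how such numbers could be computed, not the authors' procedure: the function names `roc_auc` and `mean_ci95` are hypothetical, the AUC uses the standard rank-sum (Mann-Whitney) identity, and the confidence interval assumes a normal approximation, which the source does not specify.

```python
from math import sqrt
from statistics import mean, stdev


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 values (1 = active compound).
    scores: predicted scores, higher meaning "more likely active".
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the group of tied scores starting at i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one active and one inactive compound")

    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def mean_ci95(values):
    """Mean and 95% confidence interval (normal approximation, an assumption)."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` evaluates to 0.75, and `mean_ci95` could be applied to the list of ROC_AUC values obtained from repeated model fits in one group. Comparing the value on the training set with the value on the final evaluation set, as in Figure 6, is one way to see overfitting: a large gap suggests the model has memorized the training data.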

References

    1. Katzenellenbogen B.S., Montano M.M., Ediger T.R., Sun J., Ekena K., Lazennec G., Martini P.G., McInerney E.M., Delage-Mourroux R., Weis K., et al. Estrogen receptors: selective ligands, partners, and distinctive pharmacology. Recent Prog. Horm. Res. 2000;55:163–193.
    2. Setchell K.D. Soy isoflavones—Benefits and risks from nature's selective estrogen receptor modulators (SERMs). J. Am. Coll. Nutr. 2001;20:354S–362S. doi: 10.1080/07315724.2001.10719168.
    3. Zhang Y., Dong S., Wang H., Tao S., Kiyama R. Biological Impact of Environmental Polycyclic Aromatic Hydrocarbons (ePAHs) as Endocrine Disruptors. Environ. Pollut. 2016;213:809–824. doi: 10.1016/j.envpol.2016.03.050.
    4. Hsieh J.H., Sedykh A., Huang R., Xia M., Tice R.R. A Data Analysis Pipeline Accounting for Artifacts in Tox21 Quantitative High-Throughput Screening Assays. J. Biomol. Screen. 2015;20:887–897. doi: 10.1177/1087057115581317.
    5. United States Environmental Protection Agency. Toxicology Testing in the 21st Century (Tox21). Available online: http://www.epa.gov/chemical-research/toxicology-testing-21st-century-tox21 (accessed on 16 April 2017).
