Comparing Multiple Machine Learning Algorithms and Metrics for Estrogen Receptor Binding Prediction
- PMID: 30114914
- PMCID: PMC6181119
- DOI: 10.1021/acs.molpharmaceut.8b00546
Abstract
Many chemicals that disrupt endocrine function have been linked to a variety of adverse biological outcomes. However, screening for endocrine disruption using in vitro or in vivo approaches is costly and time-consuming. Computational methods, e.g., quantitative structure-activity relationship models, have become more reliable due to larger training sets, increased computing power, and advanced machine learning algorithms, such as multilayered artificial neural networks. Machine learning models can be used to predict whether compounds have endocrine-disrupting capabilities, such as binding to the estrogen receptor (ER), allowing compounds to be prioritized for further testing. In this work, an exhaustive comparison of multiple machine learning algorithms, chemical spaces, and evaluation metrics for ER binding was performed on public data sets curated using in-house cheminformatics software (Assay Central). Chemical features utilized in modeling consisted of binary fingerprints (ECFP6, FCFP6, ToxPrint, or MACCS keys) and continuous molecular descriptors from RDKit. Each feature set was subjected to classic machine learning algorithms (Bernoulli Naive Bayes, AdaBoost Decision Tree, Random Forest, Support Vector Machine) and Deep Neural Networks (DNN). Models were evaluated using a variety of metrics: recall, precision, F1-score, accuracy, area under the receiver operating characteristic curve, Cohen's Kappa, and Matthews correlation coefficient. For predicting compounds within the training set, DNN has an accuracy higher than that of other methods; however, in 5-fold cross validation and external test set predictions, DNN and most classic machine learning models perform similarly regardless of the data set or molecular descriptors used. We have also used the rank normalized scores as a performance criterion for each machine learning method, and Random Forest performed best on the validation set when ranked by metric or by data set. These results suggest classic machine learning algorithms may be sufficient to develop high quality predictive models of ER activity.
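For readers unfamiliar with the evaluation metrics listed in the abstract, the following is a minimal sketch of how each is computed for a binary active/inactive classifier. It uses only the Python standard library and the textbook definitions; it is not the authors' evaluation code, and the function names (`confusion_counts`, `binary_metrics`, `roc_auc`), the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred):
    """Threshold-based metrics derived from the confusion matrix."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    accuracy = (tp + tn) / n
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    # Matthews correlation coefficient.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
        "kappa": kappa,
        "mcc": mcc,
    }


def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (active, inactive) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = ER binder, 0 = non-binder.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

metrics = binary_metrics(y_true, y_pred)
metrics["auc"] = roc_auc(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy alone can be misleading when inactives greatly outnumber actives, which is one reason chance-corrected measures such as Cohen's Kappa and the Matthews correlation coefficient are typically reported alongside it.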
Keywords: Bayesian; deep learning; estrogen receptor; machine learning; support vector machine.
Conflict of interest statement
Competing interests:
S.E. is the owner, D.P.R. and K.M.Z. are employees, and A.M.C. is a consultant of Collaborations Pharmaceuticals Inc.
