J Chem Inf Model. 2017 Sep 25;57(9):2294-2308.
doi: 10.1021/acs.jcim.7b00222. Epub 2017 Aug 23.

Comprehensive and Automated Linear Interaction Energy Based Binding-Affinity Prediction for Multifarious Cytochrome P450 Aromatase Inhibitors

Marc van Dijk et al. J Chem Inf Model. 2017.

Abstract

Cytochrome P450 aromatase (CYP19A1) plays a key role in the development of estrogen-dependent breast cancer, and aromatase inhibitors have been at the front line of treatment for the past three decades. The development of potent, selective, and safer inhibitors is ongoing, with in silico screening methods playing an increasingly prominent role in the search for promising lead compounds in bioactivity-relevant chemical space. Here we present a set of comprehensive binding-affinity prediction models for CYP19A1, built with our automated Linear Interaction Energy (LIE) based workflow on a set of 132 putative and structurally diverse aromatase inhibitors obtained from a typical industrial screening study. We extended the workflow with machine learning methods that automatically cluster training and test compounds in order to maximize the number of compounds explained by one or more predictive LIE models. The method uses protein–ligand interaction profiles obtained from Molecular Dynamics (MD) trajectories to guide the model search and to define the applicability domain of the resolved models. The method accounted for 86% of the data set in three robust models that show high correlation between calculated and observed ligand-binding free energies (RMSE < 2.5 kJ mol–1), with good cross-validation statistics.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Members of four generations of clinical steroidal (c,f) and nonsteroidal (a,b,d,e) aromatase (CYP19A1) inhibitors. First generation: Aminoglutethimide (a, Cytadren, Novartis). Second generation: Fadrozole (b, Afema, Novartis) and Formestane (c, Lentaron, Novartis). Third generation: Anastrozole (d, Arimidex, AstraZeneca), Letrozole (e, Femara, Novartis), and Exemestane (f, Aromasin, Pfizer).
Figure 2
Schematic overview of the automated machine learning workflow, which searches for a posteriori estimates of one or more combinations of the α and β parameters of the iLIE equation for a set of compounds, so as to maximize the number of compounds explained by one or more LIE models with predefined RMSE and r² cutoffs. The “data set curation and filtering” stage (A) uses FFT-based MD trajectory filtering to obtain stable average values of ΔVvdW and ΔVel. Ligand poses whose average ΔVvdW/ΔVel pairs fall outside a 97.5% confidence interval (Figure S2), identified using multivariate normal mixture model analysis, are labeled as outliers (out). Protein–ligand interaction profiling is performed on the FFT-based stable energy trajectories (dashed lines). Ligand groups identified by clustering the interaction profiles are used as input to the stochastic search (B). The existence of LIE models for these clusters is explored during an iterative four-step stochastic search in which compounds from the global compound pool are added to the evolving model according to a probability that is updated at every iteration using the iRLS weights of the added compounds. The charted model landscape is clustered during the last workflow stage (C), and final models are selected.
Figure 3
Propagation of weights Wi for four poses of a compound as a function of α or β model parameter, as obtained by solving the iLIE equation on a fixed grid of α and β parameters and focusing on the grid region where the difference between ΔGpred and experimentally observed ΔGobs is smaller than 5 kJ mol–1.
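The pose weights described in this caption can be illustrated with a minimal sketch. It assumes the commonly used iterative LIE form, in which each pose i has ΔG_i = α·ΔVvdW_i + β·ΔVel_i, a Boltzmann-like weight W_i = exp(−ΔG_i/kBT) / Σ_j exp(−ΔG_j/kBT), and ΔGpred = −kBT ln Σ_i exp(−ΔG_i/kBT). The exact form of eq 3 is not reproduced on this page, so that form, the temperature, the function name `pose_weights`, and all energy values below are assumptions made for illustration only.

```python
import math

# k_B*T in kJ/mol at an assumed temperature of 298.15 K
KT = 0.0083144626 * 298.15


def pose_weights(alpha, beta, poses, kt=KT):
    """Return (weights, dg_pred) for a list of (dV_vdw, dV_el) pose averages in kJ/mol.

    Assumed iLIE form: each pose contributes dG_i = alpha*dV_vdw + beta*dV_el,
    and poses are combined with Boltzmann-like weights.
    """
    dg = [alpha * vdw + beta * el for vdw, el in poses]
    # Shift by the minimum so the exponentials stay numerically well-behaved.
    dg_min = min(dg)
    boltz = [math.exp(-(g - dg_min) / kt) for g in dg]
    z = sum(boltz)
    weights = [b / z for b in boltz]
    dg_pred = dg_min - kt * math.log(z)
    return weights, dg_pred


# Hypothetical interaction-energy averages for four poses of one compound
poses = [(-150.0, -20.0), (-140.0, -35.0), (-160.0, -5.0), (-120.0, -40.0)]
weights, dg_pred = pose_weights(0.2, 0.4, poses)
```

Repeating the call over a range of α (or β) values while holding the other parameter fixed would trace how the weights shift between poses, which is the kind of propagation the figure plots.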
Figure 4
LIE α and β model parameter scan performed on each of the 132 ligands individually by solving the iLIE equation (eq 3), for every point on a square grid of α and β model parameters between a value of 0 and 1 with a grid spacing of 0.01. The color gradient highlights the percentage of ligands having a ΔGpred value within 5 kJ mol–1 of ΔGobs for a given combination of α and β.
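A parameter scan of this kind can be sketched as follows. The grid bounds (0 to 1), the spacing (0.01), and the 5 kJ mol–1 tolerance come from the caption; the iLIE form used in `dg_pred` (Boltzmann-weighted combination of per-pose α·ΔVvdW + β·ΔVel values), the temperature, the function names, and the two example ligands are assumptions for illustration and are not taken from the paper.

```python
import math

# k_B*T in kJ/mol at an assumed temperature of 298.15 K
KT = 0.0083144626 * 298.15


def dg_pred(alpha, beta, poses, kt=KT):
    """Assumed iLIE prediction from (dV_vdw, dV_el) pose averages in kJ/mol."""
    dg = [alpha * vdw + beta * el for vdw, el in poses]
    dg_min = min(dg)
    return dg_min - kt * math.log(sum(math.exp(-(g - dg_min) / kt) for g in dg))


def scan(ligands, steps=100, tol=5.0):
    """Map each (alpha, beta) grid point to the percentage of ligands
    whose predicted dG lies within tol kJ/mol of the observed value."""
    coverage = {}
    for i in range(steps + 1):
        for j in range(steps + 1):
            # Integer indices avoid accumulating floating-point step errors.
            a, b = i / steps, j / steps
            hits = sum(abs(dg_pred(a, b, poses) - obs) < tol for obs, poses in ligands)
            coverage[(a, b)] = 100.0 * hits / len(ligands)
    return coverage


# Hypothetical ligands: (dG_obs, [(dV_vdw, dV_el), ...]) in kJ/mol
ligands = [
    (-40.0, [(-150.0, -20.0), (-140.0, -35.0)]),
    (-35.0, [(-120.0, -30.0), (-130.0, -10.0)]),
]
coverage = scan(ligands)
best = max(coverage, key=coverage.get)
```

The `coverage` dictionary holds one percentage per grid point, which is the quantity the color gradient in the figure encodes.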
Figure 5
Results of the stochastic sampling of α and β model-parameter space for the data set of 132 CYP19A1 inhibitors. The likelihood for a compound (labeled by ID on the x-axis) to be part of a model in a given β (panel A) and α (panel B) range is shown as a heat map. The color gradient is a dimensionless measure of the likelihood calculated as the summed iRLS regression weights of a ligand in each of the sampled models of the stochastic search divided by the Root-Mean-Square Error (RMSE) of the model. The measure of likelihood is plotted as a function of the model α and β parameters with a bin size of 0.02. Regions of highest density used to train the final models are labeled as models 1 to 4 (red dashed rectangular boxes).
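The likelihood measure in this caption can be sketched as a simple accumulation. The sketch reads the caption as: for every sampled model, add the ligand's iRLS weight divided by that model's RMSE to the bin of the model's α or β value (bin size 0.02). That reading, the dictionary layout of a model, the function name `likelihood_map`, and the example numbers are all assumptions, not the authors' implementation.

```python
from collections import defaultdict


def likelihood_map(models, bin_size=0.02, param="beta"):
    """Accumulate weight/RMSE per (compound ID, parameter bin index)."""
    heat = defaultdict(float)
    for model in models:
        # round() guards against floating-point error before truncating to a bin index.
        bin_index = int(round(model[param] / bin_size, 9))
        for compound_id, weight in model["weights"].items():
            heat[(compound_id, bin_index)] += weight / model["rmse"]
    return dict(heat)


# Hypothetical sampled models: parameters, RMSE (kJ/mol), and iRLS weights by compound ID
models = [
    {"alpha": 0.21, "beta": 0.43, "rmse": 2.0, "weights": {7: 1.0, 12: 0.5}},
    {"alpha": 0.22, "beta": 0.43, "rmse": 4.0, "weights": {7: 0.8}},
    {"alpha": 0.50, "beta": 0.10, "rmse": 2.5, "weights": {12: 1.0}},
]
heat = likelihood_map(models)
```

Calling `likelihood_map(models, param="alpha")` would give the corresponding map over α. Dividing by the RMSE means that well-fitting models contribute more to a compound's score than poorly fitting ones.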
Figure 6
Correlation between the predicted (ΔGpred) and observed (ΔGobs) binding free energies in three models (panels A to C for models 1 to 3, respectively) that were trained using the results from stochastic sampling. The solid diagonal lines indicate ideal correlation, and the dashed lines indicate upper and lower error margins of 5 kJ mol–1. Blue filled circles correspond to compounds belonging to the cluster centers indicated by the red boxes 1–3 in Figure 5, red filled circles to the remaining compounds in that cluster, and gray filled circles to the remaining compounds of the data set as predicted by the model. Blue and red filled crosses in panel C indicate low-affinity Fadrozole-like compounds.
Figure 7
Cartoon representation of Cytochrome P450 19A1 (PDB code 3EQM, ref 55) with the natural substrate 4-androstene-3,17-dione (ASD, cyan stick representation) bound. Protein residues (stick representation) involved in polar protein–ligand interactions for more than 50% of the simulation time are grouped into four hotspots (indicated in red, yellow, green, and purple), including the heme group (HEME).
Figure 8
Protein–ligand interaction profiles for the compounds in models 1–3 (Table 1) derived from the stochastic approximate inference. The relative interaction frequencies for each protein residue–ligand interaction are represented by vertically stacked bars where the bar colors correspond to a specific interaction type as listed in the graph legend. Hydrophobic contact frequencies are divided by 10 because of their relative abundance with respect to the other classified interactions.
