J Chem Inf Model. 2018 Aug 27;58(8):1483-1500.
doi: 10.1021/acs.jcim.8b00104. Epub 2018 Jul 23.

Modeling Small-Molecule Reactivity Identifies Promiscuous Bioactive Compounds

Matthew K Matlock et al. J Chem Inf Model.

Abstract

Scientists rely on high-throughput screening tools to identify promising small-molecule compounds for the development of biochemical probes and drugs. This study focuses on the identification of promiscuous bioactive compounds: compounds that appear active in many high-throughput screening experiments against diverse targets but are often false positives that may not be easily developed into successful probes. These compounds can exhibit bioactivity through nonspecific, intractable mechanisms of action and/or by interfering with specific assay technology readouts. Such "frequent hitters" are now commonly identified using substructure filters, including pan-assay interference compounds (PAINS) filters. Herein, we show that mechanistic modeling of small-molecule reactivity using deep learning can improve upon PAINS filters when modeling promiscuous bioactivity in PubChem assays. Without training on high-throughput screening data, a deep learning model of small-molecule reactivity achieves a sensitivity and specificity of 18.5% and 95.5%, respectively, in identifying promiscuous bioactive compounds. This performance is similar to that of PAINS filters, which achieve a sensitivity of 20.3% at the same specificity. Importantly, such reactivity modeling is complementary to PAINS filters. When PAINS filters and reactivity models are combined, the resulting model outperforms either method alone, achieving a sensitivity of 24% at the same specificity. Because the deep learning model is probabilistic, its sensitivity and specificity can also be tuned by adjusting the threshold. Moreover, for a subset of PAINS filters, this reactivity model can help discriminate between promiscuous and nonpromiscuous bioactive compounds even among compounds matching those filters. Critically, the reactivity model provides mechanistic hypotheses for assay interference by predicting the precise atoms involved in compound reactivity. Overall, our analysis suggests that deep learning approaches to modeling promiscuous compound bioactivity may provide a complementary approach to current methods for identifying promiscuous compounds.
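The abstract notes that a probabilistic model's sensitivity and specificity depend on the chosen threshold. The following minimal sketch illustrates that trade-off; the scores, labels, and thresholds are hypothetical toy values, not data from the study, and the function name is an illustrative choice.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are flagged."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y and s >= threshold)      # promiscuous, flagged
    fn = sum(1 for s, y in pairs if y and s < threshold)       # promiscuous, missed
    tn = sum(1 for s, y in pairs if not y and s < threshold)   # clean, not flagged
    fp = sum(1 for s, y in pairs if not y and s >= threshold)  # clean, flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical model scores and promiscuity labels (1 = promiscuous).
scores = [0.95, 0.80, 0.40, 0.30, 0.70, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for threshold in (0.75, 0.25):
    sens, spec = sensitivity_specificity(scores, labels, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the threshold flags fewer compounds, which trades sensitivity for specificity; a fixed substructure filter offers no comparable adjustment.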

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1.
Assay data from PubChem reveals a large number of potential bioassay promiscuous compounds. (A) Analysis of all non-ChEMBL PubChem assays testing more than 1000 compounds. This study restricted analysis to compounds tested in more than 100 bioassays (384 328 compounds, from an initial 1 226 075). (B) Compounds were defined as promiscuous bioactives if they were active in more than a fixed percentage of tested bioassays. Cutoffs of 5, 10, 15, and 20% were initially considered. Promiscuous activity of compounds follows an approximate power distribution (Figure S1). At a promiscuity cutoff of 5%, approximately 3.40% of compounds in the data set were considered promiscuous. Note that many compounds were active in more than 20% of tested assays.
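The labeling rule in this caption can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the input layout (a dictionary of per-assay activity outcomes), the function name, and the toy compounds are all hypothetical; only the "more than 100 assays" restriction and the 5% cutoff come from the caption.

```python
def label_promiscuous(assay_results, min_assays=100, cutoff=0.05):
    """Label compounds as promiscuous from per-assay activity outcomes.

    assay_results maps a compound ID to a list of booleans, one per tested
    assay (True = active). Compounds tested in min_assays or fewer assays
    are excluded, mirroring the restriction described in the caption.
    """
    labels = {}
    for compound_id, outcomes in assay_results.items():
        n_tested = len(outcomes)
        if n_tested <= min_assays:
            continue  # too few assays to estimate promiscuity
        active_fraction = sum(outcomes) / n_tested
        labels[compound_id] = active_fraction > cutoff
    return labels


# Hypothetical compounds, for illustration only.
toy = {
    "cmpd_a": [True] * 12 + [False] * 188,  # active in 6% of 200 assays
    "cmpd_b": [True] * 4 + [False] * 196,   # active in 2% of 200 assays
    "cmpd_c": [True] * 50 + [False] * 30,   # only 80 assays, so excluded
}
labels = label_promiscuous(toy)
print(labels)
```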
Figure 2.
Schematic of deep convolutional neural network models for predicting small-molecule reactivity and bioassay promiscuity. (A) Atoms in a test compound are represented as rows of numerical descriptors in a data matrix. These data are input to a neural network with one hidden layer of ten units. This neural network calculates four atom reactivity scores; each score predicts nucleophilic attack at that atom by GSH, cyanide, DNA, or protein. The top five atom reactivity scores in each category are then combined with molecule descriptors and used to calculate four molecule reactivity scores. Each molecule-level reactivity score is then trained to predict conjugation of the input molecule to GSH, cyanide, DNA, or protein. (B) Molecule-level reactivity scores are further combined with another neural network to produce a single integrated reactive promiscuity score. This network can then be trained to predict promiscuous bioactivity in HTS data sets. (C) A hybrid model combines molecule-level reactivity scores with binary indicators for PAINS substructure filter matches. A single hidden layer neural network is then trained to predict promiscuous behavior in HTS data sets.
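The data flow in panel A can be sketched as a forward pass. This is not the authors' implementation: the sigmoid activations, the random untrained weights, the feature counts, the zero-padding for molecules with fewer than five atoms, and the single-layer molecule-level scorer are all assumptions made for illustration. Only the ten hidden units, the four nucleophile categories, and the top-five pooling come from the caption.

```python
import math
import random

NUCLEOPHILES = ("GSH", "cyanide", "DNA", "protein")


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer; the sigmoid activation is an assumption."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def random_layer(rng, n_out, n_in):
    """Untrained placeholder weights; a real model would learn these."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def atom_scores(atom_rows, hidden, output):
    """Map each atom's descriptor row to four atom-level reactivity scores."""
    return [dense(dense(row, *hidden), *output) for row in atom_rows]


def top_k(values, k=5):
    """Largest k values, zero-padded for small molecules (assumption)."""
    ranked = sorted(values, reverse=True)[:k]
    return ranked + [0.0] * (k - len(ranked))


def molecule_scores(atom_rows, mol_descriptors, hidden, output, mol_layers):
    """Pool the top five atom scores per category with molecule descriptors."""
    per_atom = atom_scores(atom_rows, hidden, output)
    scores = {}
    for j, name in enumerate(NUCLEOPHILES):
        features = top_k([row[j] for row in per_atom]) + list(mol_descriptors)
        scores[name] = dense(features, *mol_layers[j])[0]
    return scores


rng = random.Random(0)
n_atom_features, n_mol_features = 6, 3          # hypothetical descriptor counts
hidden = random_layer(rng, 10, n_atom_features)  # one hidden layer of ten units
output = random_layer(rng, 4, 10)                # four atom-level scores
mol_layers = [random_layer(rng, 1, 5 + n_mol_features) for _ in NUCLEOPHILES]

atoms = [[rng.random() for _ in range(n_atom_features)] for _ in range(7)]
mol = [rng.random() for _ in range(n_mol_features)]
print(molecule_scores(atoms, mol, hidden, output, mol_layers))
```

Because the weights are random, the printed scores are meaningless; the sketch only shows how atom-level outputs are pooled into molecule-level inputs.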
Figure 3.
Substructure filter-based methods for flagging promiscuous and/or assay interference compounds detect promiscuous bioactives in PubChem. (A) At the 5% promiscuity cutoff, PAINS filters (red) are enriched 3.85-fold for promiscuous bioactives (p < 10⁻¹⁰, χ² test) and Lilly MedChem filters (yellow) are enriched 1.85-fold for promiscuous bioactives (p < 10⁻¹⁰, χ² test). (B) PAINS filters have a lower sensitivity than the Lilly MedChem filters for promiscuous bioactive compounds in PubChem. (C) However, PAINS filters have a 95% specificity for promiscuous actives, while Lilly MedChem filters have 67.5% specificity. ***: p < 0.0001.
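Fold enrichment and a χ² test on a 2×2 contingency table can be computed as sketched below. The caption does not state how enrichment was defined, so the ratio used here (promiscuous rate among flagged compounds divided by the rate among unflagged compounds) is an assumption, as is the use of an uncorrected Pearson statistic. The counts are hypothetical.

```python
import math


def fold_enrichment(a, b, c, d):
    """Ratio of promiscuous rates: flagged versus unflagged compounds.

    a: flagged and promiscuous      b: flagged and not promiscuous
    c: unflagged and promiscuous    d: unflagged and not promiscuous
    """
    return (a / (a + b)) / (c / (c + d))


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) and p-value.

    For one degree of freedom the upper-tail probability of the chi-square
    distribution is erfc(sqrt(stat / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts, not taken from the study.
a, b, c, d = 30, 70, 100, 900
stat, p_value = chi_square_2x2(a, b, c, d)
print(f"enrichment={fold_enrichment(a, b, c, d):.2f}, chi2={stat:.2f}, p={p_value:.2e}")
```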
Figure 4.
GSH reactivity predictions for compounds in DrugBank and PubChem. PAINS filters are associated with increased GSH reactivity scores in PubChem, but reactivity is not increased among FDA-approved drugs. (A) Reactivity scores for FDA-approved drugs in DrugBank are comparable between PAINS and non-PAINS, whereas reactivity scores of PubChem PAINS matches are substantially elevated compared to non-PAINS and compared to DrugBank (p = 2.06 × 10⁻⁷, Mann–Whitney U test). While some FDA-approved drugs act via a reactive mechanism, the majority of FDA-approved drugs are not explicitly reactive and not found to be promiscuous bioactives. Outliers are not shown. (B) Compounds active in more than 5% of tested assays in PubChem have substantially higher reactivity scores than nonpromiscuous compounds (p < 10⁻¹⁰, Mann–Whitney U test). Outliers are not shown. **: p < 0.001. ***: p < 0.0001.
Figure 5.
Reactivity scores are predictive of promiscuous bioactivity. (A) Model scores of small-molecule reactivity with DNA (AUC 63.8%), GSH (AUC 62.1%), and protein (AUC 62.1%) are all modestly predictive of promiscuous behavior at the 5% bioactivity cutoff. Predictions of GSH reactivity achieve similar sensitivity and specificity to PAINS filters, while Lilly MedChem filters have a higher sensitivity for promiscuous actives but lower specificity. Combining the four reactivity scores into a single integrated score via a small neural network achieves a 100-fold cross-validated AUC of 69.1%. Including PAINS filter matches with reactivity scores in a similar manner achieves a 100-fold cross-validated AUC of 69.5%. (B) CROC curves with the exponential transform (α = 10) show a substantial increase in early recall for the combined PAINS and reactivity model, with a 4% increase in sensitivity compared to PAINS filters at the same specificity.
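The ROC and CROC areas mentioned in this caption can be sketched as follows. The exponential transform is written in its commonly used form, f(x) = (1 − e^(−αx)) / (1 − e^(−α)), which magnifies the low false-positive region; that this matches the exact variant used in the study is an assumption. The scores and labels are hypothetical.

```python
import math


def roc_points(scores, labels):
    """(FPR, TPR) points from sweeping a threshold down the sorted scores.

    Tied scores are added together so each point is a valid threshold.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def exp_transform(x, alpha=10.0):
    """Exponential magnifier applied to the false-positive-rate axis."""
    return (1.0 - math.exp(-alpha * x)) / (1.0 - math.exp(-alpha))


def area(points):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc(scores, labels):
    return area(roc_points(scores, labels))


def croc_auc(scores, labels, alpha=10.0):
    """Area under the transformed curve (exact when no scores are tied)."""
    transformed = [(exp_transform(x, alpha), y) for x, y in roc_points(scores, labels)]
    return area(transformed)


# Hypothetical scores and labels (1 = promiscuous).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(f"AUC={auc(scores, labels):.3f}, CROC AUC={croc_auc(scores, labels):.3f}")
```

The CROC area weights early recall more heavily than the plain AUC, so a false positive ranked near the top costs more under the transform.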
Figure 6.
Reactivity scores flag subclasses of chemotypes with predicted enriched biological reactivity. The cyano_pyridone PAINS filter group consists of a core pyridone ring with one cyano substituent. (A) Note that six of the 11 promiscuous compounds matched by this filter group are predicted to be reactive at an sp2 carbon within a Michael-acceptor-like motif located meta to the cyano group (purple). Reactivity modeling predicts that GSH attacks this electron-deficient region. The percentage of biological assays in which each compound was active is noted. (B) The other five promiscuous compounds within this filter group correspond to variants of the cyano_pyridone filter group not containing a traditional Michael-acceptor (pink), which are not predicted to be strongly reactive with GSH. (C) These predicted less-reactive compounds are active in a smaller percentage of biological assays (p = 0.0001, Mann–Whitney U test). **: p < 0.001.
Figure 7.
Many PAINS filters are associated with a nearby or overlapping reactive Michael-acceptor motif. (A) The imineone filter group matches a chemical motif with adjacent imine and ketone groups, as well as diones. Among 457 compounds matching this filter (red), 185 compounds (40%) possess a Michael-acceptor motif that overlaps with the motif matched by this filter group (purple). Michael-acceptor motifs are well-known electrophiles assigned high reactivity scores by our model. (B) The thiophene_amino filter group matches various substituted thiophene rings. Among 224 compounds matching this filter group, 28 (13%) contain a Michael-acceptor motif adjacent to the amide and outside the motif matched by the filter group (purple). (C, D) Compounds with this Michael-acceptor are enriched 3.99-fold for promiscuous actives among compounds matching the imineone filter (p < 10⁻¹⁰, χ² test), while compounds matching the thiophene_amino filter group and the Michael-acceptor motif are enriched 3.30-fold for promiscuous actives (p = 7.97 × 10⁻³, χ² test). Compounds with Michael-acceptor motifs not matching the imineone or thiophene_amino filters are not strongly enriched for promiscuous bioactivity. **: p < 0.001, ***: p < 0.0001.
Figure 8.
Some PAINS filters may be associated with other, unrelated reactive motifs. (A) The imine_one_fives filter group matches a five-membered ring motif containing both imine and ketone groups. Among 102 compounds matching this filter group (red), 28 (27%) contain a thioamide group conjugated to the ring motif (purple). (B) Compounds matching the filter group and this thioamide group are enriched 3.31-fold for promiscuous actives compared to compounds matching only the filter group (p = 7.97 × 10⁻³, χ² test). (C) Compounds containing the thioamide group are assigned higher reactivity scores than compounds matching only the filter group (p < 10⁻¹⁰, Mann–Whitney U test). (D) Oxidation of the thioamide group is known to form a reactive intermediate that can conjugate to proteins in rat hepatocytes. **: p < 0.001. ***: p < 0.0001.
Figure 9.
Reactivity scores suggest mechanisms of promiscuity for PAINS filters without a known mechanism. (A) The pyrrole filter group matches compounds containing the five-membered nitrogen aromatic ring pyrrole. Many promiscuous bioactive compounds contain a reactive double bond motif (purple). Compounds 1 and 3 also match the ene_rhod PAINS filter group, but compound 2 does not match another PAINS filter. Compound 2 is a hydrazone, which may tautomerize to form a reactive azo compound. (B) Predicted atom-level GSH reactivity at this double bond was used to construct a ROC curve. Compounds matching the filter group but not containing an adjacent double bond motif received a score of 0. Molecule-level protein reactivity predicts pyrrole promiscuity (AUC = 65.3%). GSH reactivity scoring of the double bond achieves an AUC of 60.6%, which accounts for the majority of the predictive power of this model (difference not statistically significant, p = 0.19, ROC Z-test). The dashed line denotes the expected ROC of a random model.
Figure 10.
Reactivity scores identify non-obvious reactive motifs not captured by PAINS filters. (A) The het_thio_666 PAINS filter group consists of tricyclic, heteroaromatic, sulfur-containing compounds. Twenty-four of 44 bioassay promiscuous compounds matching this filter group in PubChem also contain tertiary amine rings such as piperidines, piperazines, or pyrrolidines. (B) Among compounds matching this PAINS filter group, site-level cyanide reactivity scores are nonzero only on the atoms not matched by the filter group, suggesting a reactive mechanism unrelated to the motifs in this filter group. (C) Cyclic tertiary amines are known to be oxidized in vivo by cytochrome P450 enzymes. This oxidation leads to the formation of an iminium ion intermediate that can react with cyanide or biological substrates. ***: p < 0.0001.
Figure 11.
A reactivity analysis of literature-reported HAT inhibitors identifies likely sites of nonspecific reactivity. (A) The combined reactive promiscuity model predicts the results of their GSH adduct formation counterscreen with the same sensitivity (75%) as PAINS filters, but with enhanced specificity (100% versus 63.6%, respectively). (B) The reactivity model also predicts the results of their CoA adduct formation counterscreen with the same sensitivity (66.7%) as PAINS filters, but with enhanced specificity (100% versus 70%, respectively). (C) Example reported HAT inhibitors and respective GSH reactivity predictions. From left to right: C646 contains an ene_rhod PAINS filter group match, a common reactive motif. Our GSH reactivity score predicts that the mechanism of nonspecific thiol reactivity involves nucleophilic attack at the β carbon of a Michael-acceptor contained within the motif matched by this filter group. The catechol groups of gossypol match the catechol PAINS filter group, though our model predicts the aldehyde substituents as thiol-reactive. We note that gossypol can undergo redox activity and form quinones under certain conditions, and it is unconfirmed which adduct(s) are formed under any given assay condition. While MB-3 is not flagged by PAINS filters, it contains a reactive terminal olefin group. The dashed line denotes the expected ROC of a random model.
