Sci Adv. 2024 Aug 30;10(35):eadj3010.
doi: 10.1126/sciadv.adj3010. Epub 2024 Aug 30.

Accelerating drug discovery and repurposing by combining transcriptional signature connectivity with docking

Alexander W Thorman et al. Sci Adv. 2024.

Abstract

We present an in silico approach for drug discovery, dubbed connectivity enhanced structure activity relationship (ceSAR). Building on the landmark LINCS library of transcriptional signatures of drug-like molecules and gene knockdowns, ceSAR combines cheminformatic techniques with signature concordance analysis to connect small molecules and their targets and further assess their biophysical compatibility using molecular docking. Candidate compounds are first ranked in a target structure-independent manner, using chemical similarity to LINCS analogs that exhibit transcriptomic concordance with a target gene knockdown. Top candidates are subsequently rescored using docking simulations and machine learning-based consensus of the two approaches. Using extensive benchmarking, we show that ceSAR greatly reduces false-positive rates, while cutting run times by multiple orders of magnitude and further democratizing drug discovery pipelines. We further demonstrate the utility of ceSAR by identifying and experimentally validating inhibitors of BCL2A1, an important antiapoptotic target in melanoma and preterm birth-associated inflammation.

Figures

Fig. 1. The overall principle of the ceSAR approach.
Candidate molecules are first ranked by their chemical similarity to concordant LINCS analogs, i.e., drug-like molecules with transcriptional signatures concordant to a signature of the target gene KD (red box), and subsequently reranked by docking simulations to assess their biophysical complementarity with the target protein (blue box). By combining signature concordance and biophysical complementarity, the library of candidate compounds is reduced to a small subset enriched for true positives (TP) for further validation (yellow box). Here, a fictitious SRC KD signature consists of six genes, with blue indicating down-regulated and yellow indicating up-regulated genes. Signatures of three compounds targeting the EGFR-SRC-JUN cascade are concordant with that of SRC KD, but only the actual SRC inhibitor (green circle) is found to fit the binding pocket by docking.
Fig. 2. The hierarchy of ceSAR methods.
Dependencies and notation for the hierarchy of ceSAR methods introduced and benchmarked in this work (A) and the overall ceSAR-S workflow (B). Note that the methods highlighted in red in (A) do not depend on protein structure. The ceSAR-S (and ceSAR-S*) workflow shown in (B) is implemented in sig2lead.net and the stand-alone sig2lead application, which combine signature connectivity analysis for LINCS compounds with chemical similarity analysis for user-defined compounds. The latter are scored on the basis of their chemical similarity to concordant LINCS analogs. Note that sig2lead allows for user-defined loss-of-function transcriptional signatures when a LINCS KD signature is not available.
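The structure-independent scoring step described in this caption can be illustrated with a minimal sketch. It is not the sig2lead implementation or the minSim algorithm: it assumes each compound is represented by a set of fingerprint on-bit indices, and it assumes a candidate's score is its highest Tanimoto coefficient to any concordant LINCS analog (the source only states that candidates are scored by chemical similarity to concordant analogs). The names `tanimoto` and `rank_candidates` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a fingerprint is the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: |A and B| / |A or B| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_candidates(
    candidates: Dict[str, Fingerprint],
    concordant_analogs: Dict[str, Fingerprint],
) -> List[Tuple[str, float]]:
    """Rank candidates by their best similarity to any concordant analog.

    Assumption: the per-candidate score is the maximum Tanimoto coefficient
    over all concordant analogs; the published method may aggregate differently.
    """
    scored = []
    for name, fp in candidates.items():
        best = max(
            (tanimoto(fp, ref) for ref in concordant_analogs.values()),
            default=0.0,
        )
        scored.append((name, best))
    # Highest score first; ties broken by name for a deterministic order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


if __name__ == "__main__":
    # Toy fingerprints, invented purely for illustration.
    analogs = {"lincs_analog": frozenset({1, 2, 3, 5})}
    library = {
        "cand_a": frozenset({1, 2, 3, 4}),
        "cand_b": frozenset({7, 8}),
    }
    for name, score in rank_candidates(library, analogs):
        print(name, round(score, 3))
```

Only the set of concordant analogs depends on the target, so the same ranking routine can be reused across targets without any protein structure.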
Fig. 3. Average speedup (in logarithmic scale) on 20 DUD-E targets for methods relative to AutoDock.
The consensus approaches ceSAR-C1 (yellow) and ceSAR-cML1 (green) reduce the run time by 100× compared with docking. Structure-independent ceSAR-S reduces the run time by ~560× when chemical similarity is computed with the fpSim function, which represents current methods (see Materials and Methods) (ceSAR-S:fpSim, dark red), and by ~48,000× when it is computed with the ultrafast minSim algorithm introduced in this work (ceSAR-S:mSim, red).
Fig. 4. Top true-positive ranks for 20 DUD-E targets.
Results for AutoDock (blue), ceSAR-S (red), and the ceSAR-C1 (yellow) and ceSAR-cML1 (green) consensus approaches. Note the complementarity of the signature connectivity and docking approaches: ceSAR works well when docking fails for the last five targets (SRC, MK14, DHFR, PPARG, and HSP90), while docking works well when ceSAR fails (Thrombin and PNP). Note also that the consensus methods are more robust and outperform both AutoDock and ceSAR-S in terms of the number of targets with a true positive as the top-ranking candidate or within the top 10 candidates. CDK2, cyclin-dependent kinase 2; COX2, cyclooxygenase 2.
Fig. 5. Precision curves for 20 DUD-E targets.
The median (A) and individual (B) precision curves for 20 DUD-E targets as a function of the library size. AutoDock is compared with ceSAR-S; with the consensus approaches ceSAR-C1, ceSAR-C5, and ceSAR-C100; with the ML-based ceSAR-cML1 and ceSAR-cML5; and with a simple baseline method (Baseline) that ignores signature connectivity and accounts for compositional biases in the LINCS compound library.
Fig. 6. Enrichment factors for DUD-E targets at 0.1% library.
(A) Fold enrichment into true positives, defined as the ratio of true-positive fraction for full versus reduced library for AutoDock (blue) versus ceSAR-S (red) and consensus methods ceSAR-C1 (yellow) and ceSAR-cML1 (green). (B) The number of targets with ≥5-fold enrichment versus limited or no enrichment (gray) at 0.1% library reduction. Note that ceSAR approaches are more robust and consensus methods ceSAR-C1 and ceSAR-cML1 significantly outperform docking (P = 0.02), reducing the number of targets with limited or no enrichment to four while greatly reducing the computational cost. Statistically significant differences are indicated by arrows with asterisks (*) in (B).
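The fold enrichment in (A) can be illustrated with a short sketch. It assumes the conventional enrichment-factor reading of the caption's definition: the true-positive fraction in the reduced (top-ranked) library divided by the true-positive fraction in the full library. The function name `enrichment_factor` and the toy ranking are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def enrichment_factor(ranked_labels: Sequence[bool], fraction: float) -> float:
    """Fold enrichment at a given library fraction.

    ranked_labels: True for a true positive, ordered from best- to
        worst-scored compound.
    fraction: size of the reduced library as a fraction of the full one
        (0.001 corresponds to the 0.1% library in the caption).
    """
    n_total = len(ranked_labels)
    tp_total = sum(ranked_labels)
    if n_total == 0 or tp_total == 0:
        return 0.0  # enrichment is undefined without true positives
    # Keep at least one compound in the reduced library.
    n_top = max(1, round(fraction * n_total))
    tp_top = sum(ranked_labels[:n_top])
    # (tp_top / n_top) / (tp_total / n_total), rearranged to a single division.
    return (tp_top * n_total) / (n_top * tp_total)


if __name__ == "__main__":
    # Toy ranking: 1000 compounds, 5 true positives, 2 of them in the top 10.
    labels = [False] * 1000
    for idx in (0, 3, 200, 500, 900):
        labels[idx] = True
    print(enrichment_factor(labels, 0.01))
```

Under this definition a value of 1 means the reduced library is no richer in true positives than the full library, which is the sense in which the ≥5-fold threshold in (B) separates enriched targets from those with limited or no enrichment.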
Fig. 7. Experimental validation of in silico candidate compounds targeting BCL2A1.
A total of 116 compounds, identified initially by AutoDock and rescored using ceSAR, along with 23 structural analogs, were tested experimentally by FP and DSF. These 139 compounds are shown in (A), with the top 20 candidates ranked by AutoDock, ceSAR-S, and ceSAR-C highlighted in blue, red, and yellow, respectively. At a single high dose, several compounds showed inhibition of the BCL2A1-Noxa interaction, including some of the most promising candidates nominated by the consensus ceSAR-C approach. Dose-response FP curves for two candidate compounds, with IC50 values indicated by vertical lines, are shown in (B). These compounds were predicted to bind within the BH3 peptide binding pocket (C and D) in a manner that would drive competitive inhibition of BH3 binding (E). These compounds were demonstrated to induce the death of wild-type, but not bax−/−bak−/−, activated T cells, with a box displaying the potential therapeutic window in (F and G).
Fig. 8. Performance of ceSAR-S on compounds directly represented in LINCS versus those with LINCS analogs.
Enrichment into true positives by signature concordance for compounds directly included in LINCS is shown in (A); distributions of concordance scores for true positives (TP) (red) versus true negatives (TN) (blue) included directly in LINCS are shown in (B); and distributions of concordance scores for those with LINCS analogs at Tanimoto coefficient of 0.8 or more are included in (C). ns, not significant; *P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001; ****P ≤ 0.0001.

