Accelerating drug discovery and repurposing by combining transcriptional signature connectivity with docking

Alexander W Thorman¹, James Reigle¹ ² ³, Somchai Chutipongtanate¹ ⁴ ⁵, Juechen Yang¹ ² ³, Behrouz Shamsaei¹ ⁵, Marcin Pilarczyk¹, Mehdi Fazel-Najafabadi¹, Rafal Adamczak⁶, Michal Kouril³ ⁷, Surbhi Bhatnagar³ ⁸, Sarah Hummel⁹, Wen Niu¹, Ardythe L Morrow¹, Maria F Czyzyk-Krzeska⁵ ¹⁰, Robert McCullumsmith¹¹, William Seibel⁷, Nicolas Nassar⁷ ¹², Yi Zheng⁷ ¹², David A Hildeman⁷ ⁹, Mario Medvedovic¹ ², Andrew B Herr⁷ ⁹ ¹³, Jarek Meller¹ ² ³ ⁶ ⁷ ⁸

Affiliations

¹ Department of Environmental and Public Health Sciences, University of Cincinnati, Cincinnati, OH, USA.
² Department of Biostatistics, Health Informatics and Data Sciences, University of Cincinnati, Cincinnati, OH, USA.
³ Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.
⁴ Department of Pediatrics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand.
⁵ Department of Cancer Biology, University of Cincinnati College of Medicine, Cincinnati, OH, USA.
⁶ Department of Informatics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Toruń, Poland.
⁷ Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA.
⁸ Department of Computer Science, University of Cincinnati, Cincinnati, OH, USA.
⁹ Division of Immunobiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.
¹⁰ Department of Veterans Affairs, Cincinnati Veteran Affairs Medical Center, Cincinnati, OH, USA.
¹¹ Department of Neurosciences, University of Toledo, Toledo, OH, USA.
¹² Division of Experimental Hematology and Cancer Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.
¹³ Division of Infectious Diseases, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.

PMID: 39213358
PMCID: PMC11364105
DOI: 10.1126/sciadv.adj3010

Alexander W Thorman et al. Sci Adv. 2024 Aug 30;10(35):eadj3010. doi: 10.1126/sciadv.adj3010. Epub 2024 Aug 30.

Abstract

We present an in silico approach for drug discovery, dubbed connectivity enhanced structure activity relationship (ceSAR). Building on the landmark LINCS library of transcriptional signatures of drug-like molecules and gene knockdowns, ceSAR combines cheminformatic techniques with signature concordance analysis to connect small molecules and their targets and further assess their biophysical compatibility using molecular docking. Candidate compounds are first ranked in a target structure-independent manner, using chemical similarity to LINCS analogs that exhibit transcriptomic concordance with a target gene knockdown. Top candidates are subsequently rescored using docking simulations and machine learning-based consensus of the two approaches. Using extensive benchmarking, we show that ceSAR greatly reduces false-positive rates, while cutting run times by multiple orders of magnitude and further democratizing drug discovery pipelines. We further demonstrate the utility of ceSAR by identifying and experimentally validating inhibitors of BCL2A1, an important antiapoptotic target in melanoma and preterm birth-associated inflammation.
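
The structure-independent first stage described above can be illustrated with a minimal sketch. It assumes that compounds are represented as sets of "on" bits of a binary fingerprint, that similarity is the Tanimoto coefficient, and that a candidate's score is its best similarity to any concordant LINCS analog. The function names, the toy bit sets, and the max-similarity scoring rule are illustrative assumptions, not the authors' implementation; the docking and consensus stages are not shown.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint is modeled as the set of "on" bit indices (an assumption
# made to keep the sketch dependency-free).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_concordant_analogs(
    candidates: Dict[str, Fingerprint],
    concordant_analogs: Dict[str, Fingerprint],
) -> List[Tuple[str, float]]:
    """Score each candidate by its best similarity to any concordant analog."""
    scored = [
        (
            name,
            max((tanimoto(fp, ref) for ref in concordant_analogs.values()), default=0.0),
        )
        for name, fp in candidates.items()
    ]
    # Highest similarity first; ties are broken by name for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy data: hypothetical bit sets, not real fingerprints or LINCS compounds.
analogs = {"lincs_A": frozenset({1, 2, 3, 4}), "lincs_B": frozenset({10, 11, 12})}
library = {
    "cand_1": frozenset({1, 2, 3, 5}),
    "cand_2": frozenset({10, 11, 12}),
    "cand_3": frozenset({20, 21}),
}
ranking = rank_by_concordant_analogs(library, analogs)
```

In a pipeline of this shape, only the top-ranked candidates would be passed on to the more expensive docking step, which is where the reported run-time savings would come from.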

Figures

**Fig. 1. The overall principle of the ceSAR approach.**
Candidate molecules are first ranked by their chemical similarity to concordant LINCS analogs, i.e., drug-like molecules with transcriptional signatures concordant with the signature of the target gene knockdown (KD) (red box), and subsequently reranked by docking simulations to assess their biophysical complementarity with the target protein (blue box). By combining signature concordance and biophysical complementarity, the library of candidate compounds is reduced to a small subset enriched for true positives (TP) for further validation (yellow box). Here, a fictitious SRC KD signature consists of six genes, with blue indicating down-regulated and yellow indicating up-regulated genes. Signatures of three compounds targeting the EGFR-SRC-JUN cascade are concordant with that of SRC KD, but only the actual SRC inhibitor (green circle) is found to fit the binding pocket by docking.

**Fig. 2. The hierarchy of ceSAR methods.**
Dependencies and notation for the hierarchy of ceSAR methods introduced and benchmarked in this work (A) and the overall ceSAR-S workflow (B). Note that the methods highlighted in red in (A) do not depend on protein structure. The ceSAR-S (and ceSAR-S*) workflow shown in (B) is implemented in sig2lead.net and the stand-alone sig2lead application, which combine signature connectivity analysis for LINCS compounds with chemical similarity analysis for user-defined compounds. The latter are scored on the basis of their chemical similarity to concordant LINCS analogs. Note that sig2lead allows for user-defined loss-of-function transcriptional signatures when a LINCS KD signature is not available.

**Fig. 3. Average speedup (in logarithmic scale) on 20 DUD-E targets for methods relative to AutoDock.**
The consensus approaches ceSAR-C₁ (yellow) and ceSAR-cML₁ (green) reduce the run time by 100× compared with docking. Structure-independent ceSAR-S reduces the run time by ~560× when using the fpSim function that represents current methods (see Materials and Methods) to compute the chemical similarity (ceSAR-S:fpSim, dark red) and by ~48,000× when using the ultrafast minSim (ceSAR-S:mSim, red) algorithm introduced in this work.

**Fig. 4. Top true-positive ranks for 20 DUD-E targets.**
Results for AutoDock (blue), ceSAR-S (red), ceSAR-C₁ (yellow), and ceSAR-cML₁ (green) consensus approaches. Note the complementarity of the signature connectivity and docking approaches, with ceSAR working well when docking fails for the last five targets (SRC, MK14, DHFR, PPARG, and HSP90), while docking works well when ceSAR fails (Thrombin and PNP). Note also that the consensus methods are more robust and outperform both AutoDock and ceSAR-S in terms of the number of targets with a true positive as the top-ranking candidate or within the top 10 candidates. CDK2, cyclin-dependent kinase 2; COX2, cyclooxygenase 2.

**Fig. 5. Precision curves for 20 DUD-E targets.**
The median (A) and individual (B) precision curves for 20 DUD-E targets as a function of the library size. AutoDock is compared with ceSAR-S, with the consensus approaches ceSAR-C₁, ceSAR-C₅, and ceSAR-C₁₀₀, with the ML-based ceSAR-cML₁ and ceSAR-cML₅, and with a simple baseline method (Baseline) that ignores signature connectivity and accounts for compositional biases in the LINCS compound library.

**Fig. 6. Enrichment factors for DUD-E targets at 0.1% library.**
(A) Fold enrichment into true positives, defined as the ratio of true-positive fraction for full versus reduced library for AutoDock (blue) versus ceSAR-S (red) and consensus methods ceSAR-C₁ (yellow) and ceSAR-cML₁ (green). (B) The number of targets with ≥5-fold enrichment versus limited or no enrichment (gray) at 0.1% library reduction. Note that ceSAR approaches are more robust and consensus methods ceSAR-C₁ and ceSAR-cML₁ significantly outperform docking (P = 0.02), reducing the number of targets with limited or no enrichment to four while greatly reducing the computational cost. Statistically significant differences are indicated by arrows with asterisks (*) in (B).
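
As a worked illustration of the enrichment factor used in this figure, the sketch below computes the true-positive fraction in the top-ranked slice of a library divided by the true-positive fraction in the full library, so that values above 1 indicate enrichment. This orientation of the ratio is the common convention and is assumed here; the function name, the rounding of the slice size, and the toy library are hypothetical.

```python
from typing import Sequence, Set


def enrichment_factor(ranked: Sequence[str], actives: Set[str], fraction: float) -> float:
    """Fold enrichment of actives in the top `fraction` of a ranked library."""
    if not ranked:
        raise ValueError("ranked library is empty")
    # Size of the reduced library; at least one compound is always kept.
    n_top = max(1, round(len(ranked) * fraction))
    tp_fraction_top = sum(c in actives for c in ranked[:n_top]) / n_top
    tp_fraction_full = sum(c in actives for c in ranked) / len(ranked)
    if tp_fraction_full == 0:
        raise ValueError("no actives present in the library")
    return tp_fraction_top / tp_fraction_full


# Toy library of 1000 compounds with 10 actives, 2 of which rank in the top 1%.
ranked_library = [f"c{i}" for i in range(1000)]
known_actives = {"c0", "c5"} | {f"c{i}" for i in range(100, 108)}
ef_1pct = enrichment_factor(ranked_library, known_actives, 0.01)
```

With these toy numbers, the top 1% holds 2 actives in 10 compounds (0.2) against 10 in 1000 overall (0.01), which corresponds to a 20-fold enrichment.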

**Fig. 7. Experimental validation of in silico candidate compounds targeting BCL2A1.**
A total of 116 compounds, identified initially by AutoDock and rescored using ceSAR, along with 23 structural analogs were tested experimentally by FP and DSF. These 139 compounds are shown in (A), with top 20 candidates ranked by AutoDock, ceSAR-S, and ceSAR-C highlighted in blue, red, and yellow, respectively. At a single high dose, several compounds showed inhibition of the BCL2A1-Noxa interaction, including some of the most promising candidates nominated by the consensus ceSAR-C approach. Dose-response FP curves for two candidate compounds, with IC₅₀ values indicated by vertical lines, are shown in (B). These compounds were predicted to bind within the BH3 peptide binding pocket (C and D) in a manner that would drive competitive inhibition of BH3 binding (E). These compounds were demonstrated to induce the death of wild-type, but not *bax^−/−bak^−/−* activated T cells, with a box displaying the potential therapeutic window in (F and G).

**Fig. 8. Performance of ceSAR-S on compounds directly represented in LINCS versus those with LINCS analogs.**
Enrichment into true positives by signature concordance for compounds directly included in LINCS is shown in (A); distributions of concordance scores for true positives (TP) (red) versus true negatives (TN) (blue) included directly in LINCS are shown in (B); and distributions of concordance scores for those with LINCS analogs at Tanimoto coefficient of 0.8 or more are included in (C). ns, not significant; *P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001; ****P ≤ 0.0001.
