Angew Chem Int Ed Engl. 2023 Mar 6;62(11):e202211358.
doi: 10.1002/anie.202211358. Epub 2023 Feb 6.

Machine Learning Informs RNA-Binding Chemical Space

Kamyar Yazdani et al. Angew Chem Int Ed Engl. 2023.

Abstract

Small-molecule targeting of RNA has emerged as a new frontier in medicinal chemistry, but compared with the protein-targeting literature, our understanding of the chemical matter that binds RNA is limited. In this study, we report the Repository Of BInders to Nucleic acids (ROBIN), a new library of nucleic acid binders identified by small molecule microarray (SMM) screening. We report the complete results of 36 individual nucleic acid SMM screens against a library of 24 572 small molecules (1 627 072 interactions assayed in total). A set of 2 003 RNA-binding small molecules was identified, representing the largest fully public, experimentally derived library of its kind to date. Machine learning was used to develop highly predictive and interpretable models that characterize RNA-binding molecules. This work demonstrates that machine learning algorithms applied to experimentally derived sets of RNA binders are a powerful method for informing our understanding of RNA-targeted chemical space.

Keywords: Machine Learning; Medicinal Chemistry; Nucleic Acids; RNA; Small Molecule Microarrays.
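The abstract states that 2 003 RNA binders were identified from 36 nucleic acid screens, and Figure 1 defines SMM non-RNA binders as library compounds that did not score as hits for any RNA. A minimal Python sketch of that bookkeeping follows. It assumes, hypothetically, that a compound is labeled an RNA binder when it scored as a hit in at least one RNA screen; the actual ROBIN hit-calling criteria are not described here, and the function names and compound IDs are illustrative only.

```python
from typing import Dict, Set, Tuple


def split_binders(
    library: Set[str],
    rna_screen_hits: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str]]:
    """Split a compound library into RNA binders and non-RNA binders.

    Hypothetical rule: a compound is an RNA binder if it was a hit in at
    least one RNA screen; every other library compound is a non-RNA binder.
    """
    binders: Set[str] = set()
    for hits in rna_screen_hits.values():
        # Ignore any hit IDs that are not part of the printed library.
        binders |= hits & library
    return binders, library - binders


def hit_fraction(n_binders: int, n_library: int) -> float:
    """Fraction of the library that ended up in the binder set."""
    return n_binders / n_library


if __name__ == "__main__":
    # Hypothetical toy data: five compounds, two RNA screens.
    library = {"cpd1", "cpd2", "cpd3", "cpd4", "cpd5"}
    screens = {"rna_A": {"cpd1", "cpd2"}, "rna_B": {"cpd2", "cpd4"}}
    binders, non_binders = split_binders(library, screens)
    print(sorted(binders))      # ['cpd1', 'cpd2', 'cpd4']
    print(sorted(non_binders))  # ['cpd3', 'cpd5']
```

Applied to the abstract's headline counts, `hit_fraction(2003, 24572)` is about 0.0815, i.e. roughly 8% of the screened library, assuming all 2 003 binders come from that library.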

Conflict of interest statement

The authors declare the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: T.E.H.A. and R.K. are current employees of Ladder Therapeutics Inc. and may hold stock or other financial interests in Ladder Therapeutics Inc.

Figures

Figure 1.
Comparison of ROBIN RNA binders and FDA-approved drugs. A) Kernel density estimate (KDE) plots of the distributions of six common medicinal chemistry parameters for FDA-approved drugs (grey), ROBIN RNA binders (orange), and compounds from the SMM library that did not score as hits for any RNA (SMM Non-RNA Binders, blue). B) Receiver operating characteristic (ROC) curve for classifying ROBIN RNA binders versus FDA-approved drugs using LASSO logistic regression. C) The five features with the highest odds ratios identified by LASSO. D) The five features with the lowest odds ratios identified by LASSO.
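Panels B to D of Figure 1 rest on two standard quantities: the area under the ROC curve (AUC), the usual one-number summary of a ROC plot, and odds ratios, which for a logistic regression are the exponentiated coefficients. The plain-Python sketch below computes both. The feature names and coefficient values are hypothetical, not the paper's; it assumes ROBIN RNA binders are coded as the positive class (label 1), and each odds ratio is per one-unit change in the feature on whatever scale the model was fit. Because the LASSO (L1) penalty sets uninformative coefficients exactly to zero, those features get an odds ratio of 1.

```python
import math
from typing import Dict, List, Sequence, Tuple


def odds_ratios(coefs: Dict[str, float]) -> Dict[str, float]:
    """Map logistic-regression coefficients beta to odds ratios exp(beta).

    OR > 1: a one-unit increase in the feature raises the odds of the
    positive class; OR < 1 lowers them. A LASSO-zeroed coefficient gives 1.
    """
    return {name: math.exp(beta) for name, beta in coefs.items()}


def top_and_bottom(ors: Dict[str, float], k: int = 5) -> Tuple[List[str], List[str]]:
    """Return the k features with the highest and the k with the lowest ORs."""
    ranked = sorted(ors, key=lambda name: ors[name], reverse=True)
    return ranked[:k], ranked[::-1][:k]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney identity: the probability that a random
    positive scores higher than a random negative (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical fitted coefficients (not the paper's values).
    coefs = {"feat_a": 0.8, "feat_b": 0.0, "feat_c": -1.1}
    print(top_and_bottom(odds_ratios(coefs), k=1))  # (['feat_a'], ['feat_c'])
    print(roc_auc([0.9, 0.2, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic and meant only to make the definition explicit; a rank-based or library implementation would be preferable for thousands of compounds.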
Figure 2.
Classification of ROBIN RNA binders and protein binders. A) Left, TMAP of 10,000 BindingDB protein binders (blue), 2,350 FDA-approved drugs (black), and 2,003 ROBIN RNA binders (orange), encoded with ECFP4 fingerprints. Right, six structures from a detail of the TMAP, illustrating related molecules on one branch; the common core shared by these molecules is shown in blue. B) Precision-recall (PR) curve for classifying augmented ROBIN RNA binders versus BindingDB protein binders using the multi-layer perceptron (MLP) model. C) Confusion matrix showing the performance of the MLP on the test/holdout set; NPV, negative predictive value. D) Performance metrics for the MLP model on the test/holdout set. E) Left, beeswarm plot showing how the 20 most important features in the MLP affect the model’s output. Right, bar chart of the mean absolute SHAP value for each feature in the beeswarm plot; each row of the bar chart is aligned with the corresponding feature row of the beeswarm plot.
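Panels C and D of Figure 2 report a confusion matrix and metrics derived from it, including the negative predictive value (NPV), and panel B plots precision against recall. The plain-Python sketch below shows how these quantities follow from predicted scores. It assumes RNA binders are the positive class (label 1) and uses a hypothetical decision threshold; the paper's actual threshold and full metric set are not stated in the caption, and the scores are toy values.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Confusion:
    tp: int  # RNA binders predicted as RNA binders
    fp: int  # protein binders predicted as RNA binders
    tn: int  # protein binders predicted as protein binders
    fn: int  # RNA binders predicted as protein binders

    def precision(self) -> float:  # positive predictive value
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:  # sensitivity, true positive rate
        return self.tp / (self.tp + self.fn)

    def npv(self) -> float:  # negative predictive value
        return self.tn / (self.tn + self.fn)

    def specificity(self) -> float:  # true negative rate
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


def confusion_at(scores: Sequence[float], labels: Sequence[int],
                 threshold: float = 0.5) -> Confusion:
    """Count outcomes when score >= threshold is called an RNA binder."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted_rna = s >= threshold
        if predicted_rna and y == 1:
            tp += 1
        elif predicted_rna:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return Confusion(tp, fp, tn, fn)


def pr_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(recall, precision) at every distinct score, from strictest threshold down."""
    thresholds = sorted(set(scores), reverse=True)
    return [(c.recall(), c.precision())
            for c in (confusion_at(scores, labels, t) for t in thresholds)]


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]  # hypothetical model outputs
    labels = [1, 1, 0, 1, 0, 0]              # 1 = RNA binder, 0 = protein binder
    c = confusion_at(scores, labels, threshold=0.75)
    print(c)        # Confusion(tp=2, fp=0, tn=3, fn=1)
    print(c.npv())  # 0.75
```

The metric methods raise `ZeroDivisionError` when a denominator is empty (for example, precision at a threshold above every score); `pr_points` avoids that by only using thresholds equal to observed scores.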
Figure 3.
Performance of the MLP model on selected known RNA and protein binders. A) Model performance on four known RNA binders not included in the SMM screening library. B) Model performance on four known protein binders. All of these protein binders were printed on the SMMs and showed little or no binding to the RNA targets screened. In each case, the reported value is the probability of RNA binding relative to protein binding, as predicted by the MLP model.
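Figure 3 reports, for each compound, the MLP's predicted probability of RNA binding relative to protein binding. In a two-class network with a single sigmoid output (an assumption here), that number is the logistic function of the final logit, and the protein-binding probability is one minus it. The toy forward pass below illustrates this in plain Python; the one-hidden-layer ReLU architecture, the input bits, and every weight are hypothetical, since the paper's architecture and input features are not described in this record.

```python
import math
from typing import List, Sequence


def relu(v: float) -> float:
    return max(0.0, v)


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(x: Sequence[float], w: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def p_rna(x: Sequence[float], w1: Sequence[Sequence[float]], b1: Sequence[float],
          w2: Sequence[float], b2: float) -> float:
    """P(RNA binder) from a hypothetical 1-hidden-layer MLP; P(protein) = 1 - this."""
    hidden = [relu(h) for h in dense(x, w1, b1)]
    logit = dense(hidden, [w2], [b2])[0]
    return sigmoid(logit)


if __name__ == "__main__":
    x = [1.0, 0.0, 1.0]                       # hypothetical fingerprint bits
    w1 = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]  # hypothetical hidden-layer weights
    b1 = [0.0, 0.0]
    p = p_rna(x, w1, b1, w2=[1.0, 0.0], b2=0.0)
    print(round(p, 3), round(1 - p, 3))       # 0.881 0.119
```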
