RNA. 2023 Apr;29(4):473-488.
doi: 10.1261/rna.079497.122. Epub 2023 Jan 24.

Recognizing the power of machine learning and other computational methods to accelerate progress in small molecule targeting of RNA

Greta Bagnolini et al. RNA. 2023 Apr.

Abstract

RNA structures regulate a wide range of processes in biology and disease, yet small molecule chemical probes or drugs that can modulate these functions are rare. Machine learning and other computational methods are well poised to fill gaps in knowledge and overcome the inherent challenges in RNA targeting, such as the dynamic nature of RNA and the difficulty of obtaining high-resolution RNA structures. Successful tools to date include principal component analysis, linear discriminant analysis, k-nearest neighbor, artificial neural networks, multiple linear regression, and many others. Applying these tools has revealed critical factors for selective recognition in RNA:small molecule complexes, predictable differences between RNA- and protein-binding ligands, and quantitative structure activity relationships that allow the rational design of small molecules for a given RNA target. Herein we present our perspective on the value of using machine learning and other computational methods to advance RNA:small molecule targeting, including select examples and their validation, as well as necessary and promising future directions that will be key to accelerating discoveries in this important field.

Keywords: RNA; cheminformatics; machine learning; pattern recognition; quantitative structure activity relationships; small molecule.

Figures

FIGURE 1.
Schematic representation of the four RNA nucleosides and negatively charged backbone. Curved dashed lines highlight the amine and carbonyl functional groups as sites of hydrogen bonding on heteroaromatic rings.
FIGURE 2.
(A) 3D representation of principal components (PCs) 1, 2, and 3 that plots R-BIND SMs and FDA-approved SMs (adapted with permission from Donlic et al. 2022, © American Chemical Society). (B) Principal moments of inertia (PMI) triangle partitioned into four subtriangles, representing rod-like (1), sphere-like (2), disc-like (3), and hybrid (4) shapes; example molecules are provided for each shape: (1) NVS-SM1 (Palacino et al. 2015), (2) compound 139 from the FDA curated library (Donlic et al. 2022), (3) roseoflavin (Lee et al. 2009), and (4) CP6 (Khan et al. 2019) (adapted with permission from Morgan et al. 2017, © Wiley-VCH); the dashed triangle outlines subtriangle 1, where the library averages are located. (C) Schematic representation of the k-nearest neighbor (k-NN) algorithm. The R-BIND SMs, plotted in chemical space, are used to define nearest neighbors, and the smallest distances are averaged for each new molecule (adapted with permission from Morgan et al. 2019, © American Chemical Society).
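Panel C scores a new molecule by how close it sits to the nearest R-BIND compounds in chemical space. A minimal Python sketch of that k-NN idea follows, assuming molecules are already encoded as scaled numeric descriptor vectors and that distance is Euclidean; the function name, the value of k, and the toy coordinates are hypothetical and are not taken from Morgan et al. 2019.

```python
import math


def knn_mean_distance(query, reference, k=3):
    """Mean Euclidean distance from `query` to its k nearest points in `reference`.

    `reference` stands in for known RNA-binding small molecules (for example,
    R-BIND entries) encoded as scaled descriptor vectors. A smaller value means
    the query lies closer to that region of chemical space.
    """
    if not 1 <= k <= len(reference):
        raise ValueError("k must be between 1 and len(reference)")
    distances = sorted(math.dist(query, point) for point in reference)
    return sum(distances[:k]) / k


# Hypothetical 2-descriptor toy data; not real R-BIND values.
reference_set = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
near = knn_mean_distance((0.2, 0.2), reference_set, k=3)
far = knn_mean_distance((9.0, 9.0), reference_set, k=3)
print(near < far)  # True: the first query sits inside the reference cluster
```

A query inside a dense cluster of reference ligands gets a small score and an outlier gets a large one; larger k makes the score less sensitive to a single close neighbor. If the query is itself a member of the reference set, it would normally be left out before scoring.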
FIGURE 3.
(A) The 16 RNA training set sequences, including stems, bulges (Blg), internal loops (IL), asymmetrical internal loops (AIL), and hairpins (HP), used in PRRSM. The BFU-labeled position is shown with a blue star. (B) Differentiation of the five structural classes of the training set using PCA. (C) Differentiation of the individual training set sequences. PC1 correlated with increasing motif size (from stem to AIL), while PC2 correlated with the purine:pyrimidine ratio, which depends on the sequence of the RNA (HP to IL). (D) PRRSM classification of prequeuosine1 (preQ1) and fluoride riboswitch conformational changes. Each construct was labeled with BFU at three positions and subjected to the assay. PRRSM was able to classify these RNA structures, including folded and unfolded states, and provided insight into sites that are critical for these structural changes. All PRRSM-based observations of unfolded and folded riboswitch states were confirmed by NMR (adapted with permission from Eubanks et al. 2017, © American Chemical Society).
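Panels B and C reduce multidimensional assay responses to a few principal components. As a minimal illustration (not the PRRSM analysis pipeline itself), the Python sketch below computes PCA scores by singular value decomposition, assuming each row is one RNA construct and each column is its response to one small molecule in the sensing panel; the function name and response values are invented for illustration.

```python
import numpy as np


def pca_scores(data, n_components=2):
    """Principal component analysis via SVD of the mean-centered data matrix.

    Rows are samples (here, RNA constructs); columns are measured variables
    (here, responses to each member of a small-molecule panel).
    Returns the sample coordinates on the first PCs and the fraction of total
    variance that each of those PCs explains.
    """
    X = np.asarray(data, dtype=float)
    Xc = X - X.mean(axis=0)                       # center every variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


# Hypothetical responses of four RNA constructs to three small molecules.
responses = np.array([
    [0.10, 0.80, 0.30],
    [0.20, 0.70, 0.35],
    [0.90, 0.10, 0.60],
    [0.85, 0.15, 0.65],
])
scores, explained = pca_scores(responses)
print(scores.round(3))
print(explained.round(3))
```

Variables measured on very different scales would usually be standardized (divided by their standard deviation) before the SVD so that no single column dominates PC1.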
FIGURE 4.
(A) Combinatorial modifications at the C5 (blue) and C6 (red) positions of the amiloride scaffold optimized the affinity of DMA-1 to give the lead DMA-169. Selective ligands showed competitive displacement doses (CD50) of ∼4–200 µM. (B) Linear discriminant analysis (LDA) plot based on 20 cheminformatic parameters, which clusters selective amiloride-derived ligands apart from nonbinding and nonselective ligands (panels A and B adapted with permission from Patwardhan et al. 2017, © Royal Society of Chemistry).
FIGURE 5.
(A) Secondary structures of the HIV RNAs screened with the DMA library: (TAR) trans-activation response element, (RRE) rev response element, (FSS) frameshift-stimulating, and (ESSV) exonic splicing silencer of Vpr. (B) Linear discriminant analysis (LDA) plot based on 20 cheminformatic parameters, differentiating five groups of ligands for the HIV RNA targets. (C) LDA loading plot for qualitative analysis of the contribution of each cheminformatic parameter to F1 versus F2. (MW) Molecular weight, (HBA) number of hydrogen bond acceptors, (HBD) number of hydrogen bond donors, (LogP) n-octanol/water partition coefficient, (RotB) number of rotatable bonds, (tPSA) topological polar surface area, (LogD) n-octanol/water distribution coefficient, (N) number of nitrogen atoms, (O) number of oxygen atoms, (Rings) number of rings, (ArRings) number of aromatic rings, (HetRings) number of heteroatom-containing rings, (SysRings) number of ring systems, (SysRR) ring complexity, (Fsp3) fraction of sp3 hybridized carbons, (ASA) accessible surface area, (relPSA) relative polar surface area, (VWSA) van der Waals surface area (panels A–C adapted with permission from Patwardhan et al. 2019b, © Royal Society of Chemistry).
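Panels B and C rely on LDA, which finds linear combinations of descriptors (F1, F2, ...) that best separate predefined ligand groups, plus a loading plot showing how strongly each descriptor contributes to those functions. The Python sketch below implements classical Fisher LDA with NumPy and defines loadings as descriptor-score correlations; that loading definition is one common convention and an assumption here, since the caption does not say which was used. The function names and toy data are hypothetical.

```python
import numpy as np


def fisher_lda(X, y, n_components=2):
    """Multi-class Fisher linear discriminant analysis.

    X: (n_ligands, n_descriptors) matrix of standardized descriptors
       (for example MW, HBA, HBD, tPSA).
    y: one class label per ligand (for example, its preferred RNA target).
    Returns W, whose columns are the discriminant functions F1, F2, ...,
    and the ligand scores X @ W.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for label in np.unique(y):
        Xk = X[y == label]
        mk = Xk.mean(axis=0)
        Sw += (Xk - mk).T @ (Xk - mk)
        diff = (mk - overall_mean)[:, None]
        Sb += len(Xk) * (diff @ diff.T)
    # Directions maximizing between-class relative to within-class scatter;
    # pinv tolerates a singular Sw caused by strongly correlated descriptors.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    W = eigvecs[:, order].real
    return W, X @ W


def loadings(X, scores):
    """Correlation of each descriptor (rows) with each discriminant score (columns)."""
    X = np.asarray(X, dtype=float)
    return np.array([[np.corrcoef(X[:, i], scores[:, j])[0, 1]
                      for j in range(scores.shape[1])]
                     for i in range(X.shape[1])])


# Hypothetical two-descriptor data for two ligand classes that differ in
# descriptor 0 only; real analyses use many more descriptors and classes.
base = np.array([[0.0, 1.0], [0.0, -1.0], [0.2, 0.0], [-0.2, 0.0]])
X = np.vstack([base, base + [5.0, 0.0]])
y = np.array(["A"] * 4 + ["B"] * 4)
W, scores = fisher_lda(X, y, n_components=1)
print(W.ravel())           # F1 points along descriptor 0 (up to sign)
print(loadings(X, scores))
```

With c ligand groups, at most c - 1 discriminant functions carry information, so five groups yield up to four; F1 and F2 are the two that separate the groups most strongly.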
FIGURE 6.
(A) Schematic diagram of MALAT1 triple helix base-pairing and crystal structure (Protein Data Bank entry 4PLX). (B) Diphenylfuran (DPF) and diminazene (DMZ) core scaffold structures. (C) Envelope diagram of the principal moments of inertia (PMI) calculations for the 21-member DMZ-based focused library (panels A and C adapted with permission from Zafferani et al. 2022, © American Chemical Society).
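PMI plots place each molecule by two normalized ratios of its principal moments of inertia, with rod-, disc-, and sphere-like shapes at the vertices (0, 1), (0.5, 0.5), and (1, 1). The Python sketch below computes those ratios for one 3D conformer, assuming coordinates and atomic masses are already available (for example from a conformer generator); it illustrates the standard calculation and is not the workflow used by Zafferani et al. 2022. Toolkits such as RDKit also expose NPR1/NPR2 descriptors.

```python
import numpy as np


def normalized_pmi_ratios(coords, masses):
    """Return (NPR1, NPR2) = (I1/I3, I2/I3) for a single 3D conformer.

    coords: (n_atoms, 3) Cartesian coordinates; masses: (n_atoms,) atomic masses.
    I1 <= I2 <= I3 are the principal moments of inertia, i.e. the eigenvalues
    of the inertia tensor taken about the center of mass.
    """
    r = np.asarray(coords, dtype=float)
    m = np.asarray(masses, dtype=float)
    r = r - (m[:, None] * r).sum(axis=0) / m.sum()       # center of mass at origin
    # I_jk = sum_i m_i * (|r_i|^2 * delta_jk - r_ij * r_ik)
    inertia = np.eye(3) * np.sum(m * np.sum(r**2, axis=1)) - (m[:, None] * r).T @ r
    I1, I2, I3 = np.linalg.eigvalsh(inertia)              # ascending order
    return I1 / I3, I2 / I3


# Idealized unit-mass "molecules" that land on the three PMI triangle vertices.
rod = [(-1, 0, 0), (0, 0, 0), (1, 0, 0)]
disc = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
sphere = disc + [(0, 0, 1), (0, 0, -1)]
for name, xyz in [("rod", rod), ("disc", disc), ("sphere", sphere)]:
    npr1, npr2 = normalized_pmi_ratios(xyz, np.ones(len(xyz)))
    print(name, round(npr1, 3), round(npr2, 3))
```

Because a flexible molecule can adopt many conformers, where a compound falls in the triangle depends on which conformer (or conformers) is used for the calculation.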
FIGURE 7.
(A) Representative structures of the five scaffolds used in the QSAR model (Cai et al. 2022). Plots of observed versus predicted ln KD (training set in red, test set in blue) compare (B) multiple linear regression (MLR), (C) random forest, and (D) gradient boosting machine models (adapted with permission from Cai et al. 2022, © American Chemical Society).
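Panel B compares observed and predicted ln KD for an MLR model. A minimal MLR sketch in Python follows: it fits ordinary least squares with NumPy and reports R² on a training set and a held-out test set, mirroring the red/blue split in the panels. The descriptors, ln KD values, and function names are invented for illustration and do not reproduce the Cai et al. 2022 model.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares multiple linear regression with an intercept.

    Returns [b0, b1, ..., bd] for y ~ b0 + b1*x1 + ... + bd*xd.
    """
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])   # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]


def r_squared(y_obs, y_pred):
    """Coefficient of determination between observed and predicted values."""
    y_obs = np.asarray(y_obs, dtype=float)
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical descriptors (two per ligand) and ln KD values.
X_train = np.array([[1.2, 0.3], [2.0, 0.1], [0.5, 0.9], [1.7, 0.6], [2.4, 0.8], [0.9, 0.2]])
y_train = np.array([-9.1, -10.4, -8.0, -9.9, -11.0, -8.6])
X_test = np.array([[1.5, 0.4], [2.2, 0.7]])
y_test = np.array([-9.6, -10.8])

coef = fit_mlr(X_train, y_train)
print("coefficients:", coef.round(3))
print("train R^2:", round(r_squared(y_train, predict(coef, X_train)), 3))
print("test R^2:", round(r_squared(y_test, predict(coef, X_test)), 3))
```

Random forest and gradient boosting (panels C and D) are nonlinear, tree-based alternatives; scikit-learn's RandomForestRegressor or GradientBoostingRegressor could stand in for fit_mlr while the same train/test R² comparison still applies.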
FIGURE 8.
Potential workflows for the computational prediction of RNA:SM structures, and the methods used.

References

    Angermueller C, Lee HJ, Reik W, Stegle O. 2017. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol 18: 67. doi: 10.1186/s13059-017-1189-z
    Arnautova YA, Abagyan R, Totrov M. 2018. Protein-RNA docking using ICM. J Chem Theory Comput 14: 4971–4984. doi: 10.1021/acs.jctc.8b00293
    Barnwal RP, Yang F, Varani G. 2017. Applications of NMR to structure determination of RNAs large and small. Arch Biochem Biophys 628: 42–56. doi: 10.1016/j.abb.2017.06.003
    Bell DR, Cheng SY, Salazar H, Ren P. 2017. Capturing RNA folding free energy with coarse-grained molecular dynamics simulations. Sci Rep 7: 45812. doi: 10.1038/srep45812
    Bernardes JS, Pedreira CE. 2013. A review of protein function prediction under machine learning perspective. Recent Pat Biotechnol 7: 122–141. doi: 10.2174/18722083113079990006