J Med Chem. 2022 May 26;65(10):7262-7277.
doi: 10.1021/acs.jmedchem.2c00254. Epub 2022 May 6.

Quantitative Structure-Activity Relationship (QSAR) Study Predicts Small-Molecule Binding to RNA Structure

Zhengguo Cai et al. J Med Chem. 2022.

Abstract

The diversity of RNA structural elements and their documented role in human diseases make RNA an attractive therapeutic target. However, progress in drug discovery and development has been hindered by challenges in the determination of high-resolution RNA structures and a limited understanding of the parameters that drive RNA recognition by small molecules, including a lack of validated quantitative structure-activity relationships (QSARs). Herein, we develop QSAR models that quantitatively predict both thermodynamic- and kinetic-based binding parameters of small molecules and the HIV-1 transactivation response (TAR) RNA model system. Small molecules bearing diverse scaffolds were screened against TAR using surface plasmon resonance. Multiple linear regression (MLR) combined with feature selection afforded robust models that allowed direct interpretation of the properties critical for both binding strength and kinetic rate constants. These models were validated with new molecules, and their accurate performance was confirmed via comparison to ensemble tree methods, supporting the general applicability of this platform.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
A. Sequence and structure of 5′-biotinylated HIV-1 TAR and representative chemical structures of the scaffolds used in this work. B. Kinetics map of the 48 tested ligands, plotted on base-10 logarithmic axes. The diagonal lines represent KD values calculated as koff/kon. The units of the three parameters are shown; the rest of the study uses values in these units.
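The relationship behind the diagonal lines in panel B can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name, the rate constants, and the units (kon in M^-1 s^-1, koff in s^-1) are assumptions, since the caption does not state them here.

```python
import math

def dissociation_constant(k_on: float, k_off: float) -> float:
    """Return KD = koff / kon (in M if kon is in M^-1 s^-1 and koff in s^-1)."""
    if k_on <= 0 or k_off <= 0:
        raise ValueError("rate constants must be positive")
    return k_off / k_on

# Hypothetical rate constants, not values from the study.
k_on = 1.0e4    # association rate constant (assumed units: M^-1 s^-1)
k_off = 1.0e-2  # dissociation rate constant (assumed units: s^-1)

kd = dissociation_constant(k_on, k_off)

# On base-10 logarithmic axes, log10(KD) = log10(koff) - log10(kon), so all
# ligands sharing one KD fall on the same straight diagonal line.
log_kd = math.log10(k_off) - math.log10(k_on)
print(f"KD = {kd:.1e}, log10(KD) = {log_kd:.1f}")
```

Two ligands with the same KD can therefore sit far apart on the map if one binds and releases quickly and the other slowly, which is why the kinetic parameters are modeled separately from KD.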
Scheme 1. QSAR Workflow
A. Each input molecule was enumerated into “protomers,” and a conformational search was run on each protomer. Molecular descriptors were calculated for each conformation and averaged according to the Boltzmann distribution. B. Small-molecule binding to HIV-1 TAR was characterized via SPR, and the parameters KD, kon, and koff were fitted globally. C. After representative data splitting and lasso-assisted model searching, the final model was selected based on its performance on the separate test set.
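Boltzmann averaging of a descriptor over conformations (step A) can be sketched as follows. The energies, descriptor values, temperature, and function names are hypothetical; the source does not say which energy model or temperature was used, so treat this as one plausible reading rather than the authors' procedure.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def boltzmann_weights(energies, temperature=298.15):
    """Return normalized Boltzmann weights for conformer energies in kcal/mol."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]

def boltzmann_average(values, energies, temperature=298.15):
    """Average one descriptor over conformers, weighted by Boltzmann population."""
    weights = boltzmann_weights(energies, temperature)
    return sum(w * v for w, v in zip(weights, values))

# Hypothetical conformers of one protomer: relative energies (kcal/mol) and
# the value of a single descriptor computed on each conformation.
energies = [0.0, 0.5, 2.0]
descriptor = [120.0, 135.0, 150.0]

weights = boltzmann_weights(energies)
avg = boltzmann_average(descriptor, energies)
print([round(w, 3) for w in weights], round(avg, 2))
```

Low-energy conformations dominate the average, so a descriptor value seen only in a strained conformation contributes little to the final feature.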
Figure 2
A. Locations of test set molecules in the two-dimensional (2D) chemical space constructed from the first two principal components (29.9 and 20.8% of the variance, respectively) of the whole data set. B. Distribution of response variables for the test and training set molecules. C. Chemical structures of the test set molecules (red) selected with the Kennard–Stone algorithm. Each is paired with its closest neighbor in the training set (blue) for comparison. The similarity, calculated as the Tanimoto coefficient (black), is listed along the separation line.
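The two techniques named in panel C can be illustrated with a small sketch. It assumes Euclidean distance in a 2D descriptor space for Kennard–Stone and fingerprints represented as sets of on-bit indices for the Tanimoto coefficient; the points and fingerprints are made up, and the source does not specify the distance metric or fingerprint type used.

```python
import math

def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone algorithm."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Seed with the two mutually most distant points.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    i0, j0 = max(pairs, key=lambda ij: dist[ij[0]][ij[1]])
    selected = [i0, j0]
    remaining = [i for i in range(n) if i not in (i0, j0)]
    while len(selected) < n_select:
        # Add the point whose nearest already-selected neighbor is farthest away.
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0

# Hypothetical 2D coordinates (e.g., two principal components) for five molecules.
points = [(0.0, 0.0), (10.0, 0.0), (1.0, 1.0), (5.0, 8.0), (9.0, 1.0)]
chosen = kennard_stone(points, 3)

# Hypothetical fingerprints sharing two of six distinct bits.
similarity = tanimoto({1, 2, 3, 4}, {3, 4, 5, 6})
print(chosen, round(similarity, 3))
```

Because each new pick maximizes the distance to everything already chosen, the selected molecules spread across the occupied chemical space instead of clustering around one scaffold.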
Figure 3
A. Lasso regression shrinks the coefficients of the ln KD descriptors as λ increases; each colored curve traces the shrinkage of one descriptor coefficient. The top x-axis shows the number of descriptors with nonzero coefficients at the λ value indicated on the bottom x-axis. The best λ value (0.01) was determined by 5-fold cross-validation. B. Observed ln KD (training and test sets) plotted against the value predicted by the MLR baseline model shown at the top. C. Test set small molecules with ln KD values predicted by MLR (red italics) and observed values (blue).
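The shrinkage behavior in panel A can be reproduced on toy data with a plain coordinate-descent lasso. This is a sketch under stated assumptions: the objective is (1/(2n))·||y − Xb||² + λ·||b||₁ with centered, non-constant columns and no intercept, the data are invented, and choosing λ by 5-fold cross-validation is left out. The solver the authors used is not stated in the source.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam, returning exactly 0 inside [-lam, lam]."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X is a list of rows; its columns and y are assumed centered and non-constant.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of column j with the partial residual
            z = 0.0    # mean square of column j
            for i in range(n):
                partial = y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * partial
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta

# Toy data: y depends strongly on the first column and weakly on the second.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [3.0 * a + 0.2 * b for a, b in X]

# Coefficient path: count nonzero coefficients at increasing lambda values.
path = {lam: lasso_coordinate_descent(X, y, lam) for lam in (0.0, 0.1, 0.5, 10.0)}
nonzero = [sum(1 for b in path[lam] if b != 0.0) for lam in sorted(path)]
print(nonzero)
```

As λ grows, the weakly informative coefficient reaches exactly zero first, which is what makes the lasso usable as a feature selector ahead of an ordinary MLR fit.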
Figure 4
A. Out-of-bag error of the random forest model vs the number of trees. B. Random forest model of ln KD built with 400 decision trees. C. Squared error loss vs the number of boosting iterations; two methods (out-of-bag and cross-validation) were used to determine the best iteration number. D. Boosting model of ln KD.
Figure 5
A. Normal quantile–quantile plot of the ln KD model. B. Williams plot showing the applicability domain of the ln KD model for the training and test sets. C. Model stability test on the ln KD data using the formula ln KD ∼ 1 + PEOE_VSA_POS + vsa_other + vsurf_DW12 + vsurf_ID3. Training and prediction stability are shown on the left and right, respectively. Each bar represents the result of one random sampling; 100 samplings were performed in total.
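A Williams plot (panel B) places each molecule by its leverage and its standardized residual. The sketch below computes both for a one-descriptor model y ~ 1 + x, where the leverage has a closed form; the four-descriptor model in the caption would need the full hat matrix. The data are invented, and the warning leverage h* = 3(k + 1)/n and the ±3 residual cutoff are common QSAR conventions assumed here, not values confirmed by the source.

```python
import math

def williams_points(x, y):
    """Return (leverage, standardized residual) pairs for the model y ~ 1 + x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Leverage is the diagonal of the hat matrix; closed form for one descriptor.
    leverages = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    # Residual standard error with n - 2 degrees of freedom (slope + intercept).
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return [(h, r / (s * math.sqrt(1 - h))) for h, r in zip(leverages, residuals)]

# Hypothetical descriptor values and responses; the last molecule lies far
# from the rest of the descriptor range.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
y = [1.2, 1.8, 3.1, 4.3, 4.8, 6.1, 7.2, 7.7, 9.1, 29.9]

points = williams_points(x, y)
h_star = 3 * (1 + 1) / len(x)  # assumed warning leverage, k = 1 descriptor
outside = [h > h_star or abs(r) > 3 for h, r in points]
print(outside)
```

A molecule flagged for high leverage may still be predicted well, but the prediction is an extrapolation beyond the descriptor range covered by the training set.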
