J Chem Inf Model. 2012 Aug 27;52(8):2181-91.
doi: 10.1021/ci300047k. Epub 2012 Aug 16.

Exploring uncharted territories: predicting activity cliffs in structure-activity landscapes

Rajarshi Guha. J Chem Inf Model.

Abstract

The notion of activity cliffs is an intuitive approach to characterizing structural features that play a key role in modulating the biological activity of a molecule. A variety of methods have been described to quantitatively characterize activity cliffs, such as SALI and SARI. However, these methods are primarily retrospective in nature, highlighting cliffs that are already present in the data set. The current study focuses on employing a pairwise characterization of a data set to train a model to predict whether a new molecule will exhibit an activity cliff with one or more members of the data set. The approach is based on predicting a value for pairs of objects rather than the individual objects themselves (and thus allows for robust models even for small structure-activity relationship data sets). We extracted structure-activity data for several ChEMBL assays and developed random forest models to predict SALI values from pairwise combinations of molecular descriptors. The models exhibited reasonable RMSEs though, surprisingly, performance on the more significant cliffs tended to be better than on the lesser ones. While the models do not exhibit very high levels of accuracy, our results indicate that they are able to prioritize molecules in terms of their ability to form activity cliffs, thus serving as a tool to prospectively identify activity cliffs.
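
The pairwise setup described in the abstract can be illustrated with a small sketch. It assumes the usual definition of the Structure-Activity Landscape Index, SALI(i, j) = |A_i - A_j| / (1 - sim(i, j)), and it guesses from the names used in the figure captions that fdiff is an element-wise absolute difference and fmean an element-wise mean of the two molecules' descriptor vectors. The function names, the `eps` guard, the set-based fingerprints, and the toy data are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def sali(act_i, act_j, sim_ij, eps=1e-6):
    """SALI for one pair: |A_i - A_j| / (1 - sim(i, j)).

    eps is an assumed guard against division by zero when the two
    fingerprints are identical (sim == 1).
    """
    return abs(act_i - act_j) / max(1.0 - sim_ij, eps)


def f_diff(desc_a, desc_b):
    """Assumed form of fdiff: element-wise absolute difference."""
    return [abs(a - b) for a, b in zip(desc_a, desc_b)]


def f_mean(desc_a, desc_b):
    """Assumed form of fmean: element-wise mean."""
    return [(a + b) / 2.0 for a, b in zip(desc_a, desc_b)]


def pairwise_dataset(molecules, aggregate=f_diff):
    """Turn N molecules into N*(N-1)/2 rows of (pair, features, SALI)."""
    rows = []
    for (i, m1), (j, m2) in combinations(enumerate(molecules), 2):
        features = aggregate(m1["desc"], m2["desc"])
        target = sali(m1["activity"], m2["activity"],
                      tanimoto(m1["fp"], m2["fp"]))
        rows.append(((i, j), features, target))
    return rows


# Toy data: activities on a log scale, fingerprints as on-bit sets,
# and two made-up descriptors per molecule.
toy = [
    {"activity": 6.0, "fp": {1, 2, 3, 4}, "desc": [1.0, 2.0]},
    {"activity": 8.0, "fp": {1, 2, 3, 5}, "desc": [1.5, 2.0]},
    {"activity": 6.5, "fp": {7, 8, 9}, "desc": [4.0, 0.0]},
]
rows = pairwise_dataset(toy)
for pair, features, target in rows:
    print(pair, features, round(target, 3))
```

In this toy set the first two molecules are structurally similar but differ by two log units in activity, so their pair receives the largest SALI value (about 5.0). Rows of this shape could then be passed to any regressor, such as a random forest, with the aggregated descriptors as inputs and SALI (or its logarithm) as the target. Because N molecules yield N(N-1)/2 pairs, even a small data set produces a sizable training table.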

Figures

Figure 1
Distribution of Pearson correlations between SALI values and the descriptors for the Cavalli dataset. Three descriptor aggregation functions are considered.
Figure 2
Plots of predicted versus observed SALI values obtained using random forest models, on the Cavalli dataset. Each panel corresponds to the use of a different descriptor aggregation function.
Figure 3
A comparison of the pairwise descriptor distribution for low and high SALI values in the Cavalli dataset. This figure employed the descriptor values generated using the fdiff aggregation function and Euclidean distances were evaluated using all the descriptors in the pool. The low group is defined as those observations with SALI less than 2.03 and the remainder are assigned to the high group.
Figure 4
Predicted versus observed SALI values, obtained from random forest models for the three ChEMBL datasets. The three plots correspond to the two different aggregation functions (fdiff and fmean respectively).
Figure 5
Results of random forest models developed using the three ChEMBL datasets. In each case the model was built using the log10 of the observed activity.
Figure 6
Detailed analysis of SALI predictions for the Kalla dataset. A - a plot of predicted versus observed log(SALI) values for the training set and the hold out set. B - a summary of the training set, which plots the structural difference versus the logarithm of the ratio of the activities for each pair of molecules in the prediction set. Points are shaded by their absolute residual. C - a box plot summarizing the distribution of residuals associated with predictions from each of the hold out molecules.
Figure 7
A summary of the prediction residuals for log(SALI) values, grouped by whether the actual log(SALI) value for that observation was low, medium or high. The grouping is based on the quartiles of the observed values.
Figure 8
Three of the hold out molecules for the Kalla dataset and training set members with which the hold outs exhibit predicted activity cliffs. Bold numbers are ChEMBL MOLREGNO values and numbers in parentheses are the absolute prediction residual in log(SALI) units.
Figure 9
Detailed analysis of SALI predictions for the Dai dataset. A - a plot of predicted versus observed log(SALI) values for the training set and the hold out set. B - a summary of the training set, which plots the structural difference versus the logarithm of the ratio of the activities for each pair of molecules in the prediction set. Points are shaded by their absolute residual. C - a box plot summarizing the distribution of residuals associated with predictions from each of the hold out molecules.
Figure 10
Three of the hold out molecules for the Dai dataset and training set members with which the hold outs exhibit predicted activity cliffs. Bold numbers are ChEMBL MOLREGNO values and numbers in parentheses are the absolute prediction residual in log(SALI) units.
Figure 11
Distribution of observed log(SALI) values for the three ChEMBL datasets.
