J Chem Inf Model. 2006 Sep-Oct;46(5):1984-95.
doi: 10.1021/ci060132x.

A novel automated lazy learning QSAR (ALL-QSAR) approach: method development, applications, and virtual screening of chemical databases using validated ALL-QSAR models

Shuxing Zhang et al. J Chem Inf Model. 2006 Sep-Oct.

Abstract

A novel automated lazy learning quantitative structure-activity relationship (ALL-QSAR) modeling approach has been developed on the basis of lazy learning theory. The activity of a test compound is predicted from a locally weighted linear regression model built from the chemical descriptors and biological activities of the training set compounds most chemically similar to that test compound. The weight with which each training set compound enters the regression depends on its similarity to the test compound. We have applied the ALL-QSAR method to several experimental chemical data sets, including 48 anticonvulsant agents with known ED50 values, 48 dopamine D1-receptor antagonists with known competitive binding affinities (Ki), and a Tetrahymena pyriformis data set containing 250 phenolic compounds with toxicity IGC50 values. When applied to database screening, models developed for anticonvulsant agents identified several known anticonvulsant compounds that were not only absent from the training set but also highly chemically dissimilar to the training set compounds. This initial success indicates that ALL-QSAR can be further exploited as a general tool for accurate bioactivity prediction and database screening in drug design and discovery. Because of its local nature, the ALL-QSAR approach appears to be especially well suited to the development of highly predictive models for sparse or unevenly distributed data sets.
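
The prediction step described in the abstract can be illustrated with a minimal sketch of locally weighted ridge regression. This is not the authors' implementation. The function name `predict_locally_weighted`, the Gaussian kernel, the use of Euclidean distance in descriptor space as the dissimilarity measure, and the optional `k` nearest-neighbor cutoff are all assumptions made for illustration. The abstract says only that weights depend on similarity, and the figure captions mention a kernel width and a ridge regression parameter (λ).

```python
import numpy as np


def predict_locally_weighted(X_train, y_train, x_query,
                             kernel_width=1.0, ridge=1e-3, k=None):
    """Predict one activity value with a locally weighted ridge regression.

    Hypothetical helper: the kernel, distance measure, and parameter names
    are illustrative assumptions, not taken from the ALL-QSAR paper.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    # Dissimilarity of every training compound to the query compound
    # (assumed here: Euclidean distance between descriptor vectors).
    dists = np.linalg.norm(X_train - x_query, axis=1)

    # Optionally keep only the k most similar training compounds.
    if k is not None:
        nearest = np.argsort(dists)[:k]
        X_train, y_train, dists = X_train[nearest], y_train[nearest], dists[nearest]

    # Gaussian kernel: similar compounds get weights near 1, distant ones near 0.
    weights = np.exp(-(dists / kernel_width) ** 2)

    # Design matrix with an intercept column.
    A = np.hstack([np.ones((X_train.shape[0], 1)), X_train])

    # Ridge penalty on the descriptor coefficients only (intercept unpenalized).
    penalty = ridge * np.eye(A.shape[1])
    penalty[0, 0] = 0.0

    # Solve the weighted normal equations (A^T W A + penalty) beta = A^T W y.
    lhs = A.T @ (weights[:, None] * A) + penalty
    rhs = A.T @ (weights * y_train)
    beta = np.linalg.solve(lhs, rhs)

    # Evaluate the local linear model at the query point.
    return float(np.concatenate(([1.0], x_query)) @ beta)


# Toy usage with a single made-up descriptor and exactly linear activities.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
prediction = predict_locally_weighted(X, y, [2.5], kernel_width=1.0, ridge=1e-9)
```

A separate model is fitted for each query compound, which is what makes the approach "lazy": no global model is built in advance. On exactly linear toy data such as the above, the local fit recovers the underlying line up to the small bias introduced by the ridge term. On real descriptor data, the kernel width and ridge parameter would need to be tuned, for example against a held-out test set.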

Figures

Figure 1
Locally weighted regression. The figure highlights the difference between global linear regression and locally weighted linear regression. The green line is the global linear regression and the straight red line is the weighted linear regression, where the thickness of the gray lines indicates the strength of the weight. The red curve is the final function obtained after combining the local linear regressions for all the points.
Figure 2
Flowchart of the ALL-QSAR method.
Figure 3
The ALL-QSAR statistical modeling workflow.
Figure 4
Flowchart of database mining that employs predictive ALL-QSAR models.
Figure 5
The correlation between the ridge regression parameter (λ) and R² for one of the phenol test sets.
Figure 6
R² trajectory with respect to the kernel width during model development for 39 anticonvulsant agents in the training set and 9 compounds in the test set. Iterations are shown for the real data set (black) and the data set with randomized activities (gray).
Figure 7
Activity prediction with ALL-QSAR models for 9 anticonvulsants in the test set. R² = 0.90 (Model 1 in Table 1).
Figure 8
Activity prediction with ALL-QSAR models for 14 anticonvulsants in the test set. R² = 0.76 (Model 8 in Table 1).
Figure 9
Correlation between experimental and predicted pKi for 11 D1 antagonists in the test set. The training set included 37 compounds. R² = 0.97 (Model 1 in Table 2).
Figure 10
Correlation between experimental and predicted pKi for 14 D1 antagonists in the test set. The training set included 32 compounds. R² = 0.87 (Model 4 in Table 2). Two compounds, Ant08 and NNC01-0127, are outside the applicability domain and are not shown in the plot.
Figure 11
The best ALL-QSAR model with 150 phenols in the training set: R² = 0.90 for the prediction of 50 compounds in the test set (Model 1 in Table 3).
Figure 12
The consensus prediction of 50 external toxic phenol compounds with the 10 best ALL-QSAR models affords high prediction accuracy, with R² = 0.86 (Tables 3 and 4).
Figure 13
Workflow for the identification of novel anticonvulsant agents using consensus database mining.
Figure 14
One of the structures identified in virtual screening (top) and Dimmock’s semicarbazone scaffold (bottom).
