J Comput Aided Mol Des. 2003 Feb-Apr;17(2-4):241-53.
doi: 10.1023/a:1025386326946.

Rational selection of training and test sets for the development of validated QSAR models

Alexander Golbraikh et al. J Comput Aided Mol Des. 2003 Feb-Apr.

Abstract

Quantitative Structure-Activity Relationship (QSAR) models are increasingly used to screen chemical databases and/or virtual chemical libraries for potentially bioactive molecules. These developments emphasize the importance of rigorous model validation to ensure that the models have acceptable predictive power. Using the k nearest neighbors (kNN) variable selection QSAR method to analyze several datasets, we recently demonstrated that the widely accepted leave-one-out (LOO) cross-validated R² (q²) is an inadequate measure of the predictive ability of the models [Golbraikh, A., Tropsha, A. Beware of q²! J. Mol. Graphics Mod. 20, 269-276, (2002)]. Herein, we provide additional evidence that there is no correlation between the value of q² for the training set and the accuracy of prediction (R²) for the test set, and we argue that this observation is a general property of any QSAR model developed with LOO cross-validation. We suggest that external validation using rationally selected training and test sets provides a means to establish a reliable QSAR model. We propose several approaches to the division of experimental datasets into training and test sets and apply them in QSAR studies of 48 functionalized amino acid anticonvulsants and a series of 157 epipodophyllotoxin derivatives with antitumor activity. We formulate a set of general criteria for the evaluation of the predictive power of QSAR models.
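
To make the two quantities contrasted in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a plain kNN regressor (mean activity of the k nearest compounds, without the variable selection step), takes q² as 1 - PRESS / SS_total over leave-one-out predictions, and takes the test-set R² as the squared Pearson correlation between observed and predicted activities. The function names, the synthetic descriptors, and the simple 30/10 split are all hypothetical and stand in for real data and for the rational division schemes the paper proposes.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """Predict activity as the mean of the k nearest training compounds."""
    ranked = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = ranked[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def loo_q2(X, y, k=3):
    """Leave-one-out cross-validated R^2: q^2 = 1 - PRESS / SS_total."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(X)):
        # Hold out compound i and predict it from all remaining compounds.
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        press += (y[i] - knn_predict(rest_X, rest_y, X[i], k)) ** 2
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_total


def external_r2(train_X, train_y, test_X, test_y, k=3):
    """Squared Pearson correlation of observed vs. predicted test activities."""
    pred = [knn_predict(train_X, train_y, x, k) for x in test_X]
    mean_obs = sum(test_y) / len(test_y)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(test_y, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in test_y)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov ** 2 / (var_obs * var_pred)


# Synthetic, purely illustrative data: two descriptors, one noisy activity.
random.seed(0)
X = [[random.uniform(0, 1), random.uniform(0, 1)] for _ in range(40)]
y = [2 * a + b + random.gauss(0, 0.05) for a, b in X]

# Assumption: a naive positional split, NOT a rational selection method.
train_X, train_y = X[:30], y[:30]
test_X, test_y = X[30:], y[30:]

q2 = loo_q2(train_X, train_y)
r2 = external_r2(train_X, train_y, test_X, test_y)
print(f"LOO q2 (training set): {q2:.3f}")
print(f"external R2 (test set): {r2:.3f}")
```

The point of computing both numbers side by side is that q² uses only the training set, while R² is measured on compounds the model never saw; the abstract's claim is that a high value of the first does not guarantee a high value of the second.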
