J Chem Inf Comput Sci. 2000 Jan;40(1):185-94.
doi: 10.1021/ci980033m.

Novel variable selection quantitative structure-property relationship approach based on the k-nearest-neighbor principle

W Zheng et al. J Chem Inf Comput Sci. 2000 Jan.

Abstract

A novel automated variable selection quantitative structure-activity relationship (QSAR) method, based on the k-nearest-neighbor principle (kNN-QSAR), has been developed. The kNN-QSAR method formally explores the active analogue approach, which implies that similar compounds display similar profiles of pharmacological activities. The activity of each compound is predicted as the average activity of the k most chemically similar compounds from the data set. The robustness of a QSAR model is characterized by the value of cross-validated R² (q²) using the leave-one-out cross-validation method. The chemical structures are characterized by multiple topological descriptors such as molecular connectivity indices or atom pairs. The chemical similarity is evaluated by Euclidean distances between compounds in multidimensional descriptor space, and the optimal subset of descriptors is selected using simulated annealing as a stochastic optimization algorithm. The application of the kNN-QSAR method to 58 estrogen receptor ligands, as well as to several other groups of pharmacologically active compounds, yielded QSAR models with q² values of 0.6 or higher. Due to its relative simplicity, high degree of automation, nonlinear nature, and computational efficiency, this method could be applied routinely to a large variety of experimental data sets.
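
The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the annealing schedule, the move used to perturb a descriptor subset, the number of descriptors selected, or the value of k, so the geometric cooling schedule, the single-descriptor swap move, and all function and parameter names below (`loo_knn_predictions`, `q2`, `anneal_descriptors`, `n_select`, `t_start`, `t_end`) are assumptions made for illustration. Only the unweighted k-neighbor average, the Euclidean distance, and the leave-one-out q² come from the text.

```python
import numpy as np

def loo_knn_predictions(X, y, k):
    """Leave-one-out kNN: predict each compound's activity as the mean
    activity of its k nearest neighbours in Euclidean descriptor space."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)            # a compound is never its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y[nearest].mean(axis=1)

def q2(y, y_pred):
    """Cross-validated R^2: 1 - (sum of squared LOO errors) / (total sum of squares)."""
    press = ((y - y_pred) ** 2).sum()
    total = ((y - y.mean()) ** 2).sum()
    return 1.0 - press / total

def anneal_descriptors(X, y, k, n_select, n_steps=2000,
                       t_start=1.0, t_end=1e-3, seed=0):
    """Search for a descriptor subset with high q2 by simulated annealing.
    Assumed details: geometric cooling, one-descriptor swap per step."""
    n_desc = X.shape[1]
    if not 0 < n_select < n_desc:
        raise ValueError("n_select must be between 1 and n_desc - 1")
    rng = np.random.default_rng(seed)
    current = rng.choice(n_desc, size=n_select, replace=False)
    current_q2 = q2(y, loo_knn_predictions(X[:, current], y, k))
    best, best_q2 = current.copy(), current_q2
    for step in range(n_steps):
        # Temperature decays geometrically from t_start to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(n_steps - 1, 1))
        # Propose a neighbour subset: swap one selected descriptor for an unused one.
        candidate = current.copy()
        unused = np.setdiff1d(np.arange(n_desc), current)
        candidate[rng.integers(n_select)] = rng.choice(unused)
        cand_q2 = q2(y, loo_knn_predictions(X[:, candidate], y, k))
        # Metropolis rule: always accept improvements, sometimes accept worse subsets.
        if cand_q2 >= current_q2 or rng.random() < np.exp((cand_q2 - current_q2) / temp):
            current, current_q2 = candidate, cand_q2
            if current_q2 > best_q2:
                best, best_q2 = current.copy(), current_q2
    return best, best_q2
```

With a descriptor matrix `X` of shape (compounds, descriptors) and an activity vector `y`, a call such as `anneal_descriptors(X, y, k=3, n_select=10)` returns the column indices of the best subset found and its q². Descriptors would normally be scaled before computing Euclidean distances; whether and how the authors do so is not stated in the abstract.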
