Using Graph Indices for the Analysis and Comparison of Chemical Datasets
- PMID: 27480235
- DOI: 10.1002/minf.201300076
Abstract
In cheminformatics, compounds are represented as points in a multidimensional space of chemical descriptors. When all pairs of points found within a certain distance threshold in the original high-dimensional chemistry space are connected by distance-labeled edges, the resulting data structure can be defined as a Dataset Graph (DG). We show that, as in the conventional description of organic molecules, many graph indices can be computed for DGs as well. We demonstrate that chemical datasets can be effectively characterized and compared by computing simple graph indices such as the average vertex degree or the Randić connectivity index. This approach is used to characterize and quantify the similarity between different datasets or subsets of the same dataset (e.g., training, test, and external validation sets used in QSAR modeling). The freely available ADDAGRA program has been implemented to build and visualize DGs. The approach proposed and discussed in this report could be further explored and utilized for different cheminformatics applications such as dataset diversification by acquiring external compounds, dataset processing prior to QSAR modeling, or (dis)similarity modeling of multiple datasets studied in chemical genomics applications.
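The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the ADDAGRA implementation. Several things here are assumptions made for illustration: the use of Euclidean distance, the function names (`build_dataset_graph`, `average_vertex_degree`, `randic_index`), the toy descriptor values, and the threshold of 1.5. The indices are computed in their standard unweighted form (degree-based), which may differ from the exact variants used in the paper.

```python
import math
from itertools import combinations


def build_dataset_graph(points, threshold):
    """Return {(i, j): distance} for every pair of points within `threshold`.

    Assumption: Euclidean distance in descriptor space; the paper may use
    a different metric or descriptor scaling.
    """
    edges = {}
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if d <= threshold:
            edges[(i, j)] = d  # edge labeled with its distance
    return edges


def vertex_degrees(n_vertices, edges):
    """Count how many edges touch each vertex."""
    degrees = [0] * n_vertices
    for i, j in edges:
        degrees[i] += 1
        degrees[j] += 1
    return degrees


def average_vertex_degree(n_vertices, edges):
    """Mean degree over all vertices, including isolated ones."""
    return 2 * len(edges) / n_vertices if n_vertices else 0.0


def randic_index(n_vertices, edges):
    """Standard Randic connectivity index: sum over edges of 1/sqrt(d_i * d_j)."""
    deg = vertex_degrees(n_vertices, edges)
    return sum(1.0 / math.sqrt(deg[i] * deg[j]) for i, j in edges)


# Hypothetical toy dataset: four compounds described by two descriptors each.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
edges = build_dataset_graph(points, threshold=1.5)

print("edges:", sorted(edges))
print("average vertex degree:", average_vertex_degree(len(points), edges))
print("Randic index:", randic_index(len(points), edges))
```

In this toy example the first three points form a triangle and the fourth is isolated, so two datasets with the same number of compounds but different clustering would yield different index values, which is the kind of comparison the abstract describes.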
Keywords: ADDAGRA; Chemical dataset graph; Graph indices; QSAR.
Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
