Using Graph Indices for the Analysis and Comparison of Chemical Datasets
- PMID: 27480235
- DOI: 10.1002/minf.201300076
Abstract
In cheminformatics, compounds are represented as points in a multidimensional space of chemical descriptors. When all pairs of points found within a certain distance threshold in the original high-dimensional chemistry space are connected by distance-labeled edges, the resulting data structure can be defined as a Dataset Graph (DG). We show that, as in the conventional description of organic molecules, many graph indices can be computed for DGs as well. We demonstrate that chemical datasets can be effectively characterized and compared by computing simple graph indices such as the average vertex degree or the Randić connectivity index. This approach is used to characterize and quantify the similarity between different datasets or subsets of the same dataset (e.g., training, test, and external validation sets used in QSAR modeling). The freely available ADDAGRA program has been implemented to build and visualize DGs. The approach proposed and discussed in this report could be further explored and utilized for different cheminformatics applications such as dataset diversification by acquiring external compounds, dataset processing prior to QSAR modeling, or (dis)similarity modeling of multiple datasets studied in chemical genomics applications.
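The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the ADDAGRA implementation; the function names, the toy descriptor vectors, the choice of Euclidean distance, and the threshold value are all assumptions made for illustration. The Randić index is computed here in its standard unweighted form, the sum over edges of 1/sqrt(deg(i) * deg(j)), which may differ from any distance-weighted variant used in the paper.

```python
import math
from itertools import combinations


def build_dataset_graph(points, threshold):
    """Connect every pair of points whose distance is within the threshold.

    Returns a dict mapping (i, j) index pairs to their distance label.
    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    edges = {}
    for i, j in combinations(range(len(points)), 2):
        distance = math.dist(points[i], points[j])
        if distance <= threshold:
            edges[(i, j)] = distance
    return edges


def vertex_degrees(n_vertices, edges):
    """Count how many edges touch each vertex."""
    degrees = [0] * n_vertices
    for i, j in edges:
        degrees[i] += 1
        degrees[j] += 1
    return degrees


def average_vertex_degree(n_vertices, edges):
    """Mean degree over all vertices, isolated ones included."""
    return 2 * len(edges) / n_vertices if n_vertices else 0.0


def randic_index(n_vertices, edges):
    """Standard (unweighted) Randic connectivity index of the graph."""
    degrees = vertex_degrees(n_vertices, edges)
    return sum(1.0 / math.sqrt(degrees[i] * degrees[j]) for i, j in edges)


# Hypothetical two-descriptor dataset: three close compounds and one outlier.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
edges = build_dataset_graph(points, threshold=1.5)

avg_deg = average_vertex_degree(len(points), edges)
randic = randic_index(len(points), edges)
print(f"edges={len(edges)} average degree={avg_deg:.2f} Randic={randic:.2f}")
```

In this toy case the three close points form a triangle and the outlier stays isolated, so both indices summarize how densely the dataset populates its descriptor space at the chosen threshold. Comparing two datasets would then amount to computing the same indices for each at a common threshold.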
Keywords: ADDAGRA; Chemical dataset graph; Graph indices; QSAR.
Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.