Substructure mining using elaborate chemical representation
- PMID: 16562988
- DOI: 10.1021/ci0503715
Abstract
Substructure mining algorithms are important drug discovery tools because they can find substructures that affect physicochemical and biological properties. Current methods, however, consider only part of the chemical information present in a data set of compounds. The overall aim of our study was therefore to enable more exhaustive data mining by designing methods that detect all substructures of any size, shape, and level of chemical detail. We developed a means of chemical representation that uses atomic hierarchies, enabling substructure mining to consider general and/or highly specific features. As a proof of concept, the efficient, multipurpose graph mining system Gaston learned substructures of any size and shape from a mutagenicity data set represented in this manner. From these substructures, we extracted a set of only six nonredundant, discriminative substructures that represent relevant biochemical knowledge. Our results demonstrate the individual and synergistic importance of elaborate chemical representation and of mining for nonlinear substructures. We conclude that the combination of elaborate chemical representation and Gaston provides an excellent method for 2D substructure mining, as it systematically explores all substructures at different levels of chemical detail.
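The idea of an atomic hierarchy can be illustrated with a minimal sketch. The abstract does not specify the hierarchy actually used, so the groupings below (`halogen`, `heteroatom`, `any`) and the helper names `generalizations`, `label_matches`, and `path_matches` are hypothetical and chosen only for illustration; the sketch also handles linear chains only, whereas the study mines substructures of any shape with Gaston.

```python
# Hypothetical atom-label hierarchy: each label maps to a more general parent.
# These groupings are illustrative assumptions, not the scheme from the paper.
PARENT = {
    "F": "halogen", "Cl": "halogen", "Br": "halogen", "I": "halogen",
    "N": "heteroatom", "O": "heteroatom", "S": "heteroatom",
    "C": "any", "halogen": "any", "heteroatom": "any",
}


def generalizations(label):
    """Return the label followed by its ancestors, most specific first."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def label_matches(pattern_label, atom_label):
    """A pattern label matches an atom if it equals the atom's label
    or any generalization of it."""
    return pattern_label in generalizations(atom_label)


def path_matches(pattern, atoms):
    """Position-by-position match of a linear pattern against a linear
    chain of atoms (a simplification: real substructures are graphs)."""
    return len(pattern) == len(atoms) and all(
        label_matches(p, a) for p, a in zip(pattern, atoms)
    )


if __name__ == "__main__":
    print(generalizations("Cl"))
    print(path_matches(["C", "halogen"], ["C", "Br"]))
    print(path_matches(["C", "halogen"], ["C", "O"]))
```

Under these assumptions, a pattern written with the general label `halogen` covers chloro, bromo, fluoro, and iodo variants at once, while a pattern written with `Cl` stays specific. This is the sense in which a hierarchical representation lets a miner consider both general and highly specific features.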