Predicting the genotoxicity of secondary and aromatic amines using data subsetting to generate a model ensemble
- PMID: 12767154
- DOI: 10.1021/ci034013i
Abstract
Binary quantitative structure-activity relationship (QSAR) models are developed to classify a data set of 334 aromatic and secondary amine compounds as genotoxic or nongenotoxic based on information calculated solely from chemical structure. Genotoxic endpoints for each compound were determined using the SOS Chromotest in both the presence and absence of an S9 rat liver homogenate. Compounds were considered genotoxic if assay results indicated a positive genotoxicity hit for either the S9 inactivated or S9 activated assay. Each compound in the data set was encoded through the calculation of numerical descriptors that describe various aspects of chemical structure (e.g. topological, geometric, electronic, polar surface area). Furthermore, five additional descriptors that focused on the secondary and aromatic nitrogen atoms in each molecule were calculated specifically for this study. Descriptor subsets were examined using a genetic algorithm search engine interfaced with a k-Nearest Neighbor fitness evaluator to find the most information-rich subsets, which ultimately served as the final predictive models. Models were chosen for their ability to minimize the total number of misclassifications, with special attention given to those models that possessed fewer occurrences of positive toxicity hits being misclassified as nontoxic (false negatives). In addition, a subsetting procedure was used to form an ensemble of models using different combinations of compounds in the training and prediction sets. This was done to ensure that consistent results could be obtained regardless of training set composition. The procedure also allowed for each compound to be externally validated three times by different training set data with the resultant predictions being used in a "majority rules" voting scheme to produce a consensus prediction for each member of the data set. 
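The fitness evaluation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the value of k, the distance metric, the validation scheme, or how false negatives were weighted, so the Euclidean distance, the leave-one-out loop, `k=3`, the `fn_weight` penalty, and the toy data below are all assumptions made for illustration. A search procedure (a genetic algorithm in the paper) would call a function like `subset_fitness` on each candidate descriptor subset and keep the subsets with the lowest cost.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def subset_fitness(X, y, columns, k=3, fn_weight=2.0):
    """Leave-one-out misclassification cost of a descriptor subset (lower is better).

    A genotoxic compound (label 1) predicted as nongenotoxic is a false
    negative and costs `fn_weight`; any other error costs 1.0.
    The weighting is an assumption, not a value taken from the paper.
    """
    sub = [[row[c] for c in columns] for row in X]
    cost = 0.0
    for i, (query, label) in enumerate(zip(sub, y)):
        rest_X = sub[:i] + sub[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, query, k) != label:
            cost += fn_weight if label == 1 else 1.0
    return cost


# Hypothetical toy data: column 0 separates the classes, column 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [0.3, 3.0],
     [1.0, 4.9], [1.1, 1.1], [1.2, 9.1], [1.3, 3.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

cost_informative = subset_fitness(X, y, [0])
cost_noise = subset_fitness(X, y, [1])
best_subset = min([[0], [1], [0, 1]], key=lambda cols: subset_fitness(X, y, cols))
```

On this toy data the informative descriptor yields a lower cost than the noise descriptor, which is the signal a subset search relies on.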
The individual models produced an average training set classification rate of 71.6% and an average prediction set classification rate of 67.7%. However, the model ensemble was able to correctly classify the genotoxicity of 72.2% of all prediction set compounds.
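The subsetting and "majority rules" voting steps can be sketched as follows. The abstract does not say how the training and prediction sets were drawn, so this sketch assumes three independently shuffled five-fold partitions, which is one simple way to make every compound appear in a prediction set exactly three times. The function names, the fold count, and the toy 1-nearest-neighbor classifier are hypothetical.

```python
import random
from collections import Counter


def make_splits(n_items, n_folds=5, n_repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs; each item is held out once per repeat."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_items))
        rng.shuffle(order)
        for f in range(n_folds):
            test = order[f::n_folds]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


def consensus_predictions(X, y, fit_predict, **split_kwargs):
    """Collect external predictions per item and return the majority vote."""
    votes = [[] for _ in X]
    for train, test in make_splits(len(X), **split_kwargs):
        preds = fit_predict([X[i] for i in train],
                            [y[i] for i in train],
                            [X[i] for i in test])
        for i, p in zip(test, preds):
            votes[i].append(p)
    consensus = [Counter(v).most_common(1)[0][0] for v in votes]
    return consensus, votes


def one_nn(train_X, train_y, test_X):
    """Toy stand-in for a trained model: 1-nearest neighbor on a single descriptor."""
    return [train_y[min(range(len(train_X)), key=lambda j: abs(train_X[j] - x))]
            for x in test_X]


# Hypothetical toy data: two well-separated groups of compounds.
X = [0.0, 0.1, 0.2, 0.3, 0.4, 2.0, 2.1, 2.2, 2.3, 2.4]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

consensus, votes = consensus_predictions(X, y, one_nn)
accuracy = sum(c == t for c, t in zip(consensus, y)) / len(y)
```

With three votes per compound and two classes, a tie cannot occur, so the consensus label is always well defined.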