Prediction of carcinogenicity for diverse chemicals based on substructure grouping and SVM modeling
- PMID: 20186479
- DOI: 10.1007/s11030-010-9232-y
Abstract
The Carcinogenicity Reliability Database (CRDB) was constructed by collecting experimental carcinogenicity data on about 1,500 chemicals from six sources, including the IARC and NTP databases, and then ranking their reliabilities into six unified categories. A diverse set of 911 organic chemicals was selected from the database for QSAR modeling, and 1,504 different molecular descriptors were calculated with the Dragon software from their 3D molecular structures. The numbers of positive (carcinogenic) and negative (non-carcinogenic) chemicals containing various substructures were counted using atom and functional group count descriptors, and the statistical significance of the positive-to-negative ratio was tested for each substructure. Among the substructures that biomedical studies have identified as responsible for carcinogenicity, very few were judged to be strongly related to carcinogenicity. To develop QSAR models that predict the carcinogenicity of a wide variety of chemicals with satisfactory performance, the relationship between the carcinogenicity data (with improved reliability) and a subset of significant descriptors selected from the 1,504 Dragon descriptors was analyzed with a support vector machine (SVM). The support vector classification (SVC) function for weighted data in the LIBSVM program was used to classify chemicals into two carcinogenicity categories (positive or negative), with weights set according to the reliability of each chemical's carcinogenicity data. The quality and stability of the models were tested with a dual cross-validation procedure. As a first step, a single SVM model was developed for all 911 chemicals using 250 selected descriptors; it achieved an overall accuracy (the fraction of positives and negatives predicted correctly) of about 70%.
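The reliability-weighted classification step can be illustrated with a minimal sketch. It assumes a linear kernel and trains by Pegasos-style stochastic subgradient descent on a weighted hinge loss instead of calling LIBSVM, whose weighted-instance variant scales the penalty C per instance. The abstract states neither the kernel nor the weight values, so `RELIABILITY_WEIGHT`, its numbers, and the function names are hypothetical.

```python
import random
from typing import List, Sequence

# Hypothetical mapping from reliability category (1 = most reliable, 6 = least)
# to an instance weight; the actual CRDB weights are not given in the abstract.
RELIABILITY_WEIGHT = {1: 1.0, 2: 0.9, 3: 0.8, 4: 0.6, 5: 0.4, 6: 0.2}


def train_weighted_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    weights: Sequence[float],
    lam: float = 0.1,
    epochs: int = 100,
    seed: int = 0,
) -> List[float]:
    """Minimize lam/2 * ||w||^2 + (1/n) * sum_i c_i * max(0, 1 - y_i * <w, x_i>)
    by stochastic subgradient descent (Pegasos step size 1 / (lam * t)).

    Labels are +1 (carcinogenic) / -1 (non-carcinogenic). There is no separate
    bias term; append a constant 1.0 feature to each x_i if one is needed.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Step on the L2 regularizer: shrink w towards zero.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge-loss subgradient, scaled by this chemical's reliability weight.
                w = [wj + eta * weights[i] * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (positive) or -1 (negative) from the sign of <w, x>."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


if __name__ == "__main__":
    # Toy 1-D data: two reliable positives, two low-reliability negatives.
    X = [[2.0], [3.0], [-2.0], [-3.0]]
    y = [1, 1, -1, -1]
    c = [RELIABILITY_WEIGHT[r] for r in (1, 2, 5, 6)]
    w = train_weighted_linear_svm(X, y, c)
    print([predict(w, x) for x in X])  # expected: [1, 1, -1, -1]
```

A lower weight makes a margin violation cheaper for that chemical, so labels from less reliable sources pull less on the decision boundary.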
To improve the accuracy of the final model, the 911 chemicals were divided into 20 mutually overlapping subgroups according to the substructures they contain, a specific SVM model was optimized for each subgroup, and the predicted carcinogenicity of each chemical was determined by majority vote over the outputs of the SVM models for the subgroups to which it belongs. The model based on this grouping into 20 substructure subgroups predicts the carcinogenicity of a wide variety of chemicals with a satisfactory overall accuracy of approximately 80%.
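The consensus step can be sketched as follows, assuming each subgroup model is a callable returning +1 (carcinogenic) or -1 (non-carcinogenic) and that a chemical belongs to a subgroup when it contains that subgroup's substructure. The abstract does not say how ties or chemicals outside all 20 subgroups were handled; the `tie_label` default and the fallback to a single global model are assumptions, and the substructure names and toy models are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Sequence, Set

# +1 = carcinogenic (positive), -1 = non-carcinogenic (negative)
Model = Callable[[Sequence[float]], int]


def predict_by_subgroup_vote(
    features: Sequence[float],
    substructures: Set[str],
    subgroup_models: Dict[str, Model],
    fallback_model: Model,
    tie_label: int = 1,
) -> int:
    """Majority vote over the models of every subgroup the chemical belongs to.

    Because subgroups overlap, one chemical can receive several votes.
    tie_label and fallback_model are assumptions, not taken from the paper.
    """
    votes = [
        model(features)
        for name, model in subgroup_models.items()
        if name in substructures  # chemical contains this subgroup's substructure
    ]
    if not votes:
        # Chemical falls outside every subgroup: defer to a global model.
        return fallback_model(features)
    counts = Counter(votes)
    if counts[1] != counts[-1]:
        return 1 if counts[1] > counts[-1] else -1
    return tie_label


if __name__ == "__main__":
    # Hypothetical subgroup models: simple thresholds on two toy descriptors.
    models: Dict[str, Model] = {
        "aromatic_amine": lambda f: 1 if f[0] > 0.5 else -1,
        "nitro": lambda f: 1 if f[1] > 0.5 else -1,
        "epoxide": lambda f: -1,
    }
    global_model: Model = lambda f: -1
    chem = {"aromatic_amine", "nitro", "epoxide"}
    print(predict_by_subgroup_vote([0.9, 0.8], chem, models, global_model))  # expected: 1
```

Defaulting ties to positive errs on the side of flagging a chemical; a stricter screen might prefer the opposite convention.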
Similar articles
- Prediction of chemical carcinogenicity by machine learning approaches. SAR QSAR Environ Res. 2009;20(1-2):27-75. doi: 10.1080/10629360902724085. PMID: 19343583
- Improvement of carcinogenicity prediction performances based on sensitivity analysis in variable selection of SVM models. SAR QSAR Environ Res. 2013;24(7):565-80. doi: 10.1080/1062936X.2012.762425. Epub 2013 Jan 25. PMID: 23350528
- Application of a developed triple-classification machine learning model for carcinogenic prediction of hazardous organic chemicals to the US, EU, and WHO based on Chinese database. Ecotoxicol Environ Saf. 2023 Apr 15;255:114806. doi: 10.1016/j.ecoenv.2023.114806. Epub 2023 Mar 20. PMID: 36948010
- The comet assay with multiple mouse organs: comparison of comet assay results and carcinogenicity with 208 chemicals selected from the IARC monographs and U.S. NTP Carcinogenicity Database. Crit Rev Toxicol. 2000 Nov;30(6):629-799. doi: 10.1080/10408440008951123. PMID: 11145306. Review.
- International Commission for Protection Against Environmental Mutagens and Carcinogens. Application of SAR methods to non-congeneric data bases associated with carcinogenicity and mutagenicity: issues and approaches. Mutat Res. 1994 Feb 1;305(1):73-97. doi: 10.1016/0027-5107(94)90127-9. PMID: 7508549. Review.
Cited by
- High-Dimensional descriptor selection and computational QSAR modeling for antitumor activity of ARC-111 analogues Based on Support Vector Regression (SVR). Int J Mol Sci. 2012;13(1):1161-1172. doi: 10.3390/ijms13011161. Epub 2012 Jan 20. PMID: 22312310
- A clinical risk stratification tool for predicting treatment resistance in major depressive disorder. Biol Psychiatry. 2013 Jul 1;74(1):7-14. doi: 10.1016/j.biopsych.2012.12.007. Epub 2013 Feb 4. PMID: 23380715
- DCAMCP: A deep learning model based on capsule network and attention mechanism for molecular carcinogenicity prediction. J Cell Mol Med. 2023 Oct;27(20):3117-3126. doi: 10.1111/jcmm.17889. Epub 2023 Jul 31. PMID: 37525507
- Predicting Dose-Range Chemical Toxicity using Novel Hybrid Deep Machine-Learning Method. Toxics. 2022 Nov 18;10(11):706. doi: 10.3390/toxics10110706. PMID: 36422913
- Which is a more accurate predictor in colorectal survival analysis? Nine data mining algorithms vs. the TNM staging system. PLoS One. 2012;7(7):e42015. doi: 10.1371/journal.pone.0042015. Epub 2012 Jul 25. PMID: 22848691