Data-Driven Quantitative Structure-Activity Relationship Modeling for Human Carcinogenicity by Chronic Oral Exposure
- PMID: 37040559
- PMCID: PMC10134506
- DOI: 10.1021/acs.est.3c00648
Abstract
Traditional methodologies for assessing chemical toxicity are expensive and time-consuming. Computational modeling approaches, especially quantitative structure-activity relationship (QSAR) models, have emerged as low-cost alternatives. However, conventional QSAR models are built on limited training data, which leads to low predictivity for new compounds. We developed a data-driven modeling approach for constructing carcinogenicity-related models and used these models to identify potential new human carcinogens. To this end, we used a probe carcinogen dataset from the US Environmental Protection Agency's Integrated Risk Information System (IRIS) to identify relevant PubChem bioassays. Responses of 25 PubChem assays were significantly relevant to carcinogenicity. Eight assays showed predictive value for carcinogenicity and were selected for QSAR model training. Using 5 machine learning algorithms and 3 types of chemical fingerprints, we developed 15 QSAR models for each PubChem assay dataset. These models showed acceptable predictivity in 5-fold cross-validation (average CCR = 0.71). Using our QSAR models, we could correctly predict and rank the carcinogenic potential of 342 IRIS compounds (PPV = 0.72). The models also predicted potential new carcinogens, which were validated by a literature search. This study points to an automated technique that can be applied to prioritize potential toxicants using validated QSAR models based on extensive training sets from public data resources.
Keywords: big data; carcinogens; data mining; machine learning; models; quantitative structure−activity relationships.
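The two metrics quoted in the abstract can be illustrated with a small sketch. It assumes the definitions usual in QSAR work, which the abstract itself does not spell out: CCR (correct classification rate) as the mean of sensitivity and specificity, and PPV (positive predictive value) as the fraction of predicted positives that are true positives. The algorithm and fingerprint names are placeholders, since the abstract does not name them, and the labels are toy values, not data from the study.

```python
from itertools import product


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = carcinogen."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def ccr(y_true, y_pred):
    """Assumed definition: (sensitivity + specificity) / 2."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def ppv(y_true, y_pred):
    """Assumed definition: tp / (tp + fp)."""
    tp, _, fp, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp)


# Placeholder names: 5 algorithms x 3 fingerprint types = 15 models per assay.
ALGORITHMS = [f"algorithm_{i}" for i in range(1, 6)]
FINGERPRINTS = [f"fingerprint_{j}" for j in range(1, 4)]
MODEL_GRID = list(product(ALGORITHMS, FINGERPRINTS))

# Toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

print(len(MODEL_GRID), ccr(y_true, y_pred), ppv(y_true, y_pred))
```

Because CCR averages the per-class rates, it is less sensitive to class imbalance than plain accuracy, which is one reason it is often reported for toxicity datasets.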
Conflict of interest statement
The authors declare no competing financial interest.
