Front Artif Intell. 2021 Nov 18;4:757780.

doi: 10.3389/frai.2021.757780. eCollection 2021.

DeepCarc: Deep Learning-Powered Carcinogenicity Prediction Using Model-Level Representation

Ting Li¹,², Weida Tong¹, Ruth Roberts³,⁴, Zhichao Liu¹, Shraddha Thakkar⁵

Affiliations

¹ Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, United States.
² University of Arkansas at Little Rock and University of Arkansas for Medical Sciences Joint Bioinformatics Program, Little Rock, AR, United States.
³ ApconiX Ltd., Alderley Edge, United Kingdom.
⁴ Department of Biosciences, University of Birmingham, Birmingham, United Kingdom.
⁵ Office of Translational Sciences, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States.

PMID: 34870186
PMCID: PMC8636933
DOI: 10.3389/frai.2021.757780

Ting Li et al. Front Artif Intell. 2021.

Erratum in

Corrigendum: DeepCarc: Deep learning-powered carcinogenicity prediction using model-level representation.
Li T, Tong W, Roberts R, Liu Z, Thakkar S. Front Artif Intell. 2022 Nov 28;5:1046668. doi: 10.3389/frai.2022.1046668. eCollection 2022. PMID: 36518910. Free PMC article.

Abstract

Carcinogenicity testing plays an essential role in identifying carcinogens in environmental chemistry and drug development. However, evaluating carcinogenic potency with conventional 2-year rodent studies is time-consuming and labor-intensive. Thus, there is an urgent need for alternative approaches that provide reliable and robust carcinogenicity assessments. In this study, we propose DeepCarc, a model that predicts the carcinogenicity of small molecules using deep learning-based model-level representations. DeepCarc was developed using a data set of 692 compounds and evaluated on a test set of 171 compounds from the National Center for Toxicological Research liver cancer database (NCTRlcdb). The model yielded a Matthews correlation coefficient (MCC) of 0.432 on the test set, outperforming four advanced deep learning (DL)-powered quantitative structure-activity relationship (QSAR) models with an average improvement rate of 37%. Furthermore, DeepCarc was used to screen the carcinogenic potential of compounds from both DrugBank and Tox21. Altogether, the proposed DeepCarc model (https://github.com/TingLi2016/DeepCarc) could serve as an early detection tool for carcinogenicity assessment.

Keywords: NCTRlcdb; QSAR; carcinogenicity; deep learning; non-animal models.
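The abstract reports performance as a Matthews correlation coefficient (MCC). As a minimal sketch of how that metric is computed from binary predictions, the following uses the standard confusion-matrix formula. The function names and the example labels are illustrative assumptions and are not taken from the DeepCarc repository.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = carcinogen, 0 = non-carcinogen)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    chosen here as an assumption).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical labels, for illustration only (not data from the study).
example_true = [1, 1, 1, 0, 0, 0, 1, 0]
example_pred = [1, 1, 0, 0, 0, 1, 1, 0]
example_mcc = mcc(*confusion_counts(example_true, example_pred))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it uses all four confusion-matrix cells, which makes it less sensitive to class imbalance than accuracy.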

Conflict of interest statement

RR is co-founder and co-director of ApconiX, an integrated toxicology and ion channel company that provides expert advice on non-clinical aspects of drug discovery and drug development to academia, industry, and not-for-profit organizations. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**FIGURE 1**
Overall workflow for the DeepCarc model: (1) Data preparation. 863 compounds were split into training (554 compounds), development (138 compounds), and test (171 compounds) sets using the Kennard-Stone algorithm. (2) Base classifier development. Five algorithms were used to develop base classifiers from three chemical representations: Mol2vec, Mold2, and MACCS. Two base classifier selection strategies were employed to select the optimized classifiers for meta classifier development. (3) Meta classifier development. With three chemical representations and two selection methods, six groups of base classifiers were used: Mol2vec_supervised, Mol2vec_original, Mold2_supervised, Mold2_original, MACCS_supervised, and MACCS_original. The probability predictions from the selected base classifiers were used to train the neural network. (4) Model evaluation. The DeepCarc model was evaluated on the independent test set.
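Step (1) of the workflow relies on the Kennard-Stone algorithm. A minimal sketch of the classic selection procedure is shown below, assuming Euclidean distance on numeric descriptor vectors. The function name `kennard_stone` and the toy points are hypothetical, and the caption does not state which distance or which descriptors the authors used, nor how the selected and remaining compounds were assigned to the three sets.

```python
import math

def kennard_stone(points, n_select):
    """Pick n_select indices that cover the descriptor space evenly.

    Classic Kennard-Stone: seed with the two most distant points, then
    repeatedly add the candidate whose nearest selected neighbour is
    farthest away. Returns (selected_indices, remaining_indices).
    """
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")

    # Full pairwise Euclidean distance matrix (fine for small data sets).
    dist = [[math.dist(a, b) for b in points] for a in points]

    # Seed: the pair of points with the largest distance.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [i0, j0]
    remaining = set(range(n)) - {i0, j0}

    # Distance from each candidate to its closest selected point.
    min_dist = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}

    while len(selected) < n_select:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nxt = max(sorted(remaining), key=lambda k: min_dist[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_dist[k] = min(min_dist[k], dist[k][nxt])

    return selected, sorted(remaining)

# Toy one-dimensional descriptors, for illustration only.
toy_points = [(0.0,), (1.0,), (2.0,), (10.0,)]
toy_selected, toy_rest = kennard_stone(toy_points, 3)
```

Because each new point is chosen to be far from everything already selected, the selected subset spans the descriptor space, which is why the method is often used to build representative training or test sets.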

**FIGURE 2**
The distribution of the pairwise Tanimoto coefficients calculated from Mol2vec, Mold2, and MACCS. Pink and green indicate coefficients calculated among the carcinogenic molecules and among the noncarcinogenic molecules, respectively.
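For readers unfamiliar with the Tanimoto coefficient, the sketch below shows its two common forms: the set form for binary fingerprints such as MACCS keys, and the continuous form for real-valued vectors. Which form the authors applied to each representation is not stated in the caption, so this is an assumption, and the fingerprints used here are invented for illustration.

```python
from itertools import combinations

def tanimoto_bits(a, b):
    """Tanimoto coefficient for binary fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    # Two empty fingerprints have no defined similarity; 0.0 is the convention chosen here.
    return len(a & b) / union if union else 0.0

def tanimoto_continuous(x, y):
    """Continuous Tanimoto: x.y / (|x|^2 + |y|^2 - x.y) for equal-length vectors."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    denom = sum(xi * xi for xi in x) + sum(yi * yi for yi in y) - dot
    return dot / denom if denom else 0.0

def pairwise_tanimoto(fingerprints):
    """All pairwise coefficients within one group of binary fingerprints."""
    return [tanimoto_bits(a, b) for a, b in combinations(fingerprints, 2)]

# Hypothetical fingerprints (sets of on-bit positions), not real MACCS keys.
toy_fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
toy_similarities = pairwise_tanimoto(toy_fps)
```

Computing `pairwise_tanimoto` separately for the carcinogenic and noncarcinogenic groups gives the two distributions that a figure of this kind compares.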

**FIGURE 3**
Performance of the DeepCarc models developed with the proposed supervised base classifier selection strategy for each of the three chemical representations (Mol2vec, Mold2, and MACCS). **(A)**: Seven performance metrics; **(B)**: Area under the ROC curve.

**FIGURE 4**
Performance of the ensemble models on the test set.

**FIGURE 5**
The probability distribution of the DeepCarc prediction of the compounds from **(A)** DrugBank; **(B)** Tox21.

Cited by

Advances of Artificial Intelligence in Anti-Cancer Drug Design: A Review of the Past Decade.
Wang L, Song Y, Wang H, Zhang X, Wang M, He J, Li S, Zhang L, Li K, Cao L. Pharmaceuticals (Basel). 2023 Feb 7;16(2):253. doi: 10.3390/ph16020253. PMID: 37259400. Free PMC article. Review.
Four functional genotoxic marker genes (Bax, Btg2, Ccng1, and Cdkn1a) discriminate genotoxic hepatocarcinogens from non-genotoxic hepatocarcinogens and non-genotoxic non-hepatocarcinogens in rat public toxicogenomics data, Open TG-GATEs.
Furihata C, Suzuki T. Genes Environ. 2024 Dec 19;46(1):28. doi: 10.1186/s41021-024-00322-8. PMID: 39702344. Free PMC article.
Recent advances in AI-based toxicity prediction for drug discovery.
Lee H, Kim J, Kim JW, Lee Y. Front Chem. 2025 Jul 8;13:1632046. doi: 10.3389/fchem.2025.1632046. eCollection 2025. PMID: 40698059. Free PMC article. Review.
Short-term in vivo testing to discriminate genotoxic carcinogens from non-genotoxic carcinogens and non-carcinogens using next-generation RNA sequencing, DNA microarray, and qPCR.
Furihata C, Suzuki T. Genes Environ. 2023 Feb 9;45(1):7. doi: 10.1186/s41021-023-00262-9. PMID: 36755350. Free PMC article. Review.
Role of artificial intelligence in revolutionizing drug discovery.
Rehman AU, Li M, Wu B, Ali Y, Rasheed S, Shaheen S, Liu X, Luo R, Zhang J. Fundam Res. 2024 May 9;5(3):1273-1287. doi: 10.1016/j.fmre.2024.04.021. eCollection 2025 May. PMID: 40528990. Free PMC article. Review.
