Sci Rep. 2020 Nov 5;10(1):19128.

doi: 10.1038/s41598-020-76129-8.

Identification of early liver toxicity gene biomarkers using comparative supervised machine learning

Brandi Patrice Smith^{1,2}, Loretta Sue Auvil³, Michael Welge^{3,4}, Colleen Bannon Bushell^{3,4,5}, Rohit Bhargava^{6,7,8}, Navin Elango⁹, Kamin Johnson⁹, Zeynep Madak-Erdogan^{10,11,12,13,14}

Affiliations

¹ Department of Food Science and Human Nutrition, University of Illinois, Urbana-Champaign, 1201 W Gregory Dr, Urbana, IL, 61801, USA.
² Illinois Informatics Institute, University of Illinois, Urbana-Champaign, Urbana, IL, USA.
³ National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign, Urbana, IL, USA.
⁴ Carl R. Woese Institute for Genomic Biology, University of Illinois, Urbana-Champaign, Urbana, IL, USA.
⁵ Carle Illinois College of Medicine, University of Illinois, Urbana-Champaign, Urbana, IL, USA.
⁶ Department of Bioengineering, University of Illinois, Urbana-Champaign, Urbana, IL, USA.
⁷ Cancer Center at Illinois, University of Illinois, Urbana-Champaign, Urbana, IL, USA.
⁸ Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
⁹ Corteva Agrisciences, The Agriculture Division of DowDupont, Indianapolis, IN, USA.
¹⁰ Department of Food Science and Human Nutrition, University of Illinois, Urbana-Champaign, 1201 W Gregory Dr, Urbana, IL, 61801, USA. zmadake2@illinois.edu.
¹¹ National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign, Urbana, IL, USA. zmadake2@illinois.edu.
¹² Carl R. Woese Institute for Genomic Biology, University of Illinois, Urbana-Champaign, Urbana, IL, USA. zmadake2@illinois.edu.
¹³ Cancer Center at Illinois, University of Illinois, Urbana-Champaign, Urbana, IL, USA. zmadake2@illinois.edu.
¹⁴ Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA. zmadake2@illinois.edu.

PMID: 33154507
PMCID: PMC7645727
DOI: 10.1038/s41598-020-76129-8

Brandi Patrice Smith et al. Sci Rep. 2020.

Abstract

Screening agrochemicals and pharmaceuticals for potential liver toxicity is required for regulatory approval and is an expensive, time-consuming process. Identifying and using early-exposure gene signatures and robust predictive models in regulatory toxicity testing has the potential to reduce time and costs substantially. In this study, comparative supervised machine learning approaches were applied to the rat liver TG-GATEs dataset for feature selection and predictive modeling. Using three different feature selection methods, we identified ten gene biomarkers that predicted liver necrosis with high specificity and selectivity in an independent validation dataset from the Microarray Quality Control (MAQC)-II study. Nine of the ten genes selected by the supervised methods are involved in metabolism and detoxification (Car3, Crat, Cyp39a1, Dcd, Lbp, Scly, Slc23a1, and Tkfc) or transcriptional regulation (Ablim3). Several of these genes, including Crat, Car3 and Slc23a1, are also implicated in liver carcinogenesis. Our biomarker gene signature provides high statistical accuracy and a manageable number of genes to study as indicators, and could accelerate toxicity testing of compounds based on their ability to induce liver necrosis and, eventually, liver cancer.
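The general shape of such a workflow (select a small gene set on training data, fit a classifier, then score it on a separate validation set) can be sketched as follows. This is a minimal illustration, not the authors' method: the mean-difference filter, the nearest-centroid classifier, the choice to report sensitivity alongside specificity, and all expression values are assumptions made up for the example.

```python
from statistics import mean

def top_k_genes(X, y, k):
    """Rank genes by |mean(positive) - mean(negative)| and keep the top k (a simple filter)."""
    scored = []
    for g in range(len(X[0])):
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == 0]
        scored.append((abs(mean(pos) - mean(neg)), g))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [g for _, g in scored[:k]]

def fit_centroids(X, y, genes):
    """Per-class mean expression over the selected genes."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[g] for r in rows) for g in genes]
    return centroids

def predict(centroids, genes, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(center):
        return sum((row[g] - c) ** 2 for g, c in zip(genes, center))
    return min(centroids, key=lambda label: dist(centroids[label]))

def sensitivity_specificity(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Made-up toy data: rows are samples, columns are 4 hypothetical genes.
# Label 1 = necrosis observed, 0 = no necrosis.
X_train = [
    [1.0, 5.0, 2.0, 3.0], [1.2, 5.1, 2.5, 3.1], [0.9, 4.9, 1.8, 2.9],
    [3.0, 5.0, 2.1, 1.0], [3.2, 5.2, 2.4, 0.8], [2.9, 4.8, 1.9, 1.1],
]
y_train = [0, 0, 0, 1, 1, 1]

# A separate, equally made-up validation set that is never used for selection or fitting.
X_valid = [
    [1.1, 5.0, 2.2, 3.0], [3.1, 5.0, 2.0, 0.9],
    [2.8, 4.9, 2.3, 1.2], [0.8, 5.2, 2.0, 2.8],
]
y_valid = [0, 1, 1, 0]

selected = top_k_genes(X_train, y_train, k=2)
centroids = fit_centroids(X_train, y_train, selected)
y_pred = [predict(centroids, selected, row) for row in X_valid]
sens, spec = sensitivity_specificity(y_valid, y_pred)
print(selected, sens, spec)
```

Keeping gene selection and model fitting confined to the training data is what makes the validation-set figures an honest estimate; selecting genes on the pooled data would leak information into the evaluation.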

Conflict of interest statement

There are competing interests between the authors (ZME, RB) and Corteva Agrisciences (NE, KJ); specifically the research was supported by Corteva Agrisciences. Other authors do not declare competing interests.

Figures

**Figure 1**
(A) Structure of ethinyl estradiol (EE). Image obtained from Wikipedia (https://commons.wikimedia.org/wiki/File:Ethinylestradiol.svg). (B) Serum alkaline phosphatase and total bilirubin levels of animals that are exposed to EE. Graphs are generated by Graphpad Prism8 software (GraphPad Software Inc., La Jolla, CA, www.graphpad.com). (C) Total body weight, liver weight and serum triglyceride levels of animals that are exposed to EE. Graphs are generated by Graphpad Prism8 software (GraphPad Software Inc., La Jolla, CA, www.graphpad.com). (D) Hierarchical clustering of hepatic genes regulated by low-, medium- and high-dose EE exposure at selected time points. Cluster3 software (https://bonsai.hgc.jp/~mdehoon/software/cluster/) was used for clustering the differentially expressed genes. Data was visualized using Treeview Java (https://jtreeview.sourceforge.net/).

**Figure 2**
(A) Hierarchical clustering of hepatic genes that are regulated by high-dose EE exposure over 29 days. Cluster3 software (https://bonsai.hgc.jp/~mdehoon/software/cluster/) was used for clustering the differentially expressed genes. Data was visualized using Treeview Java (https://jtreeview.sourceforge.net/). (B) Gene expression patterns of clusters (C1–8) based on average gene expression values that were identified in 2A. Graphs are generated by Graphpad Prism8 software (GraphPad Software Inc., La Jolla, CA, www.graphpad.com). (C) GO terms that are significantly associated with C6. GSEA analysis was performed. Figures are generated using Gene Set Enrichment Analysis software (https://www.gsea-msigdb.org/gsea/index.jsp). (D) PCA analysis of hepatic gene regulation time course dataset for high-dose EE exposure. Figure was generated using StrandNGS (Version 3.1.1, Bangalore, India).

**Figure 3**
(A) Evaluation of average ROC for training (upper panel) and validation (lower panel) with increasing gene number for feature selection. (B) Comparison of ranges of average ROC values for different Nfold (groups) for each feature selection-prediction method combination. Both graphs are generated using Tableau software (Seattle, WA, USA, https://www.tableau.com/).
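The kind of evaluation summarized in Figure 3, an average ROC value tracked as the number of selected genes grows and compared across N-fold settings, can be sketched roughly as below. This is an assumption-laden illustration rather than the study's pipeline: the pairwise AUC estimate, the mean-difference gene ranking, the signed-sum score, the deterministic fold assignment, and the toy data are all chosen for the example.

```python
from statistics import mean

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rank_genes(X, y):
    """Return (gene index, direction) pairs ordered by |mean difference|, largest first."""
    ranked = []
    for g in range(len(X[0])):
        pos = mean(r[g] for r, lab in zip(X, y) if lab == 1)
        neg = mean(r[g] for r, lab in zip(X, y) if lab == 0)
        ranked.append((abs(pos - neg), g, 1.0 if pos >= neg else -1.0))
    ranked.sort(key=lambda t: (-t[0], t[1]))
    return [(g, sign) for _, g, sign in ranked]

def cv_auc(X, y, k, n_folds):
    """Average held-out AUC over n_folds, re-ranking genes inside each training split."""
    aucs = []
    for f in range(n_folds):
        test = [i for i in range(len(y)) if i % n_folds == f]
        train = [i for i in range(len(y)) if i % n_folds != f]
        genes = rank_genes([X[i] for i in train], [y[i] for i in train])[:k]
        # Score = signed sum of the selected genes' expression values.
        scores = [sum(sign * X[i][g] for g, sign in genes) for i in test]
        aucs.append(roc_auc([y[i] for i in test], scores))
    return mean(aucs)

# Made-up toy data: 8 samples, 3 hypothetical genes; label 1 = toxic response.
X = [
    [1.0, 2.0, 1.0], [1.2, 3.0, 1.4], [3.0, 2.5, 1.6], [3.1, 2.0, 1.2],
    [0.9, 3.0, 1.1], [1.1, 2.4, 1.0], [2.9, 2.2, 1.5], [3.3, 3.1, 1.7],
]
y = [0, 0, 1, 1, 0, 0, 1, 1]

# Average AUC as a function of the number of genes kept.
auc_by_k = {k: cv_auc(X, y, k, n_folds=2) for k in (1, 2, 3)}
for k, auc in auc_by_k.items():
    print(f"genes={k} mean AUC={auc:.2f}")
```

Re-ranking the genes inside every training split, instead of once on all samples, keeps the held-out AUC from being inflated by the selection step.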

**Figure 4**
(A) ROC curves for training (upper) and validation (lower) datasets for best performing feature selection-prediction method combinations. (B) List of genes identified by six feature selection methods and their contribution to prediction methods as indicated by mutual info gain for each gene. Color shows details about Rank. The marks are labelled by rank. Both graphs are generated using Tableau software (Seattle, WA, USA, https://www.tableau.com/).
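As a rough illustration of ranking genes by mutual information with the outcome label, the sketch below binarizes each gene at its median and computes the mutual information in bits. The median split, the gene names `gene_a`, `gene_b`, `gene_c`, and the expression values are hypothetical; the paper's exact estimator is not described in this record.

```python
from collections import Counter
from math import log2
from statistics import median

def binarize(values):
    """Split expression values at the median: 1 if above, else 0 (an assumed discretization)."""
    cut = median(values)
    return [1 if v > cut else 0 for v in values]

def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits for two discrete sequences of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

# Made-up toy data: label 1 = necrosis, 0 = none; gene names are placeholders.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
expression = {
    "gene_a": [1.0, 1.1, 0.9, 1.2, 3.0, 3.1, 2.9, 3.2],  # tracks the label closely
    "gene_b": [1.0, 3.0, 1.1, 3.1, 0.9, 2.9, 1.2, 3.2],  # unrelated to the label
    "gene_c": [1.0, 1.1, 1.2, 2.5, 1.3, 2.6, 2.7, 2.8],  # partially informative
}

mi = {g: mutual_information(binarize(v), labels) for g, v in expression.items()}
ranking = sorted(expression, key=lambda g: -mi[g])
for rank, gene in enumerate(ranking, start=1):
    print(rank, gene, round(mi[gene], 3))
```

A gene that splits the classes perfectly reaches the maximum of 1 bit for a balanced binary label, while a gene independent of the label scores 0.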

Cited by

A novel support vector machine-based 1-day, single-dose prediction model of genotoxic hepatocarcinogenicity in rats.
Gi M, Suzuki S, Kanki M, Yokohira M, Tsukamoto T, Fujioka M, Vachiraarunwong A, Qiu G, Guo R, Wanibuchi H. Arch Toxicol. 2024 Aug;98(8):2711-2730. doi: 10.1007/s00204-024-03755-w. Epub 2024 May 18. PMID: 38762666

PFAS and their association with the increased risk of cardiovascular disease in postmenopausal women.
Arredondo Eve A, Tunc E, Mehta D, Yoo JY, Yilmaz HE, Emren SV, Akçay FA, Madak Erdogan Z. Toxicol Sci. 2024 Aug 1;200(2):312-323. doi: 10.1093/toxsci/kfae065. PMID: 38758093 Free PMC article.

Artificial Intelligence in Liver Diseases: Recent Advances.
Lu F, Meng Y, Song X, Li X, Liu Z, Gu C, Zheng X, Jing Y, Cai W, Pinyopornpanish K, Mancuso A, Romeiro FG, Méndez-Sánchez N, Qi X. Adv Ther. 2024 Mar;41(3):967-990. doi: 10.1007/s12325-024-02781-5. Epub 2024 Jan 29. PMID: 38286960 Review.

Progress in toxicogenomics to protect human health.
Meier MJ, Harrill J, Johnson K, Thomas RS, Tong W, Rager JE, Yauk CL. Nat Rev Genet. 2025 Feb;26(2):105-122. doi: 10.1038/s41576-024-00767-1. Epub 2024 Sep 2. PMID: 39223311 Review.

AI-driven Discovery of Morphomolecular Signatures in Toxicology.
Jaume G, Peeters T, Song AH, Pettit R, Williamson DFK, Oldenburg L, Vaidya A, de Brot S, Chen RJ, Thiran JP, Le LP, Gerber G, Mahmood F. bioRxiv [Preprint]. 2024 Jul 23:2024.07.19.604355. doi: 10.1101/2024.07.19.604355. PMID: 39091765 Free PMC article. Preprint.

Grants and funding

T32 ES007326/ES/NIEHS NIH HHS/United States
