Sci Rep. 2020 Nov 5;10(1):19128.
doi: 10.1038/s41598-020-76129-8.

Identification of early liver toxicity gene biomarkers using comparative supervised machine learning

Brandi Patrice Smith et al. Sci Rep. 2020.

Abstract

Screening agrochemicals and pharmaceuticals for potential liver toxicity is required for regulatory approval and is an expensive, time-consuming process. Identifying and using early exposure gene signatures and robust predictive models in regulatory toxicity testing could reduce time and costs substantially. In this study, comparative supervised machine learning approaches were applied to the rat liver TG-GATEs dataset to develop feature selection and prediction methods. Using three different feature selection methods, we identified ten gene biomarkers that predicted liver necrosis with high specificity and selectivity in an independent validation dataset from the Microarray Quality Control (MAQC)-II study. Nine of the ten genes selected with the supervised methods are involved in metabolism and detoxification (Car3, Crat, Cyp39a1, Dcd, Lbp, Scly, Slc23a1, and Tkfc) or transcriptional regulation (Ablim3). Several of these genes are also implicated in liver carcinogenesis, including Crat, Car3 and Slc23a1. Our biomarker gene signature provides high statistical accuracy and a manageable number of genes to study as indicators, which could accelerate testing of compounds for their ability to induce liver necrosis and, eventually, liver cancer.

Conflict of interest statement

There are competing interests between the authors (ZME, RB) and Corteva Agrisciences (NE, KJ); specifically the research was supported by Corteva Agrisciences. Other authors do not declare competing interests.

Figures

Figure 1
(A) Structure of ethinyl estradiol (EE). Image obtained from Wikipedia (https://commons.wikimedia.org/wiki/File:Ethinylestradiol.svg). (B) Serum alkaline phosphatase and total bilirubin levels of animals that are exposed to EE. Graphs are generated by Graphpad Prism8 software (GraphPad Software Inc., La Jolla, CA, www.graphpad.com). (C) Total body weight, liver weight and serum triglyceride levels of animals that are exposed to EE. Graphs are generated by Graphpad Prism8 software (GraphPad Software Inc., La Jolla, CA, www.graphpad.com). (D) Hierarchical clustering of hepatic genes regulated by low-, medium- and high-dose EE exposure at selected time points. Cluster3 software (https://bonsai.hgc.jp/~mdehoon/software/cluster/) was used for clustering the differentially expressed genes. Data was visualized using Treeview Java (https://jtreeview.sourceforge.net/).
Figure 2
(A) Hierarchical clustering of hepatic genes that are regulated by high-dose EE exposure over 29 days. Cluster3 software (https://bonsai.hgc.jp/~mdehoon/software/cluster/) was used for clustering the differentially expressed genes. Data was visualized using Treeview Java (https://jtreeview.sourceforge.net/). (B) Gene expression patterns of clusters (C1–8) based on average gene expression values that were identified in 2A. Graphs are generated by Graphpad Prism8 software (GraphPad Software Inc., La Jolla, CA, www.graphpad.com). (C) GO terms that are significantly associated with C6. GSEA analysis was performed. Figures are generated using Gene Set Enrichment Analysis software (https://www.gsea-msigdb.org/gsea/index.jsp). (D) PCA of the hepatic gene regulation time course dataset for high-dose EE exposure. Figure was generated using StrandNGS (Version 3.1.1, Bangalore, India).
Figure 3
(A) Evaluation of average ROC for training (upper panel) and validation (lower panel) with increasing gene number for feature selection. (B) Comparison of ranges of average ROC values for different Nfold (groups) for each feature selection-prediction method combination. Both graphs are generated using Tableau software (Seattle, WA, USA, https://www.tableau.com/).
Figure 4
(A) ROC curves for training (upper) and validation (lower) datasets for best performing feature selection-prediction method combinations. (B) List of genes identified by six feature selection methods and their contribution to prediction methods as indicated by mutual info gain for each gene. Color shows details about Rank. The marks are labelled by rank. Both graphs are generated using Tableau software (Seattle, WA, USA, https://www.tableau.com/).
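
The Figure 3 and 4 captions refer to ranking genes by mutual information gain and scoring predictions by ROC. The following is a minimal sketch of those two ideas only, not the authors' pipeline: the gene names, the toy expression values, the median-based binarization, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def binarize_at_median(values):
    """Discretize expression values into 0/1 around the median (an assumed scheme)."""
    s = sorted(values)
    mid = len(s) // 2
    median = s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2
    return [1 if v > median else 0 for v in values]


def mutual_information(feature_bins, labels):
    """Mutual information, in bits, between two discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature_bins, labels))
    px = Counter(feature_bins)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_genes(expression, labels):
    """Return (gene, score) pairs sorted by mutual information, highest first."""
    scores = {
        gene: mutual_information(binarize_at_median(values), labels)
        for gene, values in expression.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def roc_auc(scores, labels):
    """Area under the ROC curve: the chance that a random positive sample
    scores higher than a random negative one (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = necrosis observed, 0 = no necrosis.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
expression = {
    "gene_a": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],  # separates the classes
    "gene_b": [1.0, 3.0, 1.1, 3.1, 0.9, 2.9, 1.2, 3.2],  # unrelated to the label
}

ranking = rank_genes(expression, labels)
for gene, score in ranking:
    auc = roc_auc(expression[gene], labels)
    print(f"{gene}: MI = {score:.2f} bits, AUC = {auc:.2f}")
```

On this toy input the separating gene ranks first with an AUC of 1.0, while the unrelated gene has zero mutual information and an AUC of 0.5. A real analysis would use a trained classifier's predicted scores on held-out samples rather than raw expression values.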
