Applying knowledge-driven mechanistic inference to toxicogenomics

Ignacio J Tripodi¹, Tiffany J Callahan², Jessica T Westfall³, Nayland S Meitzer⁴, Robin D Dowell⁵, Lawrence E Hunter⁶

Affiliations

¹ University of Colorado, Computer Science / Interdisciplinary Quantitative Biology, Boulder, CO 80309, USA. Electronic address: ignacio.tripodi@colorado.edu.
² University of Colorado Anschutz Medical Campus, Computational Bioscience, Denver, CO 80045, USA.
³ University of Colorado, Molecular, Cellular and Developmental Biology, Boulder, CO 80309, USA.
⁴ University of Colorado, Chemical Engineering, Boulder, CO 80309, USA.
⁵ University of Colorado, Molecular, Cellular and Developmental Biology / Interdisciplinary Quantitative Biology, Boulder, CO 80309, USA.
⁶ University of Colorado Anschutz Medical Campus, Computational Bioscience / Interdisciplinary Quantitative Biology, Denver, CO 80045, USA.

PMID: 32387679
PMCID: PMC7306473
DOI: 10.1016/j.tiv.2020.104877

Ignacio J Tripodi et al. Toxicol In Vitro. 2020 Aug;66:104877. doi: 10.1016/j.tiv.2020.104877. Epub 2020 May 6.

Abstract

When considering toxic chemicals in the environment, a mechanistic, causal explanation of toxicity may be preferred over a statistical or machine learning-based prediction by itself. Elucidating a mechanism of toxicity is, however, a costly and time-consuming process that requires the participation of specialists from a variety of fields, often relying on animal models. We present an innovative mechanistic inference framework (MechSpy), which can be used as a hypothesis generation aid to narrow the scope of mechanistic toxicology analysis. MechSpy generates hypotheses of the most likely mechanisms of toxicity, by combining a semantically-interconnected knowledge representation of human biology, toxicology and biochemistry with gene expression time series on human tissue. Using vector representations of biological entities, MechSpy seeks enrichment in a manually curated list of high-level mechanisms of toxicity, represented as biochemically- and causally-linked ontology concepts. Besides predicting the canonical mechanism of toxicity for many well-studied compounds, we experimentally validated some of our predictions for other chemicals without an established mechanism of toxicity. This mechanistic inference framework is an advantageous tool for predictive toxicology, and the first of its kind to produce a mechanistic explanation for each prediction. MechSpy can be modified to include additional mechanisms of toxicity, and is generalizable to other types of mechanisms of human biology.

Keywords: Adverse outcome pathways; Artificial intelligence; Computational toxicology; Mechanistic inference; Mechanistic toxicology.

Conflict of interest statement

Declaration of Competing Interest: One author (RDD) of this publication is a founder and scientific advisor for Arpeggio Biosciences.

Figures

**Figure 1. Overview of MechSpy’s mechanistic inference process.**
The knowledge graph (a) of semantically-integrated ontologies and databases and the transcriptomics data (d), in dashed purple frames, are our inputs. After adding new edges to the graph by deductively closing it (b), MechSpy uses node2vec to generate dense vector embeddings of each node (c). We also perform differential expression analysis on the transcriptomics data (d), and obtain a list of the top N most significant changes in gene expression (e). Based on these changes, and using the embeddings for genes and all mechanism steps, MechSpy generates an enrichment score for each mechanism (f) and reports the three mechanisms with the highest scores. Using the original knowledge graph (a) and the significant genes across time (e), it then produces both a narrative (g) and a graphical explanation (h) for each of the three most enriched mechanisms.

**Figure 2. Illustrative knowledge graph sample.**
Example of how ontology concepts and database sources are interconnected in the KG extended from the PheKnowLator project [20].

**Figure 3. Summary of the mechanistic inference architecture, showing the use of vector embeddings generated using node2vec, and our sequential order penalty scheme to find the score of a particular mechanism.**
In this hypothetical example we have 3 experimental time points, and a mechanism composed of 5 causally-linked steps. For every time point, the n most significant gene changes (where 1 ≤ n ≤ 100) are averaged into a single vector. MechSpy then generates a preliminary enrichment vector $e^{*}_{x,j}$, which consists of the cosine similarity value between that time point’s gene aggregation and each mechanism step, subtracted from 1. The sequential penalty filter (in purple shades, dividing the 5 mechanism steps into 3 bins) gives less weight to mechanism steps that don’t correspond to the time point in question, with an increasing penalty the farther away we are from our corresponding bin. Finally, the weighted enrichment vectors for each time point are combined such that the maximum score for each mechanism step is kept ($\vec{e}_{x}$), and the score for this mechanism is the median of these maximum values.
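
The scoring scheme in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MechSpy's implementation: the function names, the binning rule, the penalty form `1 / (1 + penalty * distance)`, and the `penalty` value are all hypothetical, since the caption does not give them. The caption's "subtracted from 1" wording is read here as yielding a closeness score (cosine similarity, i.e. 1 minus cosine distance), so that keeping the maximum per step is meaningful; that reading is also an assumption.

```python
import math
from statistics import median


def mean_vector(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def step_bin(step_index, n_steps, n_timepoints):
    """Assign a mechanism step to a time-point bin (hypothetical rule).

    With 5 steps and 3 time points this gives bins 0, 0, 1, 1, 2.
    """
    return step_index * n_timepoints // n_steps


def mechanism_score(genes_by_timepoint, step_vectors, penalty=0.5):
    """Score one mechanism from per-time-point gene embeddings.

    genes_by_timepoint: one list of gene vectors per time point, in order.
    step_vectors: one embedding per mechanism step, in causal order.
    penalty: assumed strength of the sequential order penalty.
    """
    n_timepoints = len(genes_by_timepoint)
    n_steps = len(step_vectors)
    best = [0.0] * n_steps  # maximum weighted score kept per step
    for t, gene_vectors in enumerate(genes_by_timepoint):
        aggregate = mean_vector(gene_vectors)
        for s, step_vector in enumerate(step_vectors):
            # Assumption: negative similarities are clipped to zero so that
            # down-weighting can never raise a score.
            raw = max(0.0, cosine_similarity(aggregate, step_vector))
            distance = abs(step_bin(s, n_steps, n_timepoints) - t)
            weight = 1.0 / (1.0 + penalty * distance)  # assumed penalty form
            best[s] = max(best[s], raw * weight)
    # The mechanism score is the median of the per-step maxima.
    return median(best)
```

Under these assumptions, a time series whose gene changes line up with the mechanism steps in causal order scores higher than the same gene changes observed in the opposite order, which is the behavior the sequential penalty is meant to produce.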

**Figure 4. Mechanistic narrative generated for a time series of diclofenac sodium exposure.**
Example of a mechanistic narrative generated by MechSpy for a particular transcriptomics time series.

**Figure 5. Graphical mechanistic explanation example, generated by MechSpy.**
The sequence of events should be followed top to bottom, left to right. Nodes in dark gray (leftmost column) are significant gene changes along the multiple time points (in order), those in light gray (middle) are the intermediate concepts (genes, pathways, etc.), and purple rectangular nodes (rightmost column) represent the enriched mechanism steps. The example shown is the mechanistic explanation for M2 of diclofenac (400 µM) in the Open TG-GATEs liver data; double circles indicate that one or more genes are known to be active in this tissue type.

**Figure 6. Simulations of baseline precision.**
Comparison of actual precision values (black dots, top) to baseline estimations from random mechanism draws (violin plots, bottom), for different segmentations of the data (see Table 3). For each chemical used in the public datasets with one or more known mechanisms of toxicity, we randomly drew three mechanisms of the eleven curated (without replacement) to simulate the top-three enrichments. The precision across all chemicals was then calculated, and the process was repeated 1000 times. The violin plots show the distribution of baseline precision scores from those 1000 runs.
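
The random baseline described in this caption could be simulated roughly as follows. This is a sketch only: the function name, the seed, and the rule that a chemical counts as a hit when at least one of its known mechanisms appears among the three drawn are assumptions, since the caption does not define how a hit is scored.

```python
import random


def baseline_precisions(known_mechanisms, n_mechanisms=11, top_k=3,
                        n_runs=1000, seed=0):
    """Simulate baseline precision from random mechanism draws.

    known_mechanisms: dict mapping each chemical to the set of indices
        (0 .. n_mechanisms - 1) of its known mechanisms of toxicity.
    Returns one precision value per simulated run.
    """
    rng = random.Random(seed)
    mechanism_ids = range(n_mechanisms)
    precisions = []
    for _ in range(n_runs):
        hits = 0
        for known in known_mechanisms.values():
            # Draw top_k distinct mechanisms, i.e. without replacement.
            drawn = set(rng.sample(mechanism_ids, top_k))
            # Assumed hit rule: any known mechanism among the drawn ones.
            if drawn & known:
                hits += 1
        precisions.append(hits / len(known_mechanisms))
    return precisions
```

For a chemical with a single known mechanism, the chance that a random draw of three out of eleven contains it is 3/11 (about 0.27), so under the assumed hit rule the simulated distribution should center near that value when most chemicals have one known mechanism.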

**Figure 7. Experimental validation of MechSpy’s mechanistic prediction of mitochondrial toxicity for chlorpromazine and adapin.**
Fluorescence intensities of the three potential-sensitive MITO-ID dyes for HUH7 hepatocytes after 24-hour exposure to chlorpromazine (a) and adapin (b). Bars for treated cells marked with an asterisk (*) have a p-value smaller than 0.05 when compared to the untreated cells using a t-test.

References

1. Luechtefeld Thomas, Marsh Dan, Rowlands Craig, and Hartung Thomas. Machine Learning of Toxicological Big Data Enables Read-Across Structure Activity Relationships (RASAR) Outperforming Animal Test Reproducibility. Toxicological Sciences, 165(1):198–212, September 2018.
2. Mayr Andreas, Klambauer Günter, Unterthiner Thomas, and Hochreiter Sepp. DeepTox: Toxicity Prediction using Deep Learning. Frontiers in Environmental Science, 3, 2016.
3. Pu Limeng, Naderi Misagh, Liu Tairan, Wu Hsiao-Chun, Mukhopadhyay Supratik, and Brylinski Michal. eToxPred: a machine learning-based approach to estimate the toxicity of drug candidates. BMC Pharmacology and Toxicology, 20(1):2, January 2019.
4. Ashburner Michael, Ball Catherine A., Blake Judith A., Botstein David, Butler Heather, Cherry J. Michael, Davis Allan P., Dolinski Kara, Dwight Selina S., Eppig Janan T., Harris Midori A., Hill David P., Issel-Tarver Laurie, Kasarskis Andrew, Lewis Suzanna, Matese John C., Richardson Joel E., Ringwald Martin, Rubin Gerald M., and Sherlock Gavin. Gene Ontology: tool for the unification of biology, May 2000.
5. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Research, 47(D1):D330–D338, January 2019.
