Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2023 Mar 1;39(3):btad082.
doi: 10.1093/bioinformatics/btad082.

ARAX: a graph-based modular reasoning tool for translational biomedicine

Affiliations

ARAX: a graph-based modular reasoning tool for translational biomedicine

Amy K Glen et al. Bioinformatics. .

Abstract

Motivation: With the rapidly growing volume of knowledge and data in biomedical databases, improved methods for knowledge-graph-based computational reasoning are needed in order to answer translational questions. Previous efforts to solve such challenging computational reasoning problems have contributed tools and approaches, but progress has been hindered by the lack of an expressive analysis workflow language for translational reasoning and by the lack of a reasoning engine-supporting that language-that federates semantically integrated knowledge-bases.

Results: We introduce ARAX, a new reasoning system for translational biomedicine that provides a web browser user interface and an application programming interface (API). ARAX enables users to encode translational biomedical questions and to integrate knowledge across sources to answer the user's query and facilitate exploration of results. For ARAX, we developed new approaches to query planning, knowledge-gathering, reasoning and result ranking and dynamically integrate knowledge providers for answering biomedical questions. To illustrate ARAX's application and utility in specific disease contexts, we present several use-case examples.

Availability and implementation: The source code and technical documentation for building the ARAX server-side software and its built-in knowledge database are freely available online (https://github.com/RTXteam/RTX). We provide a hosted ARAX service with a web browser interface at arax.rtx.ai and a web API endpoint at arax.rtx.ai/api/arax/v1.3/ui/.

Supplementary information: Supplementary data are available at Bioinformatics online.

PubMed Disclaimer

Figures

Fig. 1.
Fig. 1.
An example query graph (a) and the set of subgraphs that ‘fulfill’ it (c) from an example knowledge graph (b). Item a depicts a query graph that asks for proteins that interact with the drug acetaminophen. Query graphs are connected graphs that define particular patterns to search for in-graph data. In this example, the acetaminophen query node is considered ‘pinned’, because it is constrained to a specific concept; Biolink-defined (Unni et al., 2022) node categories (Protein) and edge predicates (interacts_with) constrain the rest of the query. While this query graph is quite simple, there is no hard limit to how many nodes/edges query graphs can include (though there may be a practical limit, due to the tendency for query runtime and memory consumption to increase with query graph size), and they may contain cycles and/or be multigraphs. Item (b) depicts a hypothetical KG that might be used to answer this query. This KG contains three different categories of nodes (represented by node shape/color) and two different edge predicates. Item (c) shows the three subgraphs of the KG that fulfill the query graph. A subgraph fulfills the query graph if it fits the pattern of the query graph in terms of its structure and constraints, with acetaminophen connected to a protein via an interacts_with edge. This example is simplified for clarity’s sake, but in reality there are more complexities to what it means to fulfill a query graph. For instance, Biolink node categories and edge predicates are hierarchical in nature, meaning a valid result for the example query graph could, for instance, use the edge predicate physically_interacts_with, because that predicate is a descendant of interacts_with. In addition, one could specify multiple node categories or edge predicates on a single node/edge in their query graph; in that case, nodes/edges with either of those categories/predicates are considered to fulfill that query node/edge
Fig. 2.
Fig. 2.
ARAX answers queries using workflows, in which each step is handled by a particular ARAX module. Left panel: The left half of this figure depicts a typical simple query that might be submitted to ARAX (Step 1) and how ARAX goes about answering that query (Steps 2–6). More specifically, Step 1 shows a visual representation of the query in query graph form, which asks for proteins associated with both Parkinson’s disease and the drug cilnidipine. Step 2 depicts the Expand step, which reaches out to various KPs for answers to the given query and combines their answers into an answer knowledge graph. Step 3 represents the Overlay step, in which the answer KG is overlaid/annotated with contextual quantitative information in ‘virtual’ edges (dashed/dotted lines). Step 4 depicts the Filter step, in which nodes and edges are filtered from the answer KG based on the statistical information overlaid in Step 3. Step 5 represents the Resultify step, which finds all of the subgraphs in the answer KG that fulfill the query graph. Step 6 represents the Rank step, which calculates a score for each result subgraph indicating the overall degree of epistemic support for that answer. Right panel: The right half of this figure depicts ARAX’s architecture, which is centered around five modules, each of which corresponds to a typical workflow step. Notably, ARAX accepts queries in three different forms, all of which are based on the TRAPI data model: (1) a query graph (as shown in Step 1), (2) the ARAXi graph analysis workflow definition language (Section 2.1) or (3) TRAPI operations. The ARAX system translates the two non-ARAXi input forms into an ARAXi workflow that specifies the series of ARAX modules that will be run to answer the query. The five core ARAX modules (Expander, Overlay, Filter, Resultify and Ranker) operate independently and may be combined in various orders depending on the query. When the input is in the form of a query graph, the Query Graph Interpreter layer selects an appropriate series of ARAXi commands to answer the question at hand; this selection of ARAXi commands is done using a template-based system in which templates were manually curated by examining common query graph structures or archetypes. The final answer knowledge graph and results are returned in a TRAPI JSON response either directly to the requesting software program or to the ARAX UI for browser rendering (Section 2.3.1)
Fig. 3.
Fig. 3.
A small portion of ARAX’s meta-graph including a selection of the most commonly queried node categories. The size of each node represents the number of ARAX’s KPs that can answer queries involving that category, the color of each node represents the number of node identifier types ARAX has access to (through its KPs) for that category and the thickness of each edge represents the number of distinct predicates ARAX has access to between two nodes with the given categories
Fig. 4.
Fig. 4.
ARAX Web UI showing the ARAXi workflow (1) used for Use Case 1 (exploring what proteins might be involved in the connection between DAO and bipolar disorder) and the 17 returned results with their corresponding ranking scores (2), as they appear in the Summary tab in the UI. The expected result of ‘NMDARs’ is highlighted in yellow
Fig. 5.
Fig. 5.
A ‘result graph’ for a single result (‘proline’) for the two-hop query for mediators between DAO and bipolar disorder, with supporting publication (PMID: 27622935) and its corresponding relevant sentence shown (highlighted by a red square) for a specific edge ‘Proline—biolink: related_to—bipolar disorder’
Fig. 6.
Fig. 6.
ARAXi workflow (1) for finding the remdesivir mechanism-of-action downstream of RdRp for treating COVID-19 (Use Case 2), along with the top 25 returned results (2). The results of ‘Virus Replication’ and ‘IFNB1’ mentioned in the main text are highlighted in yellow
Fig. 7.
Fig. 7.
Overlaid histograms of the link prediction probability for the ‘treats’ relationship for all drug/disease pairs having a biolink: treats edge between them (in yellow) or non-treats edge between them (in blue)

References

    1. Angles R., Gutierrez C. (2008) The Expressive Power of SPARQL. In: The Semantic Web - ISWC 2008. Springer, pp. 114–129.
    1. Austin C.P. (2018) Translating translation. Nat. Rev. Drug Discov., 17, 455–456. - PMC - PubMed
    1. Birkland A., Yona G. (2006) BIOZON: A system for unification, management and analysis of heterogeneous biological data. BMC Bioinformatics, 7, 70. - PMC - PubMed
    1. Bodenreider O. (2004) The unified medical language system (UMLS): Integrating biomedical terminology. Nucleic Acids Res., 32, D267–70. - PMC - PubMed
    1. Brown S.H. et al. (2004) VA national drug file reference terminology: A cross-institutional content coverage study. Stud. Health Technol. Inform., 107, 477–481. - PubMed

Publication types