Front Bioinform. 2022 Dec 8:2:1054578.
doi: 10.3389/fbinf.2022.1054578. eCollection 2022.

Molecular cartooning with knowledge graphs

Brook E Santangelo et al. Front Bioinform. 2022.

Abstract

Molecular "cartoons," such as pathway diagrams, provide a visual summary of biomedical research results and hypotheses. Their ubiquitous appearance within the literature indicates their universal application in mechanistic communication. A recent survey of pathway diagrams identified 64,643 pathway figures published between 1995 and 2019 with 1,112,551 mentions of 13,464 unique human genes participating in a wide variety of biological processes. Researchers generally create these diagrams using generic diagram editing software that does not itself embody any biomedical knowledge. Biomedical knowledge graphs (KGs) integrate and represent knowledge in a semantically consistent way, systematically capturing biomedical knowledge similar to that in molecular cartoons. KGs have the potential to provide context and precise details useful in drawing such figures. However, KGs cannot generally be translated directly into figures. They include substantial material irrelevant to the scientific point of a given figure and are often more detailed than is appropriate. How could KGs be used to facilitate the creation of molecular diagrams? Here we present a new approach towards cartoon image creation that utilizes the semantic structure of knowledge graphs to aid the production of molecular diagrams. We introduce a set of "semantic graphical actions" that select and transform the relational information between heterogeneous entities (e.g., genes, proteins, pathways, diseases) in a KG to produce diagram schematics that meet the scientific communication needs of the user. These semantic actions search, select, filter, transform, group, arrange, connect and extract relevant subgraphs from KGs based on meaning in biological terms, e.g., a protein upstream of a target in a pathway. 
To demonstrate the utility of this approach, we show how semantic graphical actions on KGs could have been used to produce three existing pathway diagrams in diverse biomedical domains: Down Syndrome, COVID-19, and neuroinflammation. Our focus is on recapitulating the semantic content of the figures, not the layout, glyphs, or other aesthetic aspects. Our results suggest that the use of KGs and semantic graphical actions to produce biomedical diagrams will reduce the effort required and improve the quality of this visual form of scientific communication.

Keywords: graph algorithms; knowledge graphs; molecular pathway; scientific communication; user-centered computing; visualization.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Figures selected from the literature. (A) From Figure 2 “Schematic representation of insulin signaling with highlighted in red pathways found to promote brain insulin resistance in AD and DS.” in (Dierssen et al., 2020). (B) From Figure 2 “Mechanism of action of various drugs used in COVID-19 and how they inhibit the NF-κB pathway” in (Hariharan et al., 2021). (C) From Figure 3 “Gut–brain axis exacerbates neurological disorders through gut-microbiota-derived molecular patterns.” in (Suganya and Koo, 2020).
FIGURE 2
Workflow used to extract nodes from an original example cartoon, where steps requiring user input are indicated with a person icon. During Indexing, (A) specific concepts, in this case mTOR and autophagy, are selected from the original cartoon as user input, (B) the input concepts are mapped to sets of nodes in the given KG using partial string matching, and (C) the user selects one node to represent each concept. During Subgraph Construction, (D) all shortest paths are identified between the given pair of nodes, and (E) the user selects a semantic action, which (F) ranks the paths and selects the highest ranked path. (G) This pipeline is repeated for all example node pairs to produce the resulting subgraph for visualization.
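The Indexing and Subgraph Construction steps in this caption can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph, the function names, and the scoring rule are all assumptions. In particular, the source names a "Path-Degree Product" ranking without defining it here, so the sketch assumes it is the product of node degrees along a path and assumes the largest product ranks highest.

```python
from collections import deque
from math import prod

# Toy undirected KG as an adjacency map. All labels are illustrative only.
TOY_KG = {
    "mTOR": {"MTOR gene", "AKT1", "ULK1"},
    "MTOR gene": {"mTOR"},
    "AKT1": {"mTOR", "autophagy", "hub"},
    "ULK1": {"mTOR", "autophagy"},
    "autophagy": {"AKT1", "ULK1"},
    "hub": {"AKT1", "x1", "x2"},
    "x1": {"hub"},
    "x2": {"hub"},
}


def match_nodes(graph, concept):
    """Step B: candidate nodes whose label contains the concept (case-insensitive)."""
    needle = concept.lower()
    return sorted(n for n in graph if needle in n.lower())


def all_shortest_paths(graph, source, target):
    """Step D: every shortest path from source to target, found by BFS."""
    dist = {source: 0}
    parents = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                parents[nbr] = [node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                # Another equally short way to reach nbr.
                parents[nbr].append(node)
    if target not in dist:
        return []

    def unwind(node):
        if node == source:
            return [[source]]
        return [p + [node] for par in parents[node] for p in unwind(par)]

    return sorted(unwind(target))


def path_degree_product(graph, path):
    """Assumed scoring rule: product of the degrees of all nodes on the path."""
    return prod(len(graph[n]) for n in path)


def best_path(graph, source, target, score=path_degree_product):
    """Steps E-F: rank the shortest paths and keep the top one (assumed: max score)."""
    paths = all_shortest_paths(graph, source, target)
    return max(paths, key=lambda p: score(graph, p)) if paths else None


candidates = match_nodes(TOY_KG, "mtor")  # the user would pick one of these (step C)
chosen = best_path(TOY_KG, "mTOR", "autophagy")
```

Repeating `best_path` for every concept pair and taking the union of the returned paths would give the subgraph described in step (G).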
FIGURE 3
Visualization of the process of generating a subgraph for the Figure 1C example. (A) Selection of concepts from the original cartoon to index against the chosen KG; (B) the resulting subgraph after semantic actions and path search through the graph to recapitulate the edges between the original nodes; (C) the same information as in (B), drawn in the artistic style of the original cartoon, with intermediate nodes highlighted.
FIGURE 4
Visualization of the subgraph generated for the Figure 1A example. (A) The number of shortest paths found for each pair identified in the original cartoon. (B) The result of the Path-Degree Product based ranking algorithm that prioritized one path for each original pair of nodes.
FIGURE 5
Changes in intermediate nodes between given source and target concepts when (A) the Cosine Similarity and (B) the Path-Degree Product path ranking algorithms were applied, (C) when Edge Exclusion was applied, and (D) when nearest neighbor augmentation of drugs was applied. Relationships that align with the original example in Figure 1B are circled.
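As a rough illustration of how an Edge Exclusion action could change the intermediate nodes on a path, the sketch below removes edges with excluded predicate labels before searching. The triples, predicate names, and helper functions are hypothetical and are not taken from the KG or code used in the paper.

```python
from collections import deque

# Toy KG as (subject, predicate, object) triples. Every label is invented.
TRIPLES = [
    ("drugA", "related_to", "NF-kB pathway"),
    ("drugA", "interacts_with", "proteinX"),
    ("proteinX", "participates_in", "NF-kB pathway"),
]


def build_adjacency(triples, excluded=frozenset()):
    """Undirected adjacency map that skips edges whose predicate is excluded."""
    adj = {}
    for subj, pred, obj in triples:
        adj.setdefault(subj, set())
        adj.setdefault(obj, set())
        if pred in excluded:
            continue  # Edge Exclusion: drop this relation type entirely
        adj[subj].add(obj)
        adj[obj].add(subj)
    return adj


def shortest_path(adj, source, target):
    """One shortest path by BFS, or None if the nodes are disconnected."""
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nbr in sorted(adj[node]):
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    return None


direct = shortest_path(build_adjacency(TRIPLES), "drugA", "NF-kB pathway")
filtered = shortest_path(
    build_adjacency(TRIPLES, excluded={"related_to"}), "drugA", "NF-kB pathway"
)
```

In this toy case, excluding the uninformative `related_to` edge forces the path through `proteinX`, which is the kind of change in intermediate nodes the panel describes.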
FIGURE 6
Comparison of path prioritization algorithms. For the Figure 1A example, the rank of all paths between IRS1 and AKT3 is shown (as depicted in Figure 4A). For the Figure 1C example, the rank of all paths between Toll-like receptor 4 (human) and microglial cell activation is shown (as depicted in Figure 4B). Annotated points identify the highest ranked path for Cosine Similarity (circles) and Path-Degree Product (triangles) to highlight the differences in ranking. The Figure 1B example is not included because all of its pairs had only 1 or 2 shortest paths.
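To make the idea of ranking paths by Cosine Similarity concrete, here is a small sketch. The text shown here does not define the ranking, so the rule used below is an assumption: each path is scored by the mean cosine similarity between the target node's embedding and the embeddings of the other nodes on the path. The two-dimensional embeddings and node names are invented for illustration.

```python
from math import sqrt

# Hypothetical node embeddings; real KG embeddings would have many dimensions.
EMBEDDINGS = {
    "S": (0.6, 0.8),
    "a": (1.0, 0.0),
    "b": (0.0, 1.0),
    "T": (1.0, 0.0),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def path_score(path, emb):
    """Assumed rule: mean similarity of every non-target node to the target."""
    target = emb[path[-1]]
    others = path[:-1]
    return sum(cosine(emb[n], target) for n in others) / len(others)


def rank_paths(paths, emb):
    """Paths sorted from highest to lowest score."""
    return sorted(paths, key=lambda p: path_score(p, emb), reverse=True)


ranked = rank_paths([["S", "b", "T"], ["S", "a", "T"]], EMBEDDINGS)
```

Because the two scoring rules look at different properties of a path (embedding similarity versus node degree), they need not agree on the top path, which is the contrast the annotated points in this figure highlight.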
FIGURE 7
Network properties of subgraphs generated by each path ranking algorithm. (A) Path length for all pairs in each example figure and (B) total number of nodes in the subgraph for each example figure.
FIGURE 8
Semantic properties of subgraphs generated by each path ranking algorithm. (A) The number of unique edge types within each generated subgraph and (B) the number of unique ontologies that make up each generated subgraph.
