2020 Jan 1:2020:baaa015.
doi: 10.1093/database/baaa015.

Structured reviews for data and knowledge-driven research

Núria Queralt-Rosinach et al. Database (Oxford).

Abstract

Hypothesis generation is a critical step in research and a cornerstone in the rare disease field. Research is most efficient when those hypotheses are based on the entirety of knowledge available to date. Systematic review articles are commonly used in biomedicine to summarize existing knowledge and contextualize experimental data. But the information contained within review articles is typically expressed only as free text, which is difficult to use computationally. Researchers struggle to navigate, collect and remix prior knowledge because it is scattered across several silos without seamless integration and access. This lack of a structured information framework hinders research by both experimental and computational scientists. To better organize knowledge and data, we built a structured review article that is specifically focused on NGLY1 Deficiency, an ultra-rare genetic disease first reported in 2012. We represented this structured review as a knowledge graph and then stored this knowledge graph in a Neo4j database to simplify dissemination, querying and visualization of the network. Relative to free text, this structured review better promotes the principles of findability, accessibility, interoperability and reusability (FAIR). In collaboration with domain experts in NGLY1 Deficiency, we demonstrate how this resource can improve the efficiency and comprehensiveness of hypothesis generation. We also developed a read-write interface that allows domain experts to contribute FAIR structured knowledge to this community resource. In contrast to traditional free-text review articles, this structured review exists as a living knowledge graph that is curated by humans and accessible to computational analyses. Finally, we have generalized this workflow into modular and repurposable components that can be applied to other domain areas. This NGLY1 Deficiency-focused network is publicly available at http://ngly1graph.org/.

Availability and implementation: Database URL: http://ngly1graph.org/. Network data files are at: https://github.com/SuLab/ngly1-graph and source code at: https://github.com/SuLab/bioknowledge-reviewer.

Contact: asu@scripps.edu.

Figures

Figure 1
Conceptual overview of structured review articles. This figure represents the distribution of knowledge in databases accessible to the community in terms of domains compiled (X axis) and information structured (Y axis). Gray squares indicate the knowledge focus of a database with regard to the domain(s) and information structured.
Figure 2
Library architecture. Architecture of the system based on four components. The edges component contains libraries with functions to collect, normalize and format the information and data resources we want to integrate as individual networks. The graph component contains functions to integrate and create the knowledge graph. The Neo4j component contains the module to import the graph into Neo4j. Finally, the hypothesis-generation component contains the modules to query the graph, structure the resulting semantic paths and extract summaries to analyse connections and the evidence.
Figure 3
Exploration of mechanistic paths between NGLY1 and AQP1 based on the regulatory hypothesis. (A) First query topology for the regulatory hypothesis. We defined a path topology based on gene pathways of length four linking the NGLY1 ortholog in Drosophila (Pngl) with the human AQP1 gene. The bridging nodes and edges were based on transcriptional regulatory relationships in both Drosophila and human, plus orthology relationships between human and fly genes. (B) Mechanistic hypotheses resulting from the first query.
Figure 4
Exploration of the evidence relating candidate regulators of AQP1 to NGLY1 Deficiency phenotypes. (A) Second query topology for the hypothesis that AQP1 regulation and disease phenotypes share a genetic basis. (B) Hypotheses resulting from the second query. All edges are of type ‘has phenotype’.
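
The idea of a query topology in the Figure 3 caption (a path of fixed length whose edges follow a prescribed sequence of relationship types) can be illustrated with a minimal sketch. This is not the authors' code: the published resource stores the graph in Neo4j, whereas the sketch below uses plain Python so that it runs without a database. The edge-type names (`regulates`, `ortholog_of`), the order of edge types in `TOPOLOGY`, and every node other than Pngl and AQP1 are hypothetical placeholders chosen for illustration, not curated findings.

```python
from collections import defaultdict

# Toy edge list: (subject, edge_type, object).
# ASSUMPTION: all intermediate genes and all edges are invented placeholders.
EDGES = [
    ("Pngl", "regulates", "fly_gene_A"),
    ("fly_gene_A", "ortholog_of", "human_gene_A"),
    ("human_gene_A", "regulates", "human_gene_B"),
    ("human_gene_B", "regulates", "AQP1"),
    ("Pngl", "regulates", "fly_gene_B"),
    ("fly_gene_B", "regulates", "fly_gene_C"),  # dead end: no orthology edge
]

# ASSUMPTION: one possible length-four topology (four typed edges in order).
TOPOLOGY = ["regulates", "ortholog_of", "regulates", "regulates"]


def build_index(edges):
    """Map each subject node to its outgoing (edge_type, object) pairs."""
    index = defaultdict(list)
    for subj, etype, obj in edges:
        index[subj].append((etype, obj))
    return index


def find_paths(index, start, end, topology):
    """Return every node path from start to end whose consecutive edge
    types match `topology` exactly, without revisiting a node."""
    paths = []

    def walk(node, depth, path):
        if depth == len(topology):
            if node == end:
                paths.append(path)
            return
        for etype, nxt in index.get(node, []):
            if etype == topology[depth] and nxt not in path:
                walk(nxt, depth + 1, path + [nxt])

    walk(start, 0, [start])
    return paths


if __name__ == "__main__":
    for path in find_paths(build_index(EDGES), "Pngl", "AQP1", TOPOLOGY):
        print(" -> ".join(path))
```

Constraining the search by an ordered list of edge types keeps the result set small and interpretable: only paths that match the stated hypothesis shape are returned, and each returned path can then be inspected for its supporting evidence.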
