Proceedings (IEEE Int Conf Bioinformatics Biomed). 2019 Nov;2019:1742-1749.
doi: 10.1109/bibm47256.2019.8982986. Epub 2020 Feb 6.

A linked data graph approach to integration of immunological data

Syed Ahmad Chan Bukhari et al. Proceedings (IEEE Int Conf Bioinformatics Biomed). 2019 Nov.

Abstract

Systems biology involves the integration of multiple data types from different data sources to offer a more complete picture of the biological system being studied. While many existing biological databases are implemented using traditional SQL (Structured Query Language) database technology, NoSQL database technologies have been explored as a more relationship-based, flexible, and scalable method of data integration. In this paper, we describe how to use the Neo4j graph database to integrate a variety of data sets in the context of systems vaccinology. Specifically, we have converted data from four sources into a common graph model: diverse types of vaccine response measurement data from the NIH/NIAID ImmPort data repository, pathway data from Reactome, influenza virus strains from WHO, and taxonomic data from NCBI Taxon. While Neo4j provides a graph-based query language (Cypher) for data retrieval, we developed a web-based dashboard that lets users browse and visualize data without needing to learn Cypher. In addition, we have prototyped a natural language query interface for users to interact with our system. In conclusion, we demonstrate the feasibility of using a graph-based database for storing and querying immunological data with complex biological relationships. Querying a graph database through such relationships has the potential to reveal novel relationships among heterogeneous biological data.

Keywords: graph database; immunology; influenza vaccine; knowledgebase; ontology.
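
To make the idea of relationship-based retrieval concrete, the following is a minimal in-memory sketch in Python, not the authors' Neo4j implementation. It models a study, its subjects, their samples, and the experiments run on those samples as labeled edges, then follows those edges to answer two questions. Every node identifier and relationship name here (`ENROLLED_IN`, `COLLECTED_FROM`, `RUN_ON`, `subject:A`, and so on) is a hypothetical placeholder; only the study accession SDY404 comes from this record, and the actual LinkedImm schema may differ.

```python
from collections import defaultdict

# Hypothetical edges (source, relationship, target). Only the accession
# SDY404 appears in the source; all other names are invented placeholders.
EDGES = [
    ("subject:A", "ENROLLED_IN", "study:SDY404"),
    ("subject:B", "ENROLLED_IN", "study:SDY404"),
    ("subject:C", "ENROLLED_IN", "study:OTHER"),
    ("sample:A1", "COLLECTED_FROM", "subject:A"),
    ("sample:C1", "COLLECTED_FROM", "subject:C"),
    ("experiment:HAI1", "RUN_ON", "sample:A1"),
    ("experiment:HAI2", "RUN_ON", "sample:C1"),
]


def build_index(edges):
    """Index sources by (relationship, target) so each hop is one lookup."""
    incoming = defaultdict(list)
    for source, rel, target in edges:
        incoming[(rel, target)].append(source)
    return incoming


def subjects_in_study(incoming, study_id):
    """One hop: all subjects enrolled in the given study."""
    return sorted(incoming.get(("ENROLLED_IN", study_id), []))


def experiments_for_study(incoming, study_id):
    """Three hops: study <- subject <- sample <- experiment."""
    found = []
    for subject in subjects_in_study(incoming, study_id):
        for sample in incoming.get(("COLLECTED_FROM", subject), []):
            found.extend(incoming.get(("RUN_ON", sample), []))
    return sorted(found)


index = build_index(EDGES)
print(subjects_in_study(index, "study:SDY404"))
print(experiments_for_study(index, "study:SDY404"))
```

In a graph database the same traversal would be written declaratively as a path pattern rather than as nested loops, which is the main convenience the abstract attributes to the graph approach.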

Figures

Figure 1. Generation of profiling data from young and older subjects before (day 0) and following influenza vaccination.
Figure 2. LinkedImm is created based on integration of diverse types of data from multiple sources.
Figure 3. The graph model of HAI and gene expression measurements for a single study (SDY404). Study subjects (yellow circles) are annotated by their demographic information, including age cohort (e.g., young or older). Each subject is linked to blood samples that were collected (red circles), which are in turn linked to the different experiments that were run on these samples (green circles). HAI measurements (grey circles) are collected for each of the virus strains included in the vaccine given to that subject. Gene expression measurements are available for all of the transcripts measured (green circles, with only a subset shown for clarity).
Figure 4. The graph model for an HAI result linked to the virus taxonomy from NCBI Taxon.
Figure 5. A Cypher query (top line) is used to retrieve all the subjects (red circles) profiled in a specified study (SDY404, blue circle).
Figure 6. A more expressive Cypher query that uses inference over the hierarchical (taxonomic) relationship to recognize the virus strains of the H3N2 subtype.
Figure 7. A plot view of the age distributions of subjects for different HIPC studies.
Figure 8. A box plot of the distributions of antibody titer values for different H1N1 virus strains measured across different studies.
Figure 9. A natural language query example (plot female subjects over 65 years old).
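
The hierarchical inference described for Figure 6 can be illustrated with a small Python sketch. It is an assumption-laden stand-in, not the query shown in the figure: the taxonomy is reduced to a child-to-parent map, and a result is selected when its strain sits anywhere below the requested taxon. The strain names, subject identifiers, and titer values are invented placeholders, and the function names are hypothetical.

```python
# Hypothetical child -> parent taxonomy; strain names are placeholders,
# not real vaccine strains.
PARENT_OF = {
    "H3N2 strain 1": "H3N2 subtype",
    "H3N2 strain 2": "H3N2 subtype",
    "H1N1 strain 1": "H1N1 subtype",
    "H3N2 subtype": "Influenza A virus",
    "H1N1 subtype": "Influenza A virus",
}

# Made-up HAI results used only to exercise the lookup.
HAI_RESULTS = [
    {"subject": "subject:A", "strain": "H3N2 strain 1", "titer": 40},
    {"subject": "subject:A", "strain": "H1N1 strain 1", "titer": 80},
    {"subject": "subject:B", "strain": "H3N2 strain 2", "titer": 160},
]


def is_descendant(node, ancestor, parent_of):
    """Walk parent links upward from node; True if ancestor is reached."""
    current = parent_of.get(node)
    while current is not None:
        if current == ancestor:
            return True
        current = parent_of.get(current)
    return False


def results_for_taxon(results, taxon, parent_of):
    """Keep results whose strain is the taxon itself or lies below it."""
    return [
        r for r in results
        if r["strain"] == taxon or is_descendant(r["strain"], taxon, parent_of)
    ]


h3n2 = results_for_taxon(HAI_RESULTS, "H3N2 subtype", PARENT_OF)
print([r["strain"] for r in h3n2])
```

The caller never lists individual H3N2 strains; membership follows from the hierarchy. In Cypher, a comparable effect is typically obtained with a variable-length relationship pattern over the taxonomy edges.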
