Front Neuroinform. 2020 Sep 11:14:27.
doi: 10.3389/fninf.2020.00027. eCollection 2020.

NWB Query Engines: Tools to Search Data Stored in Neurodata Without Borders Format

Petr Ježek et al. Front Neuroinform.

Abstract

The Neurodata Without Borders (NWB) format is a current technology for storing neurophysiology data along with the associated metadata. Data stored in the format is organized into separate HDF5 files, each file usually storing the data associated with a single recording session. While the NWB format provides a structured method for storing data, so far there have been no tools that enable searching a collection of NWB files to find data of interest for a particular purpose. We describe here three tools for searching NWB files. The tools have different features, making each of them most useful for a particular task. The first tool, called the NWB Query Engine, is written in Java. It allows searching the complete content of NWB files. It was designed for the first version of NWB (NWB 1) and supports most (but not all) features of the most recent version (NWB 2). For some searches, it is the fastest tool. The second tool, called "search_nwb", is written in Python and also allows searching the complete contents of NWB files. It works with both NWB 1 and NWB 2, as does the third tool. The third tool, called "nwbindexer", enables searching a collection of NWB files using a two-step process. In the first step, a utility is run which creates an SQLite database containing the metadata in a collection of NWB files. This database is then searched in the second step, using another utility. Once the index is built, this two-step process allows faster searches than the other tools, but the searches are not as complete. All three tools use a simple query language which was developed for this project. Software integrating the three tools into a web interface is also provided, which enables searching NWB files by submitting a web form.

Keywords: HDF5; Java; NWB format; Python; SQLite; metadata; neurophysiology; search.

Figures

Figure 1
Some of the main data layouts of the NWB format. (A) Top-level groups are used to broadly organize the data. (B) Some commonly used session-invariant metadata. In (A,B), the leading slash in the location indicates that these are at a fixed location in the HDF5 file (the full absolute path is specified). (C) NWB timeseries are used to store data that varies with time. For each timeseries, the components are all stored within a parent group that has a variable name (chosen by the creator of the file) and can be located in different places within the NWB file. (D) The NWB DynamicTable layout stores data that is logically organized as a table in multiple datasets. Multiple values in the same cell (non-scalar values) are supported using pairs of datasets: one to store the values, e.g., tags; and the other to store value indices, e.g., tags_index.
Figure 2
Table storage using the DynamicTable layout. (A) The table to store contains three columns: trial_id (an integer), start_time (a float) and tags (zero or more strings). Trial_id 1 (first row) has three tags, trial_id 2 has no tags, and the other two trials each have one tag. (B) Storage in NWB format using the DynamicTable layout. The tags column is stored using two datasets: tags_index and tags. The tags dataset contains all the tags from all trials concatenated. The trial_id, start_time and tags_index values are stored in separate datasets with their elements aligned according to each row. (This illustrates aligned datasets.) The tags_index array indicates which tags (elements of the tags array) are in each row of the tags column of the table. Each element of tags_index is the index just beyond the last value included for that row. (This illustrates the index array.)
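The index-array rule in the Figure 2 caption can be illustrated with a minimal Python sketch. It uses plain lists rather than HDF5 datasets, and the tag strings and the helper name `rows_from_index` are hypothetical; only the number of tags per trial (3, 0, 1, 1) is taken from the caption.

```python
def rows_from_index(values, index):
    """Split a flat list of values into per-row lists using an index array.

    index[i] is the position just beyond the last value of row i,
    so row i spans values[start:index[i]], where start is the previous
    index entry (or 0 for the first row).
    """
    rows = []
    start = 0
    for end in index:
        rows.append(values[start:end])
        start = end
    return rows


# Hypothetical tag names; only the per-row counts come from the caption.
trial_id = [1, 2, 3, 4]
tags = ["a", "b", "c", "d", "e"]   # all tags from all trials, concatenated
tags_index = [3, 3, 4, 5]          # aligned with trial_id, one entry per row

tags_by_trial = dict(zip(trial_id, rows_from_index(tags, tags_index)))
print(tags_by_trial)
```

A row with no tags (trial_id 2 here) repeats the previous index value, which yields an empty slice.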
Figure 3
Compound datatype used in DynamicTable layout. (A) The table to store contains information indicating what part of two different timeseries data layouts correspond to the trial intervals. There is information about two timeseries. (B) The HDF5 compound datatype is used along with an index array to indicate which timeseries segments are associated with each trial.
Figure 4
Formal description of query grammar in BNF form. <query> is made up of one or more subqueries. The ()* construct at the end of <query> indicates zero or more occurrences. <parent> is a path to an HDF5 group or dataset. <child> is the name of an attribute or dataset within the parent. <string> is a string constant enclosed in single or double quotes. <number> is a numeric constant.
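As a rough illustration of the query structure described in this caption, the Python sketch below splits a query into subqueries and then separates each subquery into its parent and its expression. It is not the parser used by any of the three tools. It assumes that subqueries are joined by "&", that each subquery has the form parent: (expression), and it borrows query (C) from the Figure 8 caption as input. The function names are hypothetical.

```python
def split_subqueries(query):
    """Split a query at '&' characters that lie outside parentheses and quotes."""
    parts, current, depth, quote = [], [], 0, None
    for ch in query:
        if quote:
            if ch == quote:          # closing quote ends the string constant
                quote = None
        elif ch in "\"'":
            quote = ch               # opening quote starts a string constant
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "&" and depth == 0:
            parts.append("".join(current).strip())
            current = []
            continue                 # drop the separator itself
        current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_subquery(subquery):
    """Return (parent, expression), splitting at the first ':' only."""
    parent, _, expression = subquery.partition(":")
    return parent.strip(), expression.strip()


query = ('general/subject: (subject_id == "anm00210863") & '
         'epochs/*: (start_time > 500 & start_time < 550 & tags LIKE "%LickEarly%")')
parsed = [parse_subquery(s) for s in split_subqueries(query)]
for parent, expression in parsed:
    print(parent, "->", expression)
```

Splitting at the first colon only keeps colons inside string constants intact, as in the `"%infectionLocation: M2%"` pattern of query (E).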
Figure 5
Architecture of NWB Query Engine. The NWB Query Engine is composed of three core components: the Query Parser, the File API and the NWB Processor. The Query Parser translates user input into an internal tree representation. The NWB Processor uses the tree to perform searches in an NWB file through the HDF5 Connector. The retrieved data are wrapped in an NWBResult object, which is returned by the query engine. The individual blocks communicate via interfaces to facilitate using the software in different environments, for example, using different data storage methods. The Python server enables calling the query engine from Python code. The Web Interface provides a user-friendly web page that allows running queries from a web browser. The Web Container is a part of the web server that processes user requests and communicates with the NWB Query Engine.
Figure 6
SQLite database schema used for nwbindexer. Each arrow indicates a 1:M (one-to-many) relationship from a foreign key to a primary key.
Figure 7
Example NWB file hierarchy stored using SQLite database tables. (A) NWB file contents. (B) Corresponding database tables. Table entries that are empty store a NULL. The node table field “node_type” is “g” for group and “d” for dataset. The value table “type” field is “s” to indicate a scalar string. Details are given in sections 3.1 and 3.2 (Supplementary Materials).
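The two-step idea (build an SQLite index, then search it) can be sketched with Python's standard `sqlite3` module. This is a simplified illustration, not the nwbindexer schema: the "g"/"d" node types and the "s" value type come from the caption, while the `file` table, all column names, the helper `add_node`, and the example file name and values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE file  (id INTEGER PRIMARY KEY, location TEXT NOT NULL);
CREATE TABLE value (id INTEGER PRIMARY KEY, type TEXT NOT NULL, sval TEXT);
CREATE TABLE node (
    id        INTEGER PRIMARY KEY,
    file_id   INTEGER NOT NULL REFERENCES file(id),
    parent_id INTEGER REFERENCES node(id),    -- NULL for a top-level node
    name      TEXT NOT NULL,
    node_type TEXT NOT NULL,                   -- 'g' = group, 'd' = dataset
    value_id  INTEGER REFERENCES value(id)     -- NULL for groups
);
""")


def add_node(conn, file_id, parent_id, name, node_type, sval=None):
    """Insert one node; scalar string values go into the value table."""
    value_id = None
    if sval is not None:
        value_id = conn.execute(
            "INSERT INTO value (type, sval) VALUES ('s', ?)", (sval,)
        ).lastrowid
    return conn.execute(
        "INSERT INTO node (file_id, parent_id, name, node_type, value_id) "
        "VALUES (?, ?, ?, ?, ?)",
        (file_id, parent_id, name, node_type, value_id),
    ).lastrowid


# Step 1: index a tiny, made-up hierarchy: general/subject/species.
file_id = conn.execute(
    "INSERT INTO file (location) VALUES (?)", ("example.nwb",)
).lastrowid
general = add_node(conn, file_id, None, "general", "g")
subject = add_node(conn, file_id, general, "subject", "g")
add_node(conn, file_id, subject, "species", "d", sval="Mus musculus")

# Step 2: search the index instead of opening the NWB files.
rows = conn.execute(
    """
    SELECT f.location, n.name, v.sval
    FROM node n
    JOIN file f  ON f.id = n.file_id
    JOIN value v ON v.id = n.value_id
    WHERE n.node_type = 'd' AND n.name = ? AND v.sval LIKE ?
    """,
    ("species", "%musculus%"),
).fetchall()
print(rows)
```

Keeping values in their own table lets group rows store NULL for `value_id`, in line with the caption's note that empty table entries store a NULL.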
Figure 8
Comparison of the performance of all three tools. The number above each colored bar is the average time for the query using that tool. The vertical line passing through the top of each bar shows the range between the minimum and maximum times. The tested queries are: (A) epochs*:(start_time > 200 & stop_time<250 | stop_time>4850), (B) */data: (unit == "unknown"), (C) general/subject: (subject_id == "anm00210863") & epochs/*: (start_time > 500 & start_time < 550 & tags LIKE "%LickEarly%"), (D) units: (id > -1 & location == "CA3" & quality > 0.8), (E) /general:(virus LIKE "%infectionLocation: M2%"), (F) general/optophysiology/*: (excitation_lambda).
Figure 9
NWB Query Engine Web Interface Preview. The web interface provides a Google-like search box. The progress bar shows the user the percentage of files searched so far. A table with results is displayed piece by piece. Each table row contains the name of the file with the requested data, the name of the dataset in which the data was found, the value in the dataset, and a link for downloading the file.
Figure 10
Implementation of Web Interface to the NWB Query Engines. The Web Interface is implemented in a common three-layer architecture. We used the Spring Framework. The data layer accesses data via the NWB Query Engine and returns results to the view layer via the service layer. The view layer is implemented in the Wicket framework.
Figure 11
Comparison of tool features. Each tool is best for a particular purpose.
