J Neurosci. 2024 Sep 18;44(38):e0381242024.
doi: 10.1523/JNEUROSCI.0381-24.2024.

A Perspective on Neuroscience Data Standardization with Neurodata Without Borders

Andrea Pierré et al. J Neurosci.

Abstract

Neuroscience research has evolved to generate increasingly large and complex experimental data sets, and advanced data science tools are taking on central roles in neuroscience research. Neurodata Without Borders (NWB), a standard language for neurophysiology data, has recently emerged as a powerful solution for data management, analysis, and sharing. Here we discuss our labs' efforts to implement NWB data science pipelines. We describe general principles and specific use cases that illustrate successes, challenges, and non-trivial decisions in software engineering. We hope that our experience can provide guidance for the neuroscience community and help bridge the gap between experimental neuroscience and data science. Key takeaways from this article are that (1) standardization with NWB requires non-trivial design choices; (2) the general practice of standardization in the lab promotes data awareness and literacy, and improves transparency, rigor, and reproducibility in our science; (3) we offer several feature suggestions to ease extensibility, publishing/sharing, and usability for the NWB standard and users of NWB data.

Keywords: big data neuroscience; collaborative science; data science pipelines; data standardization with neurodata without borders.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1.
Setup of a typical Fleischmann lab experiment and resulting data streams. The left schematic illustrates in vivo head-fixed two-photon calcium imaging of a deep brain area (e.g., piriform cortex) through a GRIN lens. Throughout the paper, we use the following color scheme: green for neural activity, orange for animal behaviors, and purple for external variables (e.g., stimuli). Raw images from the microscope (top) are preprocessed to obtain fluorescence time series for each segmented neuron (top row, right). The animal receives odor stimuli through an odor port during a time window in each trial, marked by a light purple bar in the fluorescence time series plot. Several behaviors are tracked. A high-resolution camera captures facial movement, typically reduced using the application Facemap into principal components of image motion (middle), or through DeepLabCut into pose estimation or keypoints. Peri-nasal flow and wheel sensors, connected through a microcontroller, provide respiration and running speed estimates, respectively.
Figure 2.
The issue of data standardization. Systems neuroscience data tend to be multi-modal, e.g., time series recorded from standalone sensors and extracted from neural imaging and behavioral videos, plus tables of stimulus or other events (left column). These data are usually scattered across different files in various formats. Researchers wanting a unified standard for ease of analysis and data sharing must choose between at least two possible organizational strategies: prioritizing the data lineage (chosen by NWB format; middle column) or prioritizing conceptual categories of data sources (right column). Color scheme: green for neural activity, orange for animal behavior, and purple for external variables (e.g., stimulus).
Figure 3.
Our data pipeline. There are five primary stages in our data pipeline. Raw data acquired during experiments are archived in cold storage, and also fed to a preprocessing stage to be transformed into more directly usable information (e.g., fluorescence time series after cell segmentation). This stage uses a range of processing packages that produce multiple files, which are then combined during NWB conversion into a standardized format. Scientific analysis ideally is performed on the standardized data, but in practice may instead use individual files produced during preprocessing, in which case conversion and analysis stages are swapped. Standardized data are published, e.g., by uploading to a publicly accessible archive, in parallel with traditional journal publication.
Figure 4.
Pain point scenarios in the conversion workflow. Each scenario adds burden to the research workflow. Red crosses mark situations that break the existing workflow, and the electric current symbol marks the location of a pain point. a, Branching from the main experiment, i.e., a redesign or update of the experiment, may break the current NWB conversion code. b, Metadata missing at conversion time may force the researcher to go back to the experiment, to the original data, or to the conversion code. c, Existing NWB files may need to be updated, e.g., when data from additional experiments such as histology become available, or when the NWB files are found to have missing or wrong metadata or other data issues. d, A validation issue before publishing the data to DANDI may force the researcher to update their NWB conversion code and reprocess their NWB files.
Figure 5.
Example of a broader community issue resolution timeline. This figure illustrates the time taken to fix a Suite2p-related issue internally (i.e., two months), compared to the time it took to fix the issue for the broader community (i.e., five months).
Figure 6.
Pull requests (PRs) for publishing on the extension catalog may take a long time to be accepted. The data were obtained using the GitHub API from the nwb-extensions/staged-extensions repository on 2023-07-30. Out of 23 extension requests, about 61% (14/23) have been merged (bars ending in purple vertical sticks) and added to the catalog, while 13% (3/23) were closed without being added to the catalog (bars ending in red crosses). Review times for finished PRs range from within a day to less than five months for most of them, the exception being 1.6 years for the closed request for ndx-tan-lab-mesh-attributes. About 26% of the extension PRs (6/23) are still open, with 3 out of 6 being stale for more than a year. A notable one is ndx-pose, a pose estimation extension (PR #31), which has been open for almost a year (since Sept. 2022). Note: any closed/merged PR finished in less than 5 days is artificially extended to 5 days for visibility.
Figure 7.
Proposed version-controlled checks for NWB Inspector when uploading to DANDI Archive. To be published on DANDI Archive, datasets should always be checked against, and pass, the latest version of NWB Inspector (first and second boxes) to maintain compliance with best practices. When an existing dataset needs to be updated, for example to correct metadata 3 years after publication, it may fail the latest version (third box on left). The proposed solution is to allow existing datasets to be checked against the last working version when they do not comply with the latest version. This solution still allows researchers to disseminate updates and corrections, while maintaining transparency for the community in terms of non-compliance. This fallback could be allowed a limited number of times, and failures could also be reported to DANDI Archive maintainers.
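The decision logic proposed in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `decide_upload`, `UploadDecision`, the version labels `"v1"`/`"v2"`, the toy check functions, and the `max_fallbacks` limit are all hypothetical names invented here, and none of them correspond to the actual NWB Inspector or DANDI interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical stand-in for one inspector version: returns True if the dataset passes.
Check = Callable[[dict], bool]


@dataclass
class UploadDecision:
    accepted: bool
    checked_against: Optional[str]  # version whose checks the dataset passed
    compliant_with_latest: bool     # kept visible for transparency


def decide_upload(
    dataset: dict,
    checks: Dict[str, Check],
    latest: str,
    last_passing: Optional[str] = None,  # None for a dataset never published before
    fallbacks_used: int = 0,
    max_fallbacks: int = 2,              # assumed limit; the caption only says "limited"
) -> UploadDecision:
    # Every upload is first checked against the latest version.
    if checks[latest](dataset):
        return UploadDecision(True, latest, True)
    # Only existing datasets may fall back, and only a limited number of times.
    can_fall_back = last_passing is not None and fallbacks_used < max_fallbacks
    if can_fall_back and checks[last_passing](dataset):
        return UploadDecision(True, last_passing, False)
    return UploadDecision(False, None, False)


# Toy checks: the newer version requires an extra metadata field.
checks = {
    "v1": lambda d: "subject" in d,
    "v2": lambda d: "subject" in d and "species" in d,
}
updated = {"subject": "mouse-01"}

existing = decide_upload(updated, checks, latest="v2", last_passing="v1")
brand_new = decide_upload(updated, checks, latest="v2")
```

In this sketch the existing dataset is accepted but flagged as non-compliant with the latest version, while the same content submitted as a new dataset is rejected.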
Figure 8.
Code snippet comparison showing how to retrieve data from an NWB file using the “raw” PyNWB API (left) compared to using a custom wrapper (right). After a one-time setup, retrieving the data through a custom wrapper reduces the cognitive load for the user.
Figure 9.
A proposed design layer for the NWB standard to assist with data retrieval and organization. The current NWB structure is hierarchical and tends to be organized by processing stages; panel (a) shows an example of this structure. Accessing relevant data requires knowledge of where they are located, which may be multiple levels deep; see, for example, bottom box (d), which accesses raw fluorescence data with PyNWB. The proposed “decorative layer” allows for more “fluid” interaction with NWB via additional specifications in NWB objects that assist querying, exploration, and analysis, giving users, labs, and the community more control and customization without breaking the existing hierarchical NWB structure. Panel (b) illustrates examples of adding tags and aliases. Tags can be more specific, multi-faceted, and customized to concepts of recording/analysis that users tend to look for (e.g., neural, behavior, stim, and external), as well as higher level details such as processing stages (e.g., raw and proc). Aliases and/or pointers allow users to add names for objects that are most frequently accessed, or expected to be so. Taking advantage of this “decorative layer,” users and developers may design a fluid_nwb API to interact with NWB files in a more flexible and less verbose manner, for example with tags in box (c) and aliases in box (d).
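The idea of tags and aliases layered over an unchanged hierarchy can be illustrated with a minimal, self-contained Python sketch. It uses a plain nested dictionary as a stand-in for an NWB file rather than PyNWB itself. The class name `FluidView`, its methods, the alias `fluo`, and the example tree are hypothetical and are not the fluid_nwb API from the figure; only the tag vocabulary (neural, behavior, raw, proc) is taken from the caption.

```python
from typing import Any, Dict, List, Set, Tuple

Path = Tuple[str, ...]


class FluidView:
    """Hypothetical decorative layer: tags and aliases stored beside the tree."""

    def __init__(self, tree: Dict[str, Any]) -> None:
        self._tree = tree                      # hierarchy is left untouched
        self._tags: Dict[Path, Set[str]] = {}
        self._aliases: Dict[str, Path] = {}

    def get(self, path: Path) -> Any:
        # Walk the hierarchy one level at a time; raises KeyError on a bad path.
        node: Any = self._tree
        for key in path:
            node = node[key]
        return node

    def tag(self, path: Path, *tags: str) -> None:
        self.get(path)                         # validate that the path exists
        self._tags.setdefault(path, set()).update(tags)

    def alias(self, name: str, path: Path) -> None:
        self.get(path)
        self._aliases[name] = path

    def __getitem__(self, name: str) -> Any:
        # Alias lookup replaces the multi-level access path.
        return self.get(self._aliases[name])

    def find(self, *tags: str) -> List[Path]:
        # Return the paths carrying all requested tags, in sorted order.
        wanted = set(tags)
        return sorted(p for p, t in self._tags.items() if wanted <= t)


# Toy hierarchy organized by processing stage, loosely mirroring panel (a).
tree = {
    "acquisition": {"TwoPhotonSeries": "raw images"},
    "processing": {
        "ophys": {"Fluorescence": {"RoiResponseSeries": [0.1, 0.2, 0.3]}},
        "behavior": {"running_speed": [1.0, 2.0]},
    },
}

view = FluidView(tree)
fluo_path = ("processing", "ophys", "Fluorescence", "RoiResponseSeries")
view.tag(("acquisition", "TwoPhotonSeries"), "neural", "raw")
view.tag(fluo_path, "neural", "proc")
view.tag(("processing", "behavior", "running_speed"), "behavior", "proc")
view.alias("fluo", fluo_path)

fluorescence = view["fluo"]                  # short alias instead of a four-level path
neural_processed = view.find("neural", "proc")
```

Keeping tags and aliases in separate dictionaries, rather than restructuring the tree, reflects the caption's requirement that the layer not break the existing hierarchy.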
