[Preprint]. 2025 Jun 16:2024.01.25.577295.
doi: 10.1101/2024.01.25.577295.

Spyglass: a framework for reproducible and shareable neuroscience research

Kyu Hyun Lee et al. bioRxiv.

Abstract

Scientific progress depends on reliable and reproducible results. Progress can be accelerated when data are shared and re-analyzed to address new questions. Current approaches to storing and analyzing neural data involve bespoke formats and software that make replication and reuse of data difficult. To address these challenges, we created Spyglass, an open-source data management and analysis framework written in Python. Spyglass provides reproducible pipelines for common neuroscience analyses and sharing of raw data, intermediate analyses and final results within and across labs. Spyglass uses the Neurodata Without Borders (NWB) standard and includes pipelines for spectral filtering, spike sorting, pose tracking, and neural decoding. Spyglass can be extended to apply existing and newly developed pipelines to datasets from multiple sources. We demonstrate these features in the context of a cross-laboratory replication by applying advanced state space decoding algorithms to publicly available data. New users can try out Spyglass on a Jupyter Hub hosted by HHMI and 2i2c: https://spyglass.hhmi.2i2c.cloud/.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1: Overview of Spyglass.
The raw data—consisting of information about the animal, the behavioral task, the neurophysiological data, etc.—is converted to the NWB format (yellow box) and ingested into the Spyglass database. The pipelines (dark green box) operate on pointers to specific data objects in the NWB file (tan box). The raw and processed data are then shared with the community by depositing them to public archives like DANDI or shared with collaborators via Kachery. Visualizations of key analysis steps can be shared over the web via Figurl. Code is shared by hosting the codebase for Spyglass and project-specific pipelines on online repositories like GitHub. Finally, the populated database may be shared by exporting it to a Docker container.
Figure 2: Analysis pipelines in Spyglass.
(A) A general structure for a Spyglass pipeline. (B) Example 1: LFP extraction. Note the correspondence to the pipeline structure in (A) as shown by the color scheme. The trace next to the Raw table is raw data sampled at 30 kHz and is represented by a row in the Raw table. This row, along with parameters from the LFPElectrodeGroup, IntervalList, and FIRFilterParameters tables (red arrow), is specified in a Python dictionary and inserted into the LFPSelection table (the code snippet, using the insert1 method, puts the data in the database table). When the populate method is called on the LFP table, the filtering is initiated and the output is inserted into the database. The results (e.g. the trace above the LFP table) are stored in NWB format, and the object ID of each result within the file is also stored as a row in the LFP table, enabling easy retrieval. (C) Example 2: Sharp-wave ripple (SWR) detection. This pipeline is downstream of the LFP extraction pipeline and consists of two steps: (i) further extraction of a frequency band for SWR (LFPBand); and (ii) detection of SWR events in that band (RippleTimes). Note that the output of LFP extraction serves as the input data for the SWR detection pipeline and can thus be thought of as both Compute and Data types. As in (B), for each step, the results are saved in NWB files and the object IDs of the analysis results within the NWB files are stored as rows in the corresponding Compute tables. The trace above the RippleTimes table is the SWR-filtered LFP around the time of a single SWR event (pink shade). In each table, columns in bold are the primary keys. Arrows depict dependency structure within the pipeline.
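The select-then-populate pattern described in (B) can be illustrated with a minimal plain-Python sketch. It does not use Spyglass or DataJoint: the `ToyTable` class, the `populate` helper, the `moving_average` stand-in for FIR filtering, and every key name and value below are hypothetical, chosen only to mirror the caption.

```python
"""Toy illustration of the selection -> populate pattern (not the Spyglass API)."""


class ToyTable:
    """A dict-backed stand-in for a database table keyed by primary-key tuples."""

    def __init__(self, primary_key):
        self.primary_key = tuple(primary_key)
        self.rows = {}

    def insert1(self, row):
        # Insert a single row; a repeated primary key is rejected.
        key = tuple(row[name] for name in self.primary_key)
        if key in self.rows:
            raise KeyError(f"duplicate entry: {key}")
        self.rows[key] = dict(row)


def moving_average(samples, width):
    """Hypothetical stand-in for the real filter: a causal moving average."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def populate(selection, raw, results):
    """Compute a result for every selection row that does not have one yet."""
    for key, row in selection.rows.items():
        if key in results.rows:
            continue  # already computed, so repeated calls are harmless
        trace = raw.rows[(row["nwb_file_name"],)]["samples"]
        filtered = moving_average(trace, row["filter_width"])
        results.insert1({**row, "filtered": filtered})


raw = ToyTable(["nwb_file_name"])
raw.insert1({"nwb_file_name": "example.nwb", "samples": [0.0, 2.0, 4.0, 2.0]})

lfp_selection = ToyTable(["nwb_file_name", "filter_name"])
lfp = ToyTable(["nwb_file_name", "filter_name"])

# Parameters are gathered in a dictionary and inserted into the selection table.
lfp_selection.insert1(
    {"nwb_file_name": "example.nwb", "filter_name": "toy_lowpass", "filter_width": 2}
)

populate(lfp_selection, raw, lfp)
populate(lfp_selection, raw, lfp)  # second call finds nothing new to compute

result = lfp.rows[("example.nwb", "toy_lowpass")]["filtered"]
print(result)
```

In the system described by the caption, the output is written to an NWB file and the table row holds the object ID of that output rather than the data itself; the sketch keeps the data in the row only for brevity.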
Figure 3: Spike sorting pipeline.
The Spyglass spike sorting pipeline consists of seven components (large gray boxes): preprocess recording (A); detect artifacts to omit from sorting (B); apply spike sorting algorithm (C); curate spike sorting (D), either with quality metrics (E) or manually (F); and merge with other sources of spike sorting for downstream processing (G). Solid arrows describe dependency relationships and dashed arrows indicate that the data is re-inserted upstream for iterative processing. Note the two design motifs (see text): “cyclic iteration” for curation and “merge” for consolidating data streams. Color scheme is the same as Figure 2, except for light purple (cyclic iteration table), orange (merge table), and peach (Parts table of the merge table).
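The two design motifs named in the caption can be sketched in plain Python. This is an illustration under assumptions, not the Spyglass implementation: `ToyMergeTable`, `curate`, and all field names and values are hypothetical. The sketch assumes that "cyclic iteration" means each curation is re-inserted into the same table with a pointer to the curation it was derived from, and that a "merge" table gives entries from different sources one shared identifier.

```python
"""Toy illustration of the "cyclic iteration" and "merge" motifs (hypothetical names)."""
import itertools


def curate(curations, sorting_id, labels, parent_id=None):
    """Append a new curation row that points back to the row it came from."""
    curation_id = sum(1 for c in curations if c["sorting_id"] == sorting_id)
    curations.append(
        {
            "sorting_id": sorting_id,
            "curation_id": curation_id,
            "parent_curation_id": parent_id,
            "labels": dict(labels),
        }
    )
    return curation_id


class ToyMergeTable:
    """A master table plus one part table per upstream source."""

    def __init__(self, sources):
        self._next_id = itertools.count(1)
        self.master = {}                              # merge_id -> source name
        self.parts = {name: {} for name in sources}   # source -> {merge_id: key}

    def insert(self, source, key):
        merge_id = next(self._next_id)
        self.master[merge_id] = source
        self.parts[source][merge_id] = dict(key)
        return merge_id

    def fetch(self, merge_id):
        # Downstream code needs only the merge_id, whatever the source was.
        source = self.master[merge_id]
        return source, self.parts[source][merge_id]


# Cyclic iteration: the second curation is derived from the first.
curations = []
c0 = curate(curations, "sort_a", {})
c1 = curate(curations, "sort_a", {1: "noise"}, parent_id=c0)

# Merge: a curated sorting and externally sorted units share one ID space.
merge = ToyMergeTable(["curated_sorting", "imported_units"])
m1 = merge.insert("curated_sorting", {"sorting_id": "sort_a", "curation_id": c1})
m2 = merge.insert("imported_units", {"nwb_file_name": "other_lab.nwb"})

print(merge.fetch(m1))
print(merge.fetch(m2))
```

The point of the merge step in this sketch is that downstream analyses can refer to a single identifier without needing to know which upstream path produced the units.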
Figure 4: Sharing data and visualizations.
(A) Kachery provides a convenient Python API to share data over a content-addressable cloud storage network. To retrieve data from a collaborator’s Spyglass database, one can make a simple function call (fetch_nwb) that pulls the data from a node in the Kachery Zone to the local machine. (B) Example of a Figurl interactive figure for visualizing and applying curation labels to spike sorting over the web.
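The idea of content-addressable storage mentioned in (A) can be shown with a short, self-contained sketch. It is not the Kachery API: the `ToyContentStore` class, its methods, and the `sha256:` address format are assumptions made for illustration. The principle shown is general: a file's address is derived from a hash of its bytes, so identical content always has the same address and a download can be verified against it.

```python
"""Toy content-addressable store (illustrative only, not the Kachery API)."""
import hashlib


class ToyContentStore:
    """Stores byte strings under an address computed from their content."""

    def __init__(self):
        self._blobs = {}

    @staticmethod
    def address_of(data: bytes) -> str:
        # The address depends only on the bytes, not on a file name or location.
        return "sha256:" + hashlib.sha256(data).hexdigest()

    def store(self, data: bytes) -> str:
        address = self.address_of(data)
        self._blobs[address] = data
        return address

    def load(self, address: str) -> bytes:
        data = self._blobs[address]
        # Integrity check: the content must hash back to the requested address.
        if self.address_of(data) != address:
            raise ValueError("stored content does not match its address")
        return data


store = ToyContentStore()
address = store.store(b"example analysis result")
same_address = store.store(b"example analysis result")  # duplicate upload

print(address == same_address)
print(store.load(address))
```

Because the address is a function of the content, storing the same bytes twice yields one entry, and a collaborator who holds only the address can check that the retrieved bytes are the intended ones.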
Figure 5: Applying decoding pipelines to multiple data sets from different labs.
(A) Decoding neural position from rat hippocampal CA1 using a clusterless state space model (UCSF dataset). In the top panel, grey lines represent positions the rat has occupied in the spatial environment. Overlaid lines in color are the track segments used to linearize position for decoding. Filled circles represent reward wells. The second panel from the top shows the posterior probability of the latent neural position over time. The magenta line represents the animal’s actual position. The vertical lines on the right represent the linearized track segments with the colors corresponding to the top panel. The third panel from the top shows the distance of the most likely decoded position from the animal’s actual position; the sign indicates the direction relative to the animal’s head position. The fourth panel from the top is the animal’s speed. The final panel is the multiunit firing rate. (B) Decoding from rat hippocampal CA1 using existing spike sorted units (NYU dataset). Conventions are the same as in A. The filled circle in the linearization represents the reward zone rather than the reward well. (C) Decoding analysis of the NYU dataset. The top panel shows the power difference of the multiunit firing rate between the medial septal cooling period and the pre-cooling period in the 5–13 Hz range. The power at 8–10 Hz is attenuated during cooling while the power at 5–8 Hz is enhanced, showing a slowing of the theta rhythm during cooling. The bottom panel shows that the power of the distance between decoded and actual position (decode distance) is mostly reduced throughout the 5–13 Hz range. (D) Cooling decreases the decode distance and speed, and this effect may only recover partially after cooling. Bars represent 95% confidence intervals.
