This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2025 Jun 16:2024.01.25.577295.

doi: 10.1101/2024.01.25.577295.

Spyglass: a framework for reproducible and shareable neuroscience research

Kyu Hyun Lee^{1,2,3}, Eric L Denovellis^{1,2,3}, Ryan Ly⁴, Jeremy Magland⁵, Jeff Soules⁵, Alison E Comrie^{1,3}, Daniel P Gramling⁶, Jennifer A Guidera^{1,3,7,8}, Rhino Nevers^{1,3}, Philip Adenekan^{1,3}, Chris Brozdowski^{1,3}, Samuel R Bray^{1,3}, Emily Monroe¹, Ji Hyun Bak¹, Michael E Coulter^{1,3}, Xulu Sun^{1,2,3}, Emrey Broyles^{1,3}, Donghoon Shin^{1,3,7}, Sharon Chiang⁹, Cristofer Holobetz¹⁰, Andrew Tritt⁴, Oliver Rübel⁴, Thinh Nguyen¹¹, Dimitri Yatsenko¹¹, Joshua Chu¹², Caleb Kemere¹², Samuel Garcia¹³, Alessio Buccino¹⁴, Loren M Frank^{1,2,3}

Affiliations

¹ Department of Physiology, University of California, San Francisco.
² Howard Hughes Medical Institute, University of California, San Francisco.
³ Kavli Institute for Fundamental Neuroscience, University of California, San Francisco.
⁴ Scientific Data Division, Lawrence Berkeley National Laboratory.
⁵ Center for Computational Mathematics, Flatiron Institute.
⁶ Graduate Program in Neural and Behavioral Sciences, University of Tübingen.
⁷ UCSF-UC Berkeley Graduate Program in Bioengineering, University of California, San Francisco.
⁸ Medical Scientist Training Program, University of California, San Francisco.
⁹ Department of Neurology, University of California, San Francisco.
¹⁰ Sainsbury Wellcome Centre, University College London.
¹¹ DataJoint.
¹² Department of Electrical and Computer Engineering, Rice University.
¹³ Centre de Recherche en Neuroscience de Lyon, CNRS.
¹⁴ Allen Institute for Brain Science.

PMID: 38328074
PMCID: PMC10849637
DOI: 10.1101/2024.01.25.577295

Kyu Hyun Lee et al. bioRxiv. 2025.

Abstract

Scientific progress depends on reliable and reproducible results. Progress can be accelerated when data are shared and re-analyzed to address new questions. Current approaches to storing and analyzing neural data involve bespoke formats and software that make replication and reuse of data difficult. To address these challenges, we created Spyglass, an open-source data management and analysis framework written in Python. Spyglass provides reproducible pipelines for common neuroscience analyses and sharing of raw data, intermediate analyses and final results within and across labs. Spyglass uses the Neurodata Without Borders (NWB) standard and includes pipelines for spectral filtering, spike sorting, pose tracking, and neural decoding. Spyglass can be extended to apply existing and newly developed pipelines to datasets from multiple sources. We demonstrate these features in the context of a cross-laboratory replication by applying advanced state space decoding algorithms to publicly available data. New users can try out Spyglass on a Jupyter Hub hosted by HHMI and 2i2c: https://spyglass.hhmi.2i2c.cloud/.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

**Figure 1. Overview of Spyglass.**
The raw data—consisting of information about the animal, the behavioral task, the neurophysiological data, etc.—is converted to the NWB format (yellow box) and ingested into the Spyglass database. The pipelines (dark green box) operate on pointers to specific data objects in the NWB file (tan box). The raw and processed data are then shared with the community by depositing them to public archives like DANDI or shared with collaborators via Kachery. Visualizations of key analysis steps can be shared over the web via Figurl. Code is shared by hosting the codebase for Spyglass and project-specific pipelines on online repositories like GitHub. Finally, the populated database may be shared by exporting it to a Docker container.

**Figure 2. Analysis pipelines in Spyglass.**
(A) A general structure for a Spyglass pipeline. (B) Example 1: LFP extraction. Note the correspondence to the pipeline structure in (A) as shown by the color scheme. The trace next to the Raw table is raw data sampled at 30 kHz and is represented by a row in the Raw table. This row, along with parameters from the LFPElectrodeGroup, IntervalList, and FIRFilterParameters tables (red arrow), is specified in a Python dictionary and inserted into the LFPSelection table (the code snippet uses the insert1 method to put the data in the database table). When the populate method is called on the LFP table, the filtering is initiated and the output is inserted into the database. The results (e.g. the trace above the LFP table) are stored in NWB format and their object ID within the file is also stored as a row in the LFP table, enabling easy retrieval. (C) Example 2: Sharp-wave ripple (SWR) detection. This pipeline is downstream of the LFP extraction pipeline and consists of two steps: (i) further extraction of a frequency band for SWR (LFPBand); and (ii) detection of SWR events in that band (RippleTimes). Note that the output of LFP extraction serves as the input data for the SWR detection pipeline and can thus be thought of as both Compute and Data types. As in (B), for each step, the results are saved in NWB files and the object ID of the analysis result within the NWB file is stored as a row in the corresponding Compute table. The trace above the RippleTimes table is the SWR-filtered LFP around the time of a single SWR event (pink shade). In each table, columns in bold are the primary keys. Arrows depict the dependency structure within the pipeline.
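
The select-then-populate pattern described in (B) can be illustrated with a toy, in-memory sketch. This is not the Spyglass or DataJoint API: the `Table` and `ComputedTable` classes, the key names, the sample values, and the moving-average "filter" are hypothetical stand-ins. Only the method names `insert1` and `populate` and the idea of pairing data with parameters in a selection table come from the caption.

```python
# Toy illustration of the Selection -> populate pattern; NOT the Spyglass/DataJoint API.
# All classes, keys, values, and the moving-average "filter" are hypothetical stand-ins.


class Table:
    """In-memory table whose rows are identified by primary-key fields."""

    def __init__(self, primary_key):
        self.primary_key = tuple(primary_key)
        self.rows = {}

    def insert1(self, row):
        """Insert one row (a dict); reject duplicate primary keys."""
        key = tuple(row[name] for name in self.primary_key)
        if key in self.rows:
            raise KeyError(f"duplicate entry for {key}")
        self.rows[key] = dict(row)


class ComputedTable(Table):
    """Table filled by running `compute` on each not-yet-processed selection row."""

    def __init__(self, selection, compute):
        super().__init__(selection.primary_key)
        self.selection = selection
        self.compute = compute

    def populate(self):
        for key, row in self.selection.rows.items():
            if key not in self.rows:  # skip keys that were already computed
                self.rows[key] = {**row, "result": self.compute(row)}


def moving_average(samples, width):
    """Crude low-pass stand-in for the FIR filter named in the caption."""
    return [
        sum(samples[i : i + width]) / width
        for i in range(len(samples) - width + 1)
    ]


raw = {"session_01": [0.0, 2.0, 4.0, 2.0, 0.0, 2.0]}  # made-up samples
filters = {"width3": 3}  # made-up filter parameters

lfp_selection = Table(primary_key=["session", "filter_name"])
lfp = ComputedTable(
    lfp_selection,
    compute=lambda row: moving_average(
        raw[row["session"]], filters[row["filter_name"]]
    ),
)

# Pair data with parameters in the selection table, then trigger the computation.
lfp_selection.insert1({"session": "session_01", "filter_name": "width3"})
lfp.populate()
lfp_result = lfp.rows[("session_01", "width3")]["result"]
```

Because `populate` skips keys that already have a result, calling it again does no extra work. That separation of "what to compute" (selection rows) from "the computed output" is what the sketch is meant to convey.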

**Figure 3. Spike sorting pipeline.**
The Spyglass spike sorting pipeline consists of seven components (large gray boxes): preprocess recording (A); detect artifacts to omit from sorting (B); apply spike sorting algorithm (C); curate spike sorting (D), either with quality metrics (E) or manually (F); and merge with other sources of spike sorting for downstream processing (G). Solid arrows describe dependency relationships and dashed arrows indicate that the data is re-inserted upstream for iterative processing. Note the two design motifs (see text): “cyclic iteration” for curation and “merge” for consolidating data streams. Color scheme is the same as Figure 2, except for light purple (cyclic iteration table), orange (merge table), and peach (Parts table of the merge table).
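
The "merge" motif can be pictured as a single table that assigns one identifier to entries coming from different upstream sources, so that downstream steps need not know which source produced the data. The following is a minimal sketch under that reading. The `MergeTable` class, its methods, and the source names are hypothetical and do not reproduce the Spyglass implementation.

```python
# Hypothetical sketch of a "merge" table; names and structure are illustrative only.
import uuid


class MergeTable:
    """Master table mapping one merge ID to a row in exactly one part table."""

    def __init__(self, sources):
        self.master = {}  # merge_id -> source name
        self.parts = {name: {} for name in sources}  # source -> {merge_id: row}

    def insert1(self, source, row):
        """Register a row from one upstream source and return its merge ID."""
        if source not in self.parts:
            raise ValueError(f"unknown source: {source}")
        merge_id = str(uuid.uuid4())
        self.master[merge_id] = source
        self.parts[source][merge_id] = dict(row)
        return merge_id

    def fetch1(self, merge_id):
        """Look up a row by merge ID without the caller naming the source."""
        source = self.master[merge_id]
        return source, self.parts[source][merge_id]


# Two made-up upstream sources of sorted units feeding one downstream entry point.
spike_sorting_output = MergeTable(sources=["curated_sorting", "imported_units"])
id_a = spike_sorting_output.insert1("curated_sorting", {"n_units": 12})
id_b = spike_sorting_output.insert1("imported_units", {"n_units": 7})
source_b, row_b = spike_sorting_output.fetch1(id_b)
```

A downstream step in this sketch only ever receives a merge ID, which is how one decoding or analysis step could accept spike sorting results regardless of where they came from.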

**Figure 4. Sharing data and visualizations.**
(A) Kachery provides a convenient Python API to share data over a content-addressable cloud storage network. To retrieve data from a collaborator’s Spyglass database, one can make a simple function call (fetch_nwb) that pulls the data from a node in the Kachery Zone to the local machine. (B) Example of a Figurl interactive figure for visualizing and applying curation labels to spike sorting over the web.
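
Content-addressable storage, as mentioned in (A), identifies data by a hash of its bytes rather than by a file name or location. The sketch below shows that idea using only the Python standard library. The `ContentStore` class and the `sha1://` address form used here are assumptions for illustration and are not the Kachery API.

```python
# Minimal content-addressable store; an illustration only, NOT the Kachery API.
import hashlib


class ContentStore:
    """Stores byte blobs under an address derived from their own content."""

    def __init__(self):
        self._blobs = {}

    def store(self, data: bytes) -> str:
        """Save the bytes and return their content-derived address."""
        digest = hashlib.sha1(data).hexdigest()
        self._blobs[digest] = data
        return f"sha1://{digest}"

    def load(self, uri: str) -> bytes:
        """Fetch bytes by address and verify they still match that address."""
        digest = uri.split("://", 1)[1]
        data = self._blobs[digest]
        if hashlib.sha1(data).hexdigest() != digest:
            raise ValueError("stored content does not match its address")
        return data


store = ContentStore()
uri_first = store.store(b"example spike times")
uri_again = store.store(b"example spike times")  # same bytes, same address
uri_other = store.store(b"different data")
```

Since the address depends only on the bytes, identical data always resolves to the same address, and a recipient can check that what was downloaded matches what was requested.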

**Figure 5. Applying decoding pipelines to multiple data sets from different labs.**
(A) Decoding neural position from rat hippocampal CA1 using a clusterless state space model (UCSF dataset). In the top panel, grey lines represent positions the rat has occupied in the spatial environment. Overlaid lines in color are the track segments used to linearize position for decoding. Filled circles represent reward wells. The second panel from the top shows the posterior probability of the latent neural position over time. The magenta line represents the animal’s actual position. The vertical lines on the right represent the linearized track segments with the colors corresponding to the top panel. The third panel from the top shows the distance of the most likely decoded position from the animal’s actual position; the sign indicates the direction relative to the animal’s head position. The fourth panel from the top is the animal’s speed. The final panel is the multiunit firing rate. (B) Decoding from rat hippocampal CA1 using existing spike sorted units (NYU dataset). Conventions are the same as in A. The filled circle in the linearization represents the reward zone rather than the reward well. (C) Decoding analysis of the NYU dataset. The top panel shows the power difference of the multiunit firing rate between the medial septal cooling period and the pre-cooling period in the 5–13 Hz range. The power at 8–10 Hz is attenuated during cooling while the power at 5–8 Hz is enhanced, showing a slowing of the theta rhythm during cooling. The bottom panel shows that the power of the distance between decoded and actual position (decode distance) is mostly reduced throughout the 5–13 Hz range. (D) Cooling decreases the decode distance and speed and this effect may only recover partially after cooling. Bars represent 95% confidence intervals.
