Meta-Analysis

SpikeInterface, a unified framework for spike sorting

Alessio P Buccino et al. eLife. 2020 Nov 10;9:e61834. doi: 10.7554/eLife.61834.

Abstract

Much development has been directed toward improving the performance and automation of spike sorting. This continuous development, while essential, has contributed to an over-saturation of new, incompatible tools that hinders rigorous benchmarking and complicates reproducible analysis. To address these limitations, we developed SpikeInterface, a Python framework designed to unify preexisting spike sorting technologies into a single codebase and to facilitate straightforward comparison and adoption of different approaches. With a few lines of code, researchers can reproducibly run, compare, and benchmark most modern spike sorting algorithms; pre-process, post-process, and visualize extracellular datasets; validate, curate, and export sorting outputs; and more. In this paper, we provide an overview of SpikeInterface and, with applications to real and simulated datasets, demonstrate how it can be utilized to reduce the burden of manual curation and to more comprehensively benchmark automated spike sorters.

Keywords: extracellular recordings; mouse; neuroscience; open-source software; python; rat; reproducibility; spike sorting.


Conflict of interest statement

AB, CH, SG, JM, JS, RH, MH No competing interests declared

Figures

Figure 1.
Figure 1.. Comparison of spike sorters on a real Neuropixels dataset.
(A) A visualization of the activity on the Neuropixels array (top, color indicates spike rate estimated on each channel evaluated with threshold detection) and of traces from the Neuropixels recording (below). (B) The number of detected units for each of the six spike sorters (HS = HerdingSpikes2, KS = Kilosort2, IC = IronClust, TDC = Tridesclous, SC = SpyKING Circus, HDS = HDSort). (C) The total number of units for which k sorters agree (unit agreement is defined as 50% spike match). (D) The number of units (per sorter) for which k sorters agree; most sorters find many units that other sorters do not.
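The 50% spike-match criterion used in panels C and D can be made concrete with a short sketch. This is a minimal pure-Python illustration, not SpikeInterface's implementation: it assumes sorted spike trains given in samples and a hypothetical matching tolerance `delta`, and computes the agreement score as matches / (total spikes − matches), with 0.5 as the threshold that defines a matched unit.

```python
def count_matches(train_a, train_b, delta):
    """Count spikes in train_a with a partner in train_b within +/- delta
    samples (greedy left-to-right scan; both trains must be sorted)."""
    matches = 0
    j = 0
    for t in train_a:
        # Skip partners that are already too far in the past.
        while j < len(train_b) and train_b[j] < t - delta:
            j += 1
        if j < len(train_b) and abs(train_b[j] - t) <= delta:
            matches += 1
            j += 1  # Each spike in train_b can be matched at most once.
    return matches

def agreement_score(train_a, train_b, delta=12):
    """Agreement = matches / (n_a + n_b - matches); a score > 0.5
    counts as a matched unit in the figure."""
    m = count_matches(train_a, train_b, delta)
    return m / (len(train_a) + len(train_b) - m)
```

For example, two trains that share three of four spikes (within tolerance) score 3 / (4 + 4 − 3) = 0.6 and would count as a match.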
Figure 1—figure supplement 1.
Figure 1—figure supplement 1.. Examples of matched units in a Neuropixels recording.
The illustration shows units from six spike sorters that were matched by spike train comparison. Panel (A) shows a unit with high agreement score (0.97), and panel (B) a lower agreement score (0.69). In both panels, the top plot shows the spike trains (the first 20 s of the recording) found by each sorter, and below unit templates (estimated from waveforms of 100 spikes randomly sampled from each unit) are shown.
Figure 1—figure supplement 2.
Figure 1—figure supplement 2.. Cumulative histogram of agreement scores (above the threshold of 0.5 that defines a match) for the ensemble sorting of the simulated ground-truth dataset.
This analysis was performed with the six chosen sorters and highlights how over 80% of the matched units had an agreement score greater than 0.8.
Figure 1—figure supplement 3.
Figure 1—figure supplement 3.. Comparison of spike sorters on a Neuropixels recording.
This dataset contains spontaneous neural activity from the rat cortex (motor and somatosensory areas), recorded by Marques-Smith et al., 2018a; Marques-Smith et al., 2018b (dataset spe-c1). The dataset is also available at https://gui.dandiarchive.org/#/dandiset/000034/draft. (A) A visualization of the activity on the Neuropixels array (top, color indicates spike rate estimated on each channel evaluated with threshold detection) and of traces from the Neuropixels recording (below). (B) The number of detected units for each of the six spike sorters (HS = HerdingSpikes2, KS = Kilosort2, IC = IronClust, TDC = Tridesclous, SC = SpyKING Circus, HDS = HDSort). (C) The total number of units for which k sorters agree (unit agreement is defined as 50% spike match). (D) The number of units (per sorter) for which k sorters agree; most sorters find many units that other sorters do not. The analysis notebook for this analysis can be found at https://spikeinterface.github.io/blog/ensemble-sorting-of-a-neuropixels-recording-2/.
Figure 1—figure supplement 4.
Figure 1—figure supplement 4.. Comparison of spike sorters on a Biocam recording from a mouse retina.
This retina recording (Hilgen et al., 2017) has 1024 channels in a square configuration and a sampling frequency of 23199 Hz. The dataset can be found at https://gui.dandiarchive.org/#/dandiset/000034/draft. Only four spike sorters were capable of processing this dataset (HS = HerdingSpikes2, KS = Kilosort2, IC = IronClust, HDS = HDSort). (A) A visualization of the activity on the Biocam array (top, color indicates spike rate estimated on each channel evaluated with threshold detection) and of traces from the recording (below). (B) The number of detected units for each of the four spike sorters. (C) The total number of units for which k sorters agree (unit agreement is defined as 50% spike match). (D) The number of units (per sorter) for which k sorters agree; most sorters find many units that other sorters do not. The analysis notebook for this analysis can be found at https://spikeinterface.github.io/blog/ensemble-sorting-of-a-3brain-biocam-recording-from-a-retina/.
Figure 2.
Figure 2.. Evaluation of spike sorters on a simulated Neuropixels dataset.
(A) A visualization of the activity on and traces from the simulated Neuropixels recording. (B) The signal-to-noise ratios (SNR) for the ground-truth units. (C) The number of detected units for each of the six spike sorters (HS = HerdingSpikes2, KS = Kilosort2, IC = IronClust, TDC = Tridesclous, SC = SpyKING Circus, HDS = HDSort). (D) The accuracy, precision, and recall of each sorter on the ground-truth units. (E) A breakdown of the detected units for each sorter (precise definitions of each unit type can be found in the SpikeComparison Section of the Methods). The horizontal dashed line indicates the number of ground-truth units (250).
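The precise definitions of panel D's metrics are given in the SpikeComparison section of the Methods; in general terms they are computed from spike-level true positives (TP), false positives (FP), and false negatives (FN) against the ground-truth spike trains. A minimal sketch, assuming the common convention accuracy = TP / (TP + FP + FN) (this convention is an assumption here, not a quote from the Methods):

```python
def sorting_metrics(n_tp, n_fp, n_fn):
    """Compute (accuracy, precision, recall) from spike-level counts:
    n_tp = ground-truth spikes found, n_fp = spurious spikes,
    n_fn = ground-truth spikes missed."""
    precision = n_tp / (n_tp + n_fp)
    recall = n_tp / (n_tp + n_fn)
    accuracy = n_tp / (n_tp + n_fp + n_fn)
    return accuracy, precision, recall
```

For instance, a unit with 90 matched spikes, 10 spurious spikes, and 20 missed spikes has precision 0.9, recall ≈ 0.82, and accuracy 0.75.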
Figure 2—figure supplement 1.
Figure 2—figure supplement 1.. Evaluation of spike sorters performance metrics.
(A) Precision versus recall for the ground-truth comparison on the simulated dataset. Some sorters seem to favor precision (HerdingSpikes, SpyKING Circus, HDSort), others have higher recall (Ironclust) or score well on both measures (Kilosort2). Tridesclous does not show a bias towards precision or recall. (B) Accuracy versus SNR. All the spike sorters (except Kilosort2) show a strong dependence of performance on the SNR of the ground-truth units. Kilosort2, remarkably, achieves high accuracy even for low-SNR units.
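The SNR plotted in panel B is, in general terms, the peak amplitude of a unit's template on its best channel divided by the noise level of that channel. A common robust noise estimator is the median absolute deviation (MAD) rescaled to standard-deviation units; the sketch below assumes that convention and is illustrative rather than SpikeInterface's exact implementation.

```python
import statistics

def mad_noise_level(trace):
    """Robust noise estimate: MAD scaled by 1/0.6745 so that, for
    Gaussian noise, it approximates the standard deviation."""
    med = statistics.median(trace)
    return statistics.median(abs(x - med) for x in trace) / 0.6745

def unit_snr(template, trace):
    """Peak absolute amplitude of the unit template on its best channel,
    divided by the channel's noise level."""
    return max(abs(v) for v in template) / mad_noise_level(trace)
```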
Figure 3.
Figure 3.. Comparison of spike sorters on a simulated Neuropixels dataset.
(A) The total number of units for which k sorters agree (unit agreement is defined as 50% spike match). (B) The number of units (per sorter) for which k sorters agree; most sorters find many units that other sorters do not (HS = HerdingSpikes2, KS = Kilosort2, IC = IronClust, TDC = Tridesclous, SC = SpyKING Circus, HDS = HDSort). (C) The number of matched ground-truth units (blue) and false positive units (red) found by each sorter, split by the number k of sorters that agree on them. Most of the false positive units are found by only a single sorter. Number of false positive units found by k ≥ 2 sorters: HS = 4, KS = 4, IC = 4, SC = 2, TDC = 1, HDS = 2. (D) Signal-to-noise ratio (SNR) of ground-truth units with respect to the number k of agreeing sorters. Results are split by sorter.
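Counting "for how many sorters k a unit is agreed upon" reduces to tallying, for each unit, the other sorters that produced a matching unit. A minimal sketch, assuming unit-to-unit matches (agreement score > 0.5) have already been computed pairwise; the data layout is hypothetical, not SpikeInterface's internal representation:

```python
from collections import defaultdict

def agreement_counts(units, matched_pairs):
    """For each (sorter, unit_id) in `units`, return k: the number of
    sorters (including the unit's own) that found a matching unit.
    `matched_pairs` lists pairwise matches between units of different
    sorters, each as ((sorter_a, unit_a), (sorter_b, unit_b))."""
    partners = defaultdict(set)
    for (sorter_a, unit_a), (sorter_b, unit_b) in matched_pairs:
        partners[(sorter_a, unit_a)].add(sorter_b)
        partners[(sorter_b, unit_b)].add(sorter_a)
    return {u: 1 + len(partners[u]) for u in units}
```

A unit matched by no other sorter gets k = 1 (non-consensus); the figure's consensus units are those with k ≥ 2.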
Figure 3—figure supplement 1.
Figure 3—figure supplement 1.. The fractions of predicted false and true positive units from ensembles using different numbers of sorters.
All possible subsets of two to five of the six sorters were tested by removing corresponding units from the full sorting comparison. Each dot corresponds to one unique combination of sorters. This analysis shows that false positive units are well-identified using pairs of sorters (almost all false positive units are only found by one sorter), indicating that the sorters are biased in different ways. However, the fraction of true positives in the ensemble (at least two sorters agree) can be significantly lower when only pairs of sorters are used. This is explained by the fact that, for this dataset, a fraction of true positive units are only found by one sorter (as expected since the quality of detection and isolation of the units varies among sorters). In contrast, using four or more sorters reliably identifies most true positive units. For two sorters, the most reliable identification of true positives was achieved by combining two of Kilosort2, Ironclust, and HDSort.
Figure 3—figure supplement 2.
Figure 3—figure supplement 2.. The SNR of all units found by Kilosort2 in the ground-truth data separated into those with and without matches in the ground-truth spike trains.
Many detected false positive units have an SNR above the mode of the ground-truth SNR, indicating that SNR is not a good measure to separate false and true positives in this case.
Figure 4.
Figure 4.. Comparison between consensus and manually curated outputs.
(A) Venn diagram showing the agreement between Curators 1 and 2; 174 units are discarded by both curators from the Kilosort2 output. (B) Percent of matched units between the output of each sorter and C1 (red) and C2 (blue). Ironclust has the highest match with both curated datasets. (C) Similar to B, but using the consensus units (units agreed upon by at least two sorters, k ≥ 2). The percent of matching with the curated datasets is now above 70% for all sorters, with Kilosort2 having the highest match (KSc ∩ C1 = 84.55%, KSc ∩ C2 = 89.55%), slightly higher than Ironclust (ICc ∩ C1 = 82.63%, ICc ∩ C2 = 83.83%). (D) Percent of non-consensus units (k = 1) matched to the curated datasets. The only notable overlap is with Kilosort2 (KSnc ∩ C1 = 18.58%, KSnc ∩ C2 = 24.34%).
Figure 5.
Figure 5.. Overview of SpikeInterface’s Python packages, their different functionalities, and how they can be accessed by our meta-package, spikeinterface.
Figure 6.
Figure 6.. Sample spike sorting pipeline using SpikeInterface.
(A) A diagram of a sample spike sorting pipeline. Each processing step is colored to represent the SpikeInterface package in which it is implemented and the dashed, colored arrows demonstrate how the Extractors are used in each processing step. (B) How to use the Python API to build the pipeline shown in (A). (C) How to use the GUI to build the pipeline shown in (A).
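A pipeline like the one in panel A can be sketched against the SpikeInterface packages of Figure 5. The sketch below is illustrative only: the imports are kept inside the function so it stands alone, and the specific function and argument names (e.g. `run_sorter`, `minimum_agreement_count`) may differ between SpikeInterface versions, so the released documentation should be treated as authoritative.

```python
def run_example_pipeline(recording_file):
    """Minimal sketch of a SpikeInterface pipeline: load, pre-process,
    sort with two sorters, and keep the units they agree on.
    Function names may vary by SpikeInterface version."""
    # Local imports so the sketch can be displayed without the package.
    import spikeinterface.extractors as se
    import spikeinterface.toolkit as st
    import spikeinterface.sorters as ss
    import spikeinterface.comparison as sc

    # Load a (here, MEArec-simulated) recording.
    recording = se.MEArecRecordingExtractor(recording_file)

    # Pre-process: bandpass filter, then common median reference.
    recording = st.preprocessing.bandpass_filter(recording, freq_min=300, freq_max=6000)
    recording = st.preprocessing.common_reference(recording, reference='median')

    # Run two sorters on the same pre-processed data.
    sorting_ks = ss.run_sorter('kilosort2', recording, output_folder='ks2_out')
    sorting_hs = ss.run_sorter('herdingspikes', recording, output_folder='hs_out')

    # Compare the outputs and keep the units both sorters agree on.
    comparison = sc.compare_multiple_sorters([sorting_ks, sorting_hs])
    return comparison.get_agreement_sorting(minimum_agreement_count=2)
```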
Author response image 1.
Author response image 1.. Comparison of five individual runs of Kilosort2 on the simulated Neuropixels recording.
The top shows the proportions of units from each sorting found in k other sortings. Below these units are split according to false and true positive units after comparison to the ground-truth data. While a sizable fraction of false positive units are unique to each run of the sorter, many are identical in all sortings, indicating that variability in multiple sorter outputs cannot be used to reliably separate false and true positive units.

References

    1. Allen Institute for Brain Science. 2019. Allen Brain Observatory Neuropixels. Allen Brain Map. 766640955.
    2. Angotzi GN, Boi F, Lecomte A, Miele E, Malerba M, Zucca S, Casile A, Berdondini L. SiNAPS: an implantable active pixel sensor CMOS-probe for simultaneous large-scale neural recordings. Biosensors and Bioelectronics. 2019;126:355–364. doi: 10.1016/j.bios.2018.10.032.
    3. Ballini M, Müller J, Livi P, Chen Y, Frey U, Stettler A, Shadmani A, Viswam V, Jones IL, Jäckel D, Radivojevic M, Lewandowska MK, Gong W, Fiscella M, Bakkum DJ, Heer F, Hierlemann A. A 1024-channel CMOS microelectrode array with 26,400 electrodes for recording and stimulation of electrogenic cells in vitro. IEEE Journal of Solid-State Circuits. 2014;49:2705–2719. doi: 10.1109/JSSC.2014.2359219.
    4. Barnett AH, Magland JF, Greengard LF. Validation of neural spike sorting algorithms without ground-truth information. Journal of Neuroscience Methods. 2016;264:65–77. doi: 10.1016/j.jneumeth.2016.02.022.
    5. Berdondini L, van der Wal PD, Guenat O, de Rooij NF, Koudelka-Hep M, Seitz P, Kaufmann R, Metzler P, Blanc N, Rohr S. High-density electrode array for imaging in vitro electrophysiological activity. Biosensors and Bioelectronics. 2005;21:167–174. doi: 10.1016/j.bios.2004.08.011.
