[Preprint]. 2024 May 5:2024.01.30.577845.
doi: 10.1101/2024.01.30.577845.

A deep-learning strategy to identify cell types across species from high-density extracellular recordings

Maxime Beau et al. bioRxiv.

Abstract

High-density probes allow electrophysiological recordings from many neurons simultaneously across entire brain circuits but don't reveal cell type. Here, we develop a strategy to identify cell types from extracellular recordings in awake animals, revealing the computational roles of neurons with distinct functional, molecular, and anatomical properties. We combine optogenetic activation and pharmacology using the cerebellum as a testbed to generate a curated ground-truth library of electrophysiological properties for Purkinje cells, molecular layer interneurons, Golgi cells, and mossy fibers. We train a semi-supervised deep-learning classifier that predicts cell types with greater than 95% accuracy based on waveform, discharge statistics, and layer of the recorded neuron. The classifier's predictions agree with expert classification on recordings using different probes, in different laboratories, from functionally distinct cerebellar regions, and across animal species. Our classifier extends the power of modern dynamical systems analyses by revealing the unique contributions of simultaneously-recorded cell types during behavior.

Keywords: Neuropixels; cell-type identification; cerebellar cortex; cerebellum; circuit mapping; machine learning; variational autoencoder.

Figures

Figure 1: A strategy for cell type identification from extracellular recordings in neural circuits.
The strategy comprises three steps: data acquisition and curation to build a ground-truth cell type library, selection of features from the ground-truth library to train a machine-learning based classifier, and tests of the classifier using additional datasets, including from other species. The first step is to create a ground-truth library of cell types based on optogenetic activation of genetically-defined neurons during electrophysiological recordings in awake mice. Neurons in the ground-truth library must be activated directly, as confirmed by a combination of synaptic blocker pharmacology and electrophysiological criteria, followed by careful data curation. The second step is to identify features in the dataset that can be used to train a semi-supervised deep-learning classifier. The third step is to test the generality of the classifier by asking it to predict cell types in independent datasets of expert-classified recordings from mice and monkeys.
Figure 2: Curation of Neuropixels recordings in the mouse cerebellar cortex.
A. Schematic diagram of the canonical cerebellar circuit. B. Traces on the left show example simple spikes (light blue) and complex spikes (black) in a Purkinje cell. Histogram on the right documents a complex-spike-triggered pause in simple spikes. C. Example recordings from many channels of a Neuropixels probe with magenta, blue, black, and green used to highlight a single unit recorded in the molecular layer, a Purkinje cell’s simple spikes, the same Purkinje cell’s complex spikes, and a unit recorded in the granule cell layer. D. Comparison of example histology labeled with DiI and Hoechst to show the excellent agreement of histological determination of layers and the layers predicted by Phyllum from the electrical recordings. Different colors on the Neuropixels schematic show: magenta, molecular layer; blue, Purkinje cell layer; green, granule cell layer; gray, unknown layer; black, outside cerebellar cortex. E. Autocorrelograms plotting a neuron’s firing rate as a function of time from one of its own trigger spikes for two neurons with very few refractory period violations (RPVs). Note that the spike counts in the autocorrelograms have been divided by the width of the bin so that the y-axis is in spikes/s. F. Analysis of quality of isolation as a function of time during a recording session. From top to bottom graphs show the percentage of refractory period violations, the estimated percentage of missed spikes, and spike amplitude. Horizontal dashed lines show thresholds for acceptance. Gray regions show periods that were rejected from analysis. Blue, green, and red symbols indicate spikes that came from intervals that had too many missed spikes, acceptable isolation, and too many refractory period violations. Marginal histograms on the right show the distribution of spike amplitudes to document clipping at the noise level in the blue histogram that would be cause for rejection of a time interval. G. Example recording traces and spatial footprint of a representative recording with a signal-to-noise ratio (SNR) of 9.33, with the waveforms numbered according to their channel. Asterisk (*) denotes the channel with the largest peak-to-trough amplitude, used to compute the SNR. H. Distribution of percentage of refractory period violations across neurons accepted to the ground-truth library. I. Distribution of estimates of percentage of spikes that were missed across neurons accepted to the ground-truth library. J. Distribution of signal-to-noise ratios on the channel with the largest-amplitude waveform across neurons accepted into the ground-truth library.
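The note in panel E about dividing spike counts by the bin width can be illustrated with a short sketch. This is not the authors' code: the function name, the default bin and window sizes, and the additional division by the number of trigger spikes (needed for the y-axis to read as a rate per trigger spike) are assumptions made for illustration.

```python
import numpy as np

def autocorrelogram_rate(spike_times_s, bin_s=0.001, window_s=0.05):
    """Autocorrelogram expressed in spikes/s (hypothetical helper).

    Counts of other spikes at each lag are divided by the number of
    trigger spikes and by the bin width, so the y-axis is a firing rate.
    """
    t = np.sort(np.asarray(spike_times_s, dtype=float))
    n_bins = int(round(window_s / bin_s))
    edges = np.arange(-n_bins, n_bins + 1) * bin_s
    counts = np.zeros(2 * n_bins, dtype=float)
    for i, ti in enumerate(t):
        # Restrict to spikes that fall inside the lag window around the trigger.
        lo = np.searchsorted(t, ti - window_s, side="left")
        hi = np.searchsorted(t, ti + window_s, side="right")
        # Drop the trigger spike itself, then convert to lags.
        lags = np.delete(t[lo:hi], i - lo) - ti
        counts += np.histogram(lags, bins=edges)[0]
    lag_centers = edges[:-1] + bin_s / 2
    return lag_centers, counts / (len(t) * bin_s)
```

With this normalization, a refractory period appears as bins near zero lag whose rate is close to 0 spikes/s, whatever bin width is chosen.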
Figure 3: Strategy for ground-truth identification of cell type.
A. Schematic showing the sequential phases in an experiment designed to test for optogenetic activation in the presence of synaptic blockers. B. Examples of the results used to verify the region of synaptic blockade. Examples above versus below the horizontal dashed line were taken as evidence for versus against blockade at that site. From left to right, we assayed the effect of blockade on the response to optogenetic stimulation, the negative afterwave of a putative mossy fiber waveform, and the discharge statistics defined by autocorrelograms and the value of CV2. C. Raster and peri-stimulus time histogram for a neuron that lost its response to optogenetic stimulation with synaptic blockade. Trial numbers on the y-axis align with the cartoon showing the periods in the experiment to the right of D. Black versus orange histograms show responses before versus during synaptic blockade. Blue shading indicates the time of photostimulation. D. Same as C except for a neuron that retained its response to optogenetic stimulation during synaptic blockade. E. Example of how we determined whether the recordings in C and D were within the region of synaptic blockade. The cartoon schematizes a Neuropixels probe, the top histograms on the right show sites that were within the region of blockade because they lost their responses to optogenetic stimulation, and the lower waveforms show mossy fibers that were below the region of blockade because they retained their negative afterwaves. F. Spatial footprint of the neuron in D. Black, orange, and blue traces show the similarity of the waveforms recorded during the baseline period, during synaptic blockade without optogenetic stimulation, and during synaptic blockade with optogenetic stimulation. G. Distribution of neural response latencies to optogenetic stimulation of directly-activated neurons in presence of synaptic blockade.
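Panel B refers to the value of CV2 without defining it. A minimal sketch follows, assuming the commonly used definition of CV2 as the mean of 2|ISI(i+1) - ISI(i)| / (ISI(i+1) + ISI(i)) over adjacent interspike intervals; the function name is hypothetical and the caption does not confirm that this exact formula was used.

```python
import numpy as np

def cv2(spike_times_s):
    """Mean CV2 over adjacent interspike intervals (assumed definition)."""
    isi = np.diff(np.sort(np.asarray(spike_times_s, dtype=float)))
    if isi.size < 2:
        return float("nan")  # need at least two intervals to compare
    # Assumes no duplicate spike times, so adjacent ISI sums are nonzero.
    local = 2.0 * np.abs(isi[1:] - isi[:-1]) / (isi[1:] + isi[:-1])
    return float(np.mean(local))
```

A perfectly regular spike train gives a CV2 of 0, and larger values indicate more irregular firing on a spike-by-spike basis.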
Figure 4: Analysis and mitigation of off-target expression in mouse optogenetic lines.
A. Double stained section of cerebellum in the GlyT2-Cre line showing expression in both Golgi cells in the granule cell layer and molecular layer interneurons. Red arrows point to cells that express Td-Tomato. Blue cells express parvalbumin (PV). MoL, molecular layer; PCL, Purkinje cell layer; GCL, granule cell layer. B. Cartoon of cerebellar circuit and histogram showing density of TdT-positive somata in each of the three layers in a GlyT2-Cre mouse: GoC, Golgi cell; GrC, granule cell; PC, Purkinje cell; MLI, molecular layer interneuron; CF, climbing fiber; MF, mossy fiber. C. Representative recordings from a Neuropixels probe using optogenetics to activate neurons that express opsins in the GlyT2 line. Magenta, blue, and green waveforms on the right show the spatial footprint of neurons in the MoL, PCL, and GCL. Histograms below the voltage traces show that both the MoL and GCL layer neurons were activated by optogenetic stimulation at the time indicated by the blue shading. D. Same as A, but for the Math1-Cre line. E. Table outlines how we used layer information to disambiguate cell types despite some off-target expression in certain Cre-lines.
Figure 5: Selection criteria and properties of the ground-truth library of cerebellar cell types.
A. Curation criteria used to decide which neurons to include in the ground truth library, including the numbers that were retained or deleted at each stage of the curation. B. Histogram showing the number of ground-truth units of each cell type normalized for the number of recordings: MLIs, molecular layer interneurons; GoCs, Golgi cells; MFs, mossy fibers; GrCs, granule cells. C. Superimposed waveforms for each cell type in the ground truth library. Abbreviations as in B, plus: PCSS, Purkinje cell simple spikes; PCCS, Purkinje cell complex spikes. The bold trace indicates the neuron that has an example 3D-ACG in Supplementary Figure 5. Waveforms are normalized and flipped to ensure the largest peak is negative (see Methods). D. Same as C but showing autocorrelograms of ground-truth neurons. Note that the spike counts in the autocorrelograms have been divided by the width of the bin so that the y-axis is in spikes/s. E. Failure of traditional measurements of waveform or discharge statistics to differentiate cell types. Each symbol shows Z-scored values of different features from a single neuron; different colors indicate different cell types, per the key in the upper right. Z-scores were computed separately for each feature but across cell types within each feature. Abbreviations as in B.
Figure 6: Performance of a deep-learning classifier on cell type identification for the ground-truth library.
A. Method for normalizing effects of mean firing rate on firing statistics through three-dimensional autocorrelograms (3D-ACGs). Left graphs show the consensus ACG for an example neuron without regard for firing rate on top and 3 ACGs for different mean firing rates on the bottom. The heatmap on the right plots 10 rows that show 2D-ACGs as heatmaps for 10 different deciles of mean firing rate. Arrows indicate the row in the 3D-ACG for each 2D-ACG. B. Schematic of autoencoders used in unsupervised learning to reduce the dimensionality of the waveform and 3D-ACG inputs to the classifier. C. Classifier architecture. Note that we ran the classifier with 10 different initializations for each of the 202 ground-truth units, symbolized by the 202 pages in the classifier. D. Histograms showing the predictions of the classifier on 10 repetitions of training starting with different initial conditions to develop an estimate of confidence from the means of the probabilities assigned to each cell type. E. Percentage of units classified as a function of the ratio we chose as a threshold for confidence in the assignment of cell type. Different colors show data for different ground-truth cell types. F. Confusion matrix showing the agreement between the predictions of the classifier on a single left-out testing unit and the ground-truth cell type of that testing unit. The numbers in each cell indicate the percentage of ground-truth cell types on the y-axis for each prediction of the classifier on the x-axis, where confidence was required to be greater than 2. The rightmost column shows the percentage of ground-truth neurons that received a confidence greater than 2. G. Same as F, but for neurons in the ground-truth library regardless of confidence, i.e. confidence threshold = 0.
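Panels D to F describe a confidence estimate derived from the mean class probabilities across repeated trainings, and a threshold of 2 on a ratio. The sketch below is one way such a rule could look. It assumes the ratio is the highest mean probability divided by the second highest, which the caption does not state explicitly; the function name and argument layout are hypothetical.

```python
import numpy as np

def ensemble_prediction(probs, threshold=2.0):
    """Combine per-model class probabilities for one unit.

    probs: array of shape (n_models, n_classes).
    Returns (predicted class index, confidence ratio, passes threshold).
    The ratio of the top mean probability to the runner-up is an assumption.
    """
    mean_p = np.asarray(probs, dtype=float).mean(axis=0)
    order = np.argsort(mean_p)[::-1]
    best, runner_up = mean_p[order[0]], mean_p[order[1]]
    confidence = best / runner_up if runner_up > 0 else float("inf")
    return int(order[0]), float(confidence), bool(confidence > threshold)
```

Raising the threshold leaves fewer units classified but makes each retained label more reliable, which is the trade-off plotted in panel E.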
Figure 7: Ground-truth classifier performance on expert-classified datasets from mice and monkeys.
A. Schematic of the ground-truth classifier, repeated from Figure 6C, but now making predictions based on non-ground-truth data from mouse or monkey. The n = 2020 instantiations of the classifier arise from training the classifier 10 times with different initial conditions for each of 202 left-out ground-truth units: 10 × 202 = 2020. B. Probability as a function of cell type for expert-classified neurons from mice, divided according to the cell type assigned the highest probability by the classifier. From left to right, the highest-probability cell type was a Purkinje cell simple spike (PCss), Purkinje cell complex spike (PCcs), molecular layer interneuron (MLI), Golgi cell (GoC), and mossy fiber (MF). Colored versus gray traces represent neurons that exceeded versus failed the confidence threshold of 2. Probability was averaged across runs with 2020 different forms of the classifier (see Methods). C. Same as B, but for expert-classified neurons from monkey floccular complex of the cerebellum. D. Correspondence matrix showing the agreement between the predictions of the classifier on the x-axis and the expert-labeled cell type from unclassified recordings in mice. The numbers in each cell indicate the percentage of expert-classified cell types on the y-axis as a function of the predictions of the classifier on the x-axis. The rightmost column shows the percentage of expert-classified neurons that received a confidence greater than 2 from the classifier. E. Same as D, for expert classified neurons from monkey floccular complex. F. Confusion matrices showing good agreement between the output from the classifier and the ground-truth identification in mice and monkeys of Purkinje cell simple spikes and complex spikes from the presence of a complex-spike-triggered pause in simple spike firing.
Figure 8: Multiple forms of evidence for the similarity of waveforms and resting discharge statistics of different cell types across the ground-truth library and the expert-labeled data from mouse and monkey.
A. Comparison of percentage of classified units as a function of confidence threshold for 3 preparations. Faint colored traces show the same curves for the ground-truth library, from Figure 6E. Bold black and gray traces show results for unclassified mouse and monkey data, respectively. B. Congruence of the output from the autoencoders for identically labeled ground-truth versus expert-classified neurons across preparations. Each row corresponds to a single ground-truth identified neuron. Each column corresponds to a single classifier-identified neuron from mouse (left) or monkey (right). Colors at the intersections for each row and column indicate the cosine similarity of the concatenated outputs from the autoencoders for waveform and autocorrelograms, where redder colors indicate greater similarity. C. Waveforms of different cell-types across laboratories and species. In the first row, waveforms are divided according to ground-truth cell type in mice. In the second and third rows, cell types are divided according to classifier predictions of cell type for non-ground-truth neurons recorded in mice and monkeys. D. Same as C, except showing 2D-autocorrelograms. Note that the spike counts in the autocorrelograms have been normalized by the width of the bin so that the y-axis is in spikes/s.
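The similarity matrix in panel B can be sketched as follows. The function and argument names are hypothetical, and the latent vectors are assumed to be plain NumPy arrays with one row per neuron; only the use of cosine similarity on concatenated waveform and autocorrelogram embeddings comes from the caption.

```python
import numpy as np

def concat_embeddings(waveform_latent, acg_latent):
    """Concatenate waveform and autocorrelogram latents for each neuron."""
    return np.concatenate([np.asarray(waveform_latent, dtype=float),
                           np.asarray(acg_latent, dtype=float)], axis=1)

def cosine_similarity_matrix(a, b):
    """Cosine similarity between every row of a and every row of b.

    a: (n_ground_truth, d) and b: (n_other, d); returns (n_ground_truth, n_other).
    Assumes no row is all zeros.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T
```

Rows would correspond to ground-truth neurons and columns to neurons from the other dataset, matching the layout described in the caption.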
Figure 9: Dynamic trajectories of population responses for individual cell types.
A. Temporal evolution of behavioral tasks in the 4 labs. From left to right: licks per second when a reward is cued by a light; eyelid closure when eye blinks have been conditioned by an LED as conditioned stimulus and an air puff as unconditioned stimulus; paw position aligned on the onset of swing phase during locomotion; eye position during pursuit of a step-ramp target motion. B. Average firing rate of different cell types. Different colors indicate normalized responses for the different cell types. Error bands show mean ± sem. C. Heatmaps where each line shows the normalized firing rate of one neuron, divided according to cell type, as a function of time during the four behaviors. We normalized the firing of each neuron so that the standard deviation of rate in the baseline period was 1. D. Two dimensional trajectories of population dynamics for each full population of neurons and for the individual cell types. The trajectories start at the black, filled circle, the gray trajectory shows the full, unlabeled population, and the colored trajectories show results for different cell types. E. Statistical analysis of the differences among the dynamic population trajectories of different cell types. The lefthand histograms show the distance from each cell type’s trajectories to the trajectory derived without cell-type labels. The righthand half-matrices summarize the p-values for comparison of all trajectories with each other. Black squares are not statistically significant. F. Effect of re-labeling different fractions of each cell-type population randomly on the distance between the dynamic population trajectory for each cell type and the trajectory for the unlabeled, full population.
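The normalization described in panel C, scaling each neuron so that the standard deviation of its baseline rate is 1, might be implemented along these lines. Subtracting the baseline mean is an additional assumption not stated in the caption, and the function name and array layout are hypothetical.

```python
import numpy as np

def normalize_to_baseline(rate, baseline_mask):
    """Scale each neuron's rate by its baseline standard deviation.

    rate: (n_neurons, n_timepoints) firing rates.
    baseline_mask: boolean array of length n_timepoints marking the baseline.
    Subtracting the baseline mean is an assumption; the caption only
    specifies that the baseline standard deviation becomes 1.
    """
    rate = np.asarray(rate, dtype=float)
    mask = np.asarray(baseline_mask, dtype=bool)
    baseline = rate[:, mask]
    mean = baseline.mean(axis=1, keepdims=True)
    std = baseline.std(axis=1, keepdims=True)
    return (rate - mean) / std
```

After this step, responses of neurons with very different absolute firing rates can be compared on a common scale in the heatmaps.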
