Cell. 2022 Sep 15;185(19):3568-3587.e27.
doi: 10.1016/j.cell.2022.08.019.

Cell-type-specific population dynamics of diverse reward computations

Emily L Sylwestrak et al. Cell.

Abstract

Computational analysis of cellular activity has developed largely independently of modern transcriptomic cell typology, but integrating these approaches may be essential for full insight into cellular-level mechanisms underlying brain function and dysfunction. Applying this approach to the habenula (a structure with diverse, intermingled molecular, anatomical, and computational features), we identified encoding of reward-predictive cues and reward outcomes in distinct genetically defined neural populations, including TH+ cells and Tac1+ cells. Data from genetically targeted recordings were used to train an optimized nonlinear dynamical systems model and revealed activity dynamics consistent with a line attractor. High-density, cell-type-specific electrophysiological recordings and optogenetic perturbation provided supporting evidence for this model. Reverse-engineering predicted how Tac1+ cells might integrate reward history, which was complemented by in vivo experimentation. This integrated approach describes a process by which data-driven computational models of population activity can generate and frame actionable hypotheses for cell-type-specific investigation in biological systems.

Keywords: attractor dynamics; cell type; dynamical systems; habenula; motivation; reinforcement; reward.

Conflict of interest statement

Declaration of interests: These tools, and all protocols, clones, and sequences, are freely available to nonprofit institutions and investigators. D.S. is employed as a research scientist by Meta (Meta Reality Labs); his work there is unrelated to this study. K.V.S. is a consultant to Neuralink Corp. and CTRL-Labs Inc. in the Reality Labs Division of Meta (formerly Facebook); he is also on the Scientific Advisory Boards of Inscopix Inc., Mind X Inc., and Heal Inc. S.V. is a consultant to Compass Therapeutics. X.W., W.E.A., and K.D. hold IP for the hydrogel-tissue chemistry (STARmap) methods; K.D. is also a member of the Cell advisory board and cofounded and advises Maplight Therapeutics. These entities did not support or influence this work.

Figures

Figure 1. Molecular and anatomical characterization of medial habenula cell types.
(A) Experimental design for 3 rounds of STARmap in situ sequencing of 15 genes in the habenula (Hb). Barcoded probes hybridize to mRNA targets and undergo rolling circle amplification. Sequential hybridization decodes each base on two adjacent rounds (STAR Methods). (B) Deconvolved image from one round of in situ sequencing of Hb tissue. Dashed line indicates MHb boundaries. Box indicates ROI in (C). Scale bar: 100 μm. (C) Top, magnified view of dotted box in (B) across 3 rounds of imaging. Scale bar: 10 μm. Bottom, magnified view indicated by arrow in top panels. Scale bar: 1 μm. (D) Uniform manifold approximation and projection (UMAP) of the expression of 15 genes for 1440 segmented Hb neurons from 2 biological replicates. Grayscale indicates the Z scored expression of Tac1 and Chat. (E) Heatmap of expression levels of each gene (row) for each cell (column). Color bar indicates Z score for each gene across all clusters. (F) UMAP projection of all neurons. Color indicates cluster identity. (G) Clusters identified in (F) are mapped onto the position of each cell in the Hb for two biological replicates. Scale bar: 100 μm. (H) Quadruple in situ hybridization of Tyrosine Hydroxylase (Th), Tachykinin1 (Tac1), Choline Acetyltransferase (Chat), and Calbindin1 (Calb1) mRNA. Scale bar: 100 μm. (I) Quantification of overlap in (H). Grayscale indicates the proportion of cells expressing Gene 1 that also express Gene 2. Fractional overlap listed inside each box. n = 3639 neurons. (J) Left, coronal sections from mouse atlas showing the axonal projections from the medial Hb to the interpeduncular nucleus (IPN) (Konsman, 2001). Top right, neurons expressing AAV1-DIO-EYFP in X-Cre animals in the Hb, with α-GFP immunostaining. Bottom right, 3D rendering of YFP+ IPN axons of X-Cre:DIOYFP animals after tissue clearing (~3-mm-thick sections, pseudocolored for YFP intensity). Scale bars: 100 μm. See also Figure S1.
Figure 2. Habenular cell types show distinct reward-related activity.
(A) Trial structure of 3-Choice Serial Reaction Time Task. After a variable delay, a cue light appears in one of three nose pokes for 1 s. Nose pokes into the lit port result in delivery of sucrose water at the reward port on the opposite wall. Premature and incorrect trials result in a 5 s time out. (B) Training consists of 6 stages of progressively shorter cue durations and the introduction of a variable delay. (C) Behavioral performance across training. Percentage of correct (green), incorrect (red), omitted (black), and premature responses (blue) for all animals (n = 29 mice). (D) Example traces from photometry recording during behavior, Z scored across each session. (E) Example photometry recordings from one behavioral session per genotype. Each row represents a single trial where t = 0 is cue onset. Color: Z scored fluorescence. Data from premature and omitted trials are not displayed. (F–H) Left panels, photometry time series normalized to 405 nm control, Z scored across each session, and aligned to cue onset, nose poke, or reward port entry. Data are separated by behavioral outcome: correct (green), incorrect (red), omitted (black), and premature (blue). Error bars indicate SEM. Right panels, %ΔF/F calculated before and after the cue (F) or nose poke (G). Two-way ANOVA with repeated measures correction. Cue effect: Th+, p < 0.05. Nose poke effect: Tac1+, p < 0.01. Trial outcome versus cue: Tac1+, p < 0.05. Trial outcome versus nose poke: Th+, p < 0.01; Tac1+, p < 0.001; ChAT+, p < 0.05. See Table S1 for multiple comparisons. For reward (H), Th+, p < 0.001; Tac1+, p < 0.01; ChAT+, p < 0.01 by paired t test. (I) Summary of mean Z scores for each genotype in (H), aligned to reward port entry. (J and K) Quantification of the change in GCaMP fluorescence at reward approach or reward consumption with a one-sample t test with FDR correction. For reward approach: Th+, p < 0.05 and Tac1+, p < 0.05. For reward consumption: Th+, p < 0.01; Tac1+, p < 0.05; ChAT+, p < 0.05.
In all panels, black bars indicate the time periods analyzed before (open) and after (filled) behavioral events. Data represents average across mice: Th-Cre, n = 5 mice; Tac1-Cre, n = 7 mice, ChAT-Cre, n = 5 mice, Calb1-Cre, n = 5 animals; LHb, n = 7 animals. Error bars: SEM. See also Figure S2 and Table S1.
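The normalization steps named in this legend (Z scoring across a session, and %ΔF/F compared before and after a behavioral event) can be illustrated with a minimal sketch. The function names, window length, and toy trace are assumptions made for illustration. The 405 nm control correction is omitted because the legend does not specify its form, and this is not the authors' analysis code.

```python
from statistics import mean, pstdev


def zscore(trace):
    """Z score a fluorescence trace across the whole session."""
    mu = mean(trace)
    sd = pstdev(trace)
    if sd == 0:
        raise ValueError("trace has zero variance; Z score is undefined")
    return [(x - mu) / sd for x in trace]


def percent_dff(signal, baseline):
    """Percent change in fluorescence relative to the mean of a baseline period."""
    f0 = mean(baseline)
    return [100.0 * (x - f0) / f0 for x in signal]


def pre_post_change(trace, event_idx, window):
    """Mean of `window` samples after an event minus mean of `window` samples before it."""
    if event_idx - window < 0 or event_idx + window > len(trace):
        raise ValueError("analysis window extends beyond the trace")
    pre = trace[event_idx - window:event_idx]
    post = trace[event_idx:event_idx + window]
    return mean(post) - mean(pre)


# Toy trace (hypothetical): flat baseline, then a step increase at sample 10.
toy_trace = [1.0] * 10 + [1.2] * 10
toy_z = zscore(toy_trace)
toy_dff = percent_dff(toy_trace, baseline=toy_trace[:10])
toy_change = pre_post_change(toy_dff, event_idx=10, window=5)
```

Here `pre_post_change` plays the role of the open (before) and filled (after) analysis bars described in the legend, applied to a single event rather than averaged across trials and mice.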
Figure 3. Cell-type-specific segregation of cue-related and outcome-related reward activity.
(A) A variant of the task where reward-predicting cues and reward probabilities were modified. In 70% of correct trials, rewards were cued and delivered (green). In the remaining correct trials, the reward was not cued (light blue), the reward was not delivered (black, 10%), or the reward was neither cued nor delivered (navy, 10%). (B) Latency to retrieve reward on correct trials, and the duration in the reward port consuming the reward. Two-way ANOVA with repeated measures, corrected for FDR. For reward latency: cue effect, p < 0.01; reward effect, p < 0.01; interaction, p < 0.01. For reward delivery: reward effect, p < 0.0001. n = 25 animals. Error bars: SEM. (C–G) Reward-related activity in Hb cell types to predictive cues and reward delivery. Left panels, mean Z scored photometry data aligned to reward port entry. Black and gray bars indicate pre- (open bar) and post- (closed bar) comparison. Color indicates trial type in (A). Right panels, %ΔF/F at reward approach and consumption (C) or consumption (D–G). Gray lines represent individual animals, bars indicate the mean for each reward contingency: TH-Cre, n = 4 mice; Tac1-Cre, n = 7 mice (MHb-targeted, 82% of neurons in the MHb); ChAT-Cre, n = 5 mice; Calb1-Cre, n = 5 animals; LHb, n = 4 animals. One-way ANOVA with repeated measures and FDR correction, *p < 0.05. (H) Serotype tropism for LHb neurons. The ratio of GCaMP+ LHb neurons to all GCaMP+ neurons (see STAR Methods) was calculated for AAV1 and AAV8 injections. AAV1, n = 10 animals; AAV8, n = 5 animals. For photometry experiments in Figures 2 and 3, animals with confirmed fiber placement and >70% MHb neurons were included in the analysis in order to assess the activity of MHb Tac1 neurons (the majority of Tac1 neurons in the Hb). Animals excluded from analysis are indicated by open circles. (I) Spatial distribution of LHb-targeted Tac1+ neurons. All GCaMP+ neurons in the LHb were counted and registered to a common coordinate system.
Contour lines: the deciles of normalized cell density. (J) Tac1+ neurons in the LHb were targeted for fiber photometry recording using Tac1-Cre mice (88% of neurons in the LHb) and lateral injection of AAV8-DIO-GCaMP6f. Example from one animal in a session where 20% of rewards were withheld, showing all correct trials aligned to the reward port entry. White dots: withheld trials. (K) Mean Z score for LHb-targeted Tac1+ neurons at reward port entry for rewarded (green) and withheld (black) trials. n = 7 animals. See also Figure S3.
Figure 4. Long-timescale activity dynamics and behavioral significance of habenular cell types.
(A) Top, example traces of photometry signal for a Tac1-Cre mouse for the first 5 rewards and last 5 rewards of a behavioral session. Dotted line indicates reward delivery. Bottom, reward responses for correct trials in an example session from one mouse, sorted by nth correct trial. (B) Average reward response for each genotype, Z scored across each session and sorted by nth correct trial. Only rewarded trials are displayed. (C) Quantification of fluorescence changes over a behavioral session 1 s after cue onset and 4 s after head entry into the reward port. To look at changes across the session, the mean Z score of the first 5 correct trials was compared to that of 5 late trials (correct trials #36–40). *p < 0.05 by paired t test. Error bars indicate SEM. All p values are FDR-corrected. Tac1Hb animals include AAV1 injections with fiber placements in the MHb (82% of GCaMP+ neurons in the MHb). Tac1LHb animals include AAV8 injections with fiber placements in the LHb (88% GCaMP+ neurons in the MHb). Gray lines: individual animals. (D) Bias for AAV1-YFP, AAV5-YFP, and AAV8-YFP to infect MHb neurons. Injection locations at MHb/LHb boundary were verified with simultaneous fluorosphere injection. Each dot represents one animal. (E) Intersectional strategy to target MHb Tac1+ neurons for optogenetic silencing. AAV1 CreOnFlpOff eNpHR3.0 injected in the MHb to turn on expression of eNpHR3.0 in all Tac1 Hb neurons. AAV8-Flp preferentially infects LHb neurons to turn off eNpHR3.0 expression laterally. (F) Distribution of virally infected neurons in AAV1 CreONFlpOff alone versus AAV1 CreONFlpOff plus AAV8-Flp. Contour lines represent the deciles of normalized cell density. n = 5 animals/condition. (G) Behavioral paradigm for head-fixed reward-guided decision-making task. Animals are presented with two lick spouts. In a block trial structure, one spout has a high probability of reward (0.9) and the other a low probability of reward (0.1).
After a pre-cue period in which premature licks terminate the trial, a 1 s cue light is illuminated. During this time, licks result in delivery of a water reward according to the reward contingencies defined for that block. Optogenetic inhibition is restricted to rewarded trials on the high probability lick spout. (H) Fraction of left side choices on right-to-left block switches (blue) and left-to-right block switches (green). Error bars indicate SEM. Dotted box: trials quantified in (J). (I) Schematic of stimulation paradigm at block switches and prediction of the behavioral response. On a subset of block switches, rewarded licks on trials 0–15 after the block switch trigger 2 s of 594 nm light to activate eNpHR3.0. (J) Mean fraction of high probability port choices for each animal across trials 10–15 after the block switch for control and light inhibited trials (indicated by dotted box in [H]). n = 5 animals. (K) Time spent in yellow light stimulated side of a two-chamber real time place preference assay. n = 6 animals. See also Figure S4.
Figure 5. Cell-type-specific imaging and electrophysiology reveal distinct long-timescale dynamics at single-neuron and population level.
(A) A simplified reward task for head-fixed calcium imaging and electrophysiology. During a 0.5 s prestimulus period, animals must withhold licking. During a 1 s cue period, licks are rewarded with sucrose water delivery. In 15% of trials, earned rewards are withheld. Trials are separated by a variable ITI. (B) Example 2p image of Tac1MHb neurons expressing H2B-GCaMP6f, imaged through a 600 μm GRIN lens. Right, example traces from Tac1MHb neurons. (C–D) Left, heatmaps showing Z scored, trial-averaged activity from single Tac1MHb (C) or TH+ (D) neurons on rewarded trials. Neurons from 2 mice are included in each panel. Right, heatmaps from one example animal showing population activity (sum of fluorescence across all neurons) for each trial in one session. (E and G) Activity of reward-responsive neurons during the reward period over a behavioral session in Tac1MHb and TH+ neurons, respectively. Data normalized to the first trial shown. Mean (bold line) and individual mice (gray lines). Across the session, Tac1MHb populations showed stereotyped and statistically significant increasing population activity (p = 1.7 × 10^-22) and TH+ populations showed more variable and slightly decreasing population activity (p = 3.6 × 10^-5); p values are for the null hypothesis that the slope of linear regression is zero. (F and H) The proportion of reward-responsive cells over the session in Tac1MHb and TH+ mice, respectively. A responding cell is defined as a cell with a Z score greater than 0.25 for the 1 s reward period. Across the session, Tac1MHb neurons show a statistically significant increase (p = 6.4 × 10^-15) and TH+ neurons show a statistically significant decrease (p = 5.8 × 10^-4) in the fraction of active neurons. (I) Experimental configuration for Neuropixels 2.0 recording. A 4-shank probe was inserted at 10° from the midline. A 637 nm laser was illuminated above the skull. (J) Summary of Neuropixels probe insertions targeting MHb. n = 7 animals, 18 behavioral sessions.
(K) Spatial position of recorded single neurons registered to the Allen Brain Atlas. Red, MHb; green, LHb; blue, others; *, optotagged. See Figure S5C for more complete visualization. n = 6099 Hb neurons, including 29 optotagged Tac1MHb and 25 optotagged Tac1LHb. (L) Spike raster plot for an example optotagged Tac1MHb neuron at 2, 5, and 10 ms pulse widths. (M) Spike raster plot and firing rate across trials, for three example optotagged Tac1MHb neurons. (N) Left, population-averaged baseline firing rate across trials for Tac1MHb neurons (p = 4.6 × 10^-16); right, baseline firing rate across trials for Tac1LHb neurons (p = 0.10). (O) Fraction of neurons in each brain area showing significant ramping up or ramping down across a behavioral session. See STAR Methods for statistical criteria for classifying ramping characteristics. See also Figures S4 and S5.
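The across-session trends in this legend are assessed against the null hypothesis that the slope of a linear regression over trials is zero. A minimal sketch of the underlying quantities is given below, assuming ordinary least squares on trial index. The function name and toy values are hypothetical, and the sketch stops at the t statistic: converting it to a p value requires a Student's t distribution with n − 2 degrees of freedom, which the Python standard library does not provide.

```python
from math import sqrt


def slope_t_statistic(y):
    """Ordinary least squares fit of y against trial index 0..n-1.

    Returns (slope, standard_error, t_statistic) for the null hypothesis
    that the true slope is zero. Requires at least 3 trials.
    """
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 trials to estimate a slope error")
    x = list(range(n))
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual sum of squares around the fitted line.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = sqrt(rss / (n - 2) / sxx)
    if std_err == 0:
        # Perfect fit: the t statistic is unbounded unless the slope is exactly zero.
        t_stat = 0.0 if slope == 0 else float("inf")
    else:
        t_stat = slope / std_err
    return slope, std_err, t_stat


# Toy per-trial population activity (hypothetical values) that ramps upward.
toy_activity = [0.0, 1.1, 1.9, 3.2, 3.9]
toy_slope, toy_se, toy_t = slope_t_statistic(toy_activity)
```

A positive slope with a large t statistic corresponds to ramping up across the session, and a negative slope to ramping down. The criteria actually used to classify ramping neurons are those in the STAR Methods, not this sketch.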
Figure 6. Data-driven modeling of cellular-resolution population activity identifies cell-type-specific attractor dynamics.
(A) LFADS modeling of neural population activity. The dynamics of Tac1MHb or TH+ neurons measured by two-photon microscopy can be described as trajectories in the neural state space (left). These trajectories can be generated by an RNN, which approximates the underlying neural dynamical system, at the single-trial level (right). (B) The trial-averaged neural trajectory of Tac1MHb (left) and TH+ (right) neurons in the raw data and the RNN, demonstrating consistent epoch-dependent dynamics demarcated by the dots. (C) Single-trial trajectories and the underlying attractor manifolds identified by fixed point analysis of the trLFADS generator RNN. In the Tac1MHb neurons (left), the line attractor integrates the external inputs over time, so that the trajectories shift along the line attractor as the session progresses. Note the alignment of the line attractor and the total activity mode for Tac1MHb neurons, implying a progressive increase of total activity. The straight colored line is the top principal component of the identified fixed points. In the TH+ neurons (right), the discrete point attractor confines the trajectories, resulting in no change in total activity over time. The orange dots represent the cue onsets. See also Figure S6.
Figure 7. Transient optogenetic perturbation and reward history modulation experiments support the line attractor dynamics model.
(A) Schematic representation of transient optogenetic perturbation of the line attractor dynamics. (B) Intersectional gene targeting approach. AAV1 CreOnFlpOff ChRmine-p2A-oScarlet is delivered to MHb and LHb neurons of Tac1-Cre mice. AAV8-Flp is delivered to the LHb to turn off ChRmine expression in Tac1LHb neurons. (C) Constructs used for INTRSECT implementation as described in (B). (D) Trial structure for the perturbation experiment. (E) Experimental configuration for transcranial optogenetic stimulation and neural recording using Neuropixels 2.0 probes. (F) Spike raster plots and firing rate changes for example validated optotagged MHb Tac1 neurons (left) or nearby modulated MHb neurons (right), which were all simultaneously recorded. (G) Time to baseline recovery after perturbation for MHb neurons. (H) Average firing rate changes in rewarded (green), unrewarded (black), and perturbation (red) trials. Curves: mean; error bar: SEM from hierarchical bootstrap. n = 1,078 neurons, 4 sessions, 2 mice. (I) Within-trial and across-trial firing rate changes for optogenetic perturbation of MHb Tac1 neurons. Trials were split into early/late halves for concise visualization. (J) Simulation of reward signal accumulation with varying reward probability (1.0, 0.8, and 0.5). Top, example state space trajectories show generated single sessions, initialized at the identical initial state. Bottom, model predictions on changes in total population activity (fiber photometry signal) across rewarded trials. For each case, 1,000 simulated sessions with random initial states were averaged. (K) Fiber photometry recordings in 3CSRTT at 3 different reward probabilities: p_reward = 1, p_reward = 0.8, and p_reward = 0.5. Mean fluorescence (%ΔF/F) during reward as a function of nth correct trial in a session. See also Figure S7.
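The idea of reward accumulation along a line attractor, as opposed to confinement by a point attractor, can be illustrated with a one-dimensional toy model. This is a sketch under stated assumptions and not the trained RNN described in the legends: the update rule, the `gain` and `leak` parameters, the spread of initial states, and the trial count are all hypothetical. With `leak = 0` every state is a fixed point in the absence of input, so rewards are integrated and retained; with `leak > 0` the state decays toward a single point.

```python
import random


def simulate_session(p_reward, n_trials=40, gain=1.0, leak=0.0, x0=0.0, rng=random):
    """Toy 1D state updated once per trial.

    leak = 0.0 gives a perfect integrator (a 1D analogue of a line attractor);
    leak > 0.0 pulls the state back toward zero (a point attractor).
    """
    x = x0
    states = []
    for _ in range(n_trials):
        rewarded = rng.random() < p_reward
        x = (1.0 - leak) * x + (gain if rewarded else 0.0)
        states.append(x)
    return states


def mean_trajectory(p_reward, n_sessions=1000, n_trials=40, leak=0.0, seed=0):
    """Average the toy state over many sessions with random initial states."""
    rng = random.Random(seed)
    totals = [0.0] * n_trials
    for _ in range(n_sessions):
        x0 = rng.gauss(0.0, 0.1)  # hypothetical spread of initial states
        session = simulate_session(p_reward, n_trials, leak=leak, x0=x0, rng=rng)
        for i, state in enumerate(session):
            totals[i] += state
    return [total / n_sessions for total in totals]


# Mean state per trial for the three reward probabilities named in the legend.
trajectories = {p: mean_trajectory(p) for p in (1.0, 0.8, 0.5)}
```

In this toy integrator the mean state at a given trial grows more slowly when rewards are less probable, simply because fewer inputs are accumulated. How the fitted model and the recorded photometry signals depend on reward probability is shown in panels (J) and (K), and the sketch makes no claim about those results.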

References

    1. Ables JL, Gorlich A, Antolin-Fontes B, Wang C, Lipford SM, Riad MH, Ren J, Hu F, Luo M, Kenny PJ, et al. (2017). Retrograde inhibition by a specific subset of interpeduncular α5 nicotinic neurons regulates nicotine preference. Proceedings of the National Academy of Sciences 114, 13012.
    2. Adamantidis AR, Zhang F, Aravanis AM, Deisseroth K, and de Lecea L (2007). Neural substrates of awakening probed with optogenetic control of hypocretin neurons. Nature 450, 420–424. 10.1038/nature06310.
    3. Afshar A, Santhanam G, Yu B, Ryu S, Sahani M, and Shenoy K (2011). Single-trial neural correlates of arm movement preparation. Neuron 71, 555–564. 10.1016/j.neuron.2011.05.047.
    4. Aizawa H, Kobayashi M, Tanaka S, Fukai T, and Okamoto H (2012). Molecular characterization of the subnuclei in rat habenula. J. Comp. Neurol. 520, 4051. 10.1002/cne.23230.
    5. Allen WE, Chen MZ, Pichamoorthy N, Tien RH, Pachitariu M, Luo L, and Deisseroth K (2019). Thirst regulates motivated behavior through modulation of brainwide neural population dynamics. Science 364, 253. 10.1126/science.aav3932.