[Preprint]. 2025 Jan 26:2025.01.26.634924.
doi: 10.1101/2025.01.26.634924.

Striatal cell-type specific stability and reorganization underlying agency and habit

Melissa Malvaez et al. bioRxiv.

Abstract

Adaptive decision making requires agency, knowledge that actions produce particular outcomes. For well-practiced routines, agency is relinquished in favor of habit. Here, we asked how dorsomedial striatum D1+ and D2/A2A+ neurons contribute to agency and habit. We imaged calcium activity of these neurons as mice learned to lever press with agency and formed habits with overtraining. Whereas many D1+ neurons stably encoded actions throughout learning and developed encoding of reward outcomes, A2A+ neurons reorganized their encoding of actions from initial action-outcome learning to habit formation. Chemogenetic manipulations indicated that both D1+ and A2A+ neurons support action-outcome learning, but only D1+ neurons enable the use of such agency for adaptive, goal-directed decision making. These data reveal coordinated dorsomedial striatum D1+ and A2A+ function for the development of agency, cell-type specific stability and reorganization underlying agency and habit, and important insights into the neuronal circuits of how we learn and decide.

Keywords: decision making; devaluation; habit; instrumental conditioning; learning; reward; striatum.

Conflict of interest statement

Competing financial interests: The authors have no biomedical financial interests or potential conflicts of interest to declare.

Figures

Extended Data Figure 1–1: Behavior is goal-directed following limited random-interval instrumental training and habitual following overtraining.
(a) Procedure. Lever presses earned food pellet rewards on a random-interval (RI) 30-s reinforcement schedule. Mice received either limited training (4 RI sessions) or overtraining (8 sessions). Mice were then given lever-pressing probe tests in the Valued state (prefed on the untrained food-pellet type to control for general satiety) and in the Devalued state (prefed on the trained food-pellet type to induce sensory-specific satiety devaluation). Test order was counterbalanced across subjects within each group, with a single intervening retraining session. (b) Training press rate. 1-way ANOVA, Limited training: Training: F2.75, 19.27 = 12.08, P = 0.0001. Overtraining: Training: F2.24, 15.69 = 4.56, P = 0.02. (c) Test press rate. 2-way ANOVA, Value × Training duration: F1, 14 = 14.69, P = 0.002; Value: F1, 14 = 11.69, P = 0.004; Training duration: F1, 14 = 1.31, P = 0.27. (d) Devaluation index [(Devalued presses)/(Valued presses + Devalued presses)]. 2-tailed Mann-Whitney U test, U = 0, P = 0.002. N = 8/group (all male). Data presented as mean ± s.e.m. ***P < 0.001. Mice learn action-outcome relationships during instrumental conditioning on a random-interval schedule of reinforcement, use them for goal-directed decision making after limited training, and form habits with overtraining.
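The devaluation index in panel (d) is a simple ratio, and a small worked example may help in reading it. The sketch below follows the bracketed formula in the legend; the function name and the press counts are hypothetical and are not taken from the data.

```python
def devaluation_index(valued_presses: float, devalued_presses: float) -> float:
    """Devalued presses / (Valued presses + Devalued presses), as in the legend."""
    total = valued_presses + devalued_presses
    if total == 0:
        raise ValueError("index is undefined when no presses were made")
    return devalued_presses / total


# Hypothetical press counts, for illustration only.
goal_directed_example = devaluation_index(valued_presses=30, devalued_presses=10)
habitual_example = devaluation_index(valued_presses=20, devalued_presses=20)
print(goal_directed_example, habitual_example)
```

An index of 0.5 means pressing was equal in the two states (insensitive to devaluation), and values below 0.5 mean fewer presses in the Devalued state.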
Extended Data Figure 1–2: Entries into the food-delivery port during training and test for DMS D1+ imaging experiment.
(a) Training entry rate. 1-way ANOVA, Training: F2.11, 6.31 = 0.75, P = 0.52. (b) Test entry rate. 2-tailed t-test, t3 = 0.94, P = 0.42, 95% CI −1.97 – 1.07. D1-cre: N = 4 (1 male). Data presented as mean ± s.e.m. Males = solid lines, Females = dashed lines.
Extended Data Figure 1–3: Percent of DMS D1+ neurons activated or inhibited around actions and rewards.
(a) Percent of all recorded neurons (Early: N = 870, Middle: N = 999; End N = 994) classified (auROC values >95th percentile of the distribution of shuffled auROCs within 2 s before or after event) as significantly excited around initiating lever presses, terminating lever presses, or reward consumption. Similar proportions of DMS D1+ neurons were activated around each type of event and across training. 2-way ANOVA, Training session: F1.07, 3.21 = 0.24, P = 0.67; Event: F1.34, 4.03 = 2.76, P = 0.17; Training × Event: F1.71, 5.14 = 0.48, P = 0.62. (b) Percent of all recorded neurons classified as significantly inhibited around initiating lever presses, terminating lever presses, or reward consumption. Similar proportions of the DMS D1+ neurons were inhibited around each type of event. With training there was a slight increase in the proportion of neurons inhibited around actions. 2-way ANOVA, Training: F1.75, 5.26 = 5.34, P = 0.06; Event: F1.47, 4.40 = 4.32, P = 0.10; Training × Event: F1.66, 4.98 = 1.56, P = 0.29. D1-cre: N = 4 (1 male). Data presented as mean ± s.e.m. Males = solid lines, Females = dashed lines.
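The classification rule in panel (a), an auROC above the 95th percentile of a shuffled auROC distribution, can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes the auROC compares pre-event with post-event activity samples and that the null distribution comes from shuffling window labels. The function names, the number of shuffles, and the example values are all hypothetical.

```python
import random


def auroc(pre, post):
    """Probability that a post-event sample exceeds a pre-event sample (ties count 0.5)."""
    wins = 0.0
    for b in post:
        for a in pre:
            if b > a:
                wins += 1.0
            elif b == a:
                wins += 0.5
    return wins / (len(pre) * len(post))


def is_excited(pre, post, n_shuffles=1000, percentile=95, seed=0):
    """Return (excited?, observed auROC) using a label-shuffle null (assumed procedure)."""
    rng = random.Random(seed)
    observed = auroc(pre, post)
    pooled = list(pre) + list(post)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign samples to the two windows at random
        null.append(auroc(pooled[:len(pre)], pooled[len(pre):]))
    null.sort()
    threshold = null[min(len(null) - 1, int(len(null) * percentile / 100))]
    return observed > threshold, observed


# Hypothetical activity samples for two neurons.
responsive_pre = [0.0, 0.1, 0.2, 0.1, 0.0, 0.2, 0.1, 0.0]
responsive_post = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 0.8, 1.1]
flat = [0.5] * 8

responsive_flag, responsive_auroc = is_excited(responsive_pre, responsive_post)
flat_flag, flat_auroc = is_excited(flat, flat)
```

An inhibited classification would use the mirror-image test (observed auROC below a low percentile of the shuffled distribution); that threshold is not stated in the legend and is left out here.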
Extended Data Figure 2–1. Representative example of coregistration of DMS D1+ neurons across training.
(a) Distribution of the distance between cell centroids (as a fraction of total cell diameter) of co-registered cell pairs in the 1st (early) and 4th (middle) training sessions. (b) Distribution of centroid distance of co-registered cell pairs in the 1st and 8th (overtrain) training sessions. (c) Distribution of centroid distance of co-registered cell pairs in the 4th and 8th training sessions.
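The coregistration metric plotted here, centroid distance expressed as a fraction of cell diameter, can be sketched as follows. The coordinates, the diameter, and the function name are hypothetical; how the authors measured cell diameter is not specified in the legend.

```python
import math


def normalized_centroid_distance(centroid_a, centroid_b, cell_diameter):
    """Euclidean distance between two (x, y) centroids divided by the cell diameter."""
    if cell_diameter <= 0:
        raise ValueError("cell_diameter must be positive")
    return math.dist(centroid_a, centroid_b) / cell_diameter


# Hypothetical pixel coordinates of the same cell in two sessions.
example_distance = normalized_centroid_distance((0.0, 0.0), (3.0, 4.0), cell_diameter=10.0)
print(example_distance)
```

Values near 0 indicate that the paired footprints sit almost on top of each other across sessions.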
Extended Data Figure 2–2. Behavioral events can be decoded from the activity of the DMS D1+ action-initiation excited neuronal ensembles.
(a-b) Decoding of behavioral events from the activity of DMS D1+ early action-initiation excited neurons. (a) Behavior class (initiating lever press, terminating press, reward collection, non-reinforced food-port check) decoding accuracy compared to shuffled control. Line at 0.25 = chance. 3-way ANOVA, Neuron activity (v. shuffled): F1, 3 = 1118.27, P < 0.001; Training session: F2, 6 = 2.79, P = 0.14; Behavior class: F3, 9 = 16.94, P < 0.001; Neuron activity × Training: F2, 6 = 2.10, P = 0.20; Neuron activity × Behavior Class: F3, 9 = 3.53, P = 0.06; Training × Behavior Class: F6, 18 = 3.29, P = 0.23; Neuron activity × Training × Behavior class: F6, 18 = 0.30, P = 0.93. (b) Lever-press rate decoding accuracy. R = correlation coefficient between actual and decoded press rate. 2-way ANOVA, Neuron activity: F1, 3 = 12.70, P = 0.04; Training: F1.16, 3.49 = 0.08, P = 0.83; Neuron activity × Training: F1.20, 3.60 = 0.09, P = 0.82. (c) Accuracy with which lever-press rate can be decoded from the activity of DMS D1+ early action-initiation inhibited neurons. 2-way ANOVA, Neuron activity: F1, 3 = 53.88, P = 0.005; Training: F1.76, 5.29 = 0.09, P = 0.89; Neuron activity × Training: F1.73, 5.38 = 0.11, P = 0.88. (d-e) Decoding of behavioral events from the activity of DMS D1+ overtrain action-initiation excited neurons. (d) Behavior class decoding accuracy compared to shuffled control. 3-way ANOVA, Neuron activity: F1, 3 = 262.75, P < 0.001; Training session: F2, 6 = 2.78, P = 0.14; Behavior class: F3, 9 = 11.31, P = 0.002; Neuron activity × Training: F2, 6 = 2.16, P = 0.20; Neuron activity × Behavior Class: F3, 9 = 3.86, P = 0.05; Training × Behavior Class: F6, 18 = 3.27, P = 0.025; Neuron activity × Training × Behavior class: F6, 18 = 0.52, P = 0.79. (e) Lever-press rate decoding accuracy. 2-way ANOVA, Neuron activity: F1, 3 = 24.76, P = 0.02; Training: F1.02, 3.05 = 1.25, P = 0.35; Population activity × Training: F1.00, 3.04 = 1.24, P = 0.35. 
(f) Accuracy with which lever-press rate can be decoded from the activity of DMS D1+ overtrain action-initiation inhibited neurons. 2-way ANOVA, Neuron activity: F1, 3 = 84.88, P = 0.003; Training: F1, 3 = 1.25, P = 0.35; Neuron activity × Training: F1, 3 = 1.30, P = 0.34. D1-cre: N = 4 (1 male). Data presented as mean ± s.e.m. Males = solid lines, Females = dashed lines. **P < 0.01. Actions, action rate, checking for and receiving reward, can be decoded from both the population activity of DMS D1+ neurons that are excited by action initiation early in training and those neurons that are excited by action initiation after overtraining. Decoding of action-initiation improves with limited training, but not further with overtraining. Action rate can also be decoded from the DMS D1+ neurons that are inhibited by action initiation.
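Press-rate decoding accuracy in panels (b), (c), (e), and (f) is summarized as R, the correlation coefficient between actual and decoded press rate. The legend does not name the coefficient, so the sketch below assumes a Pearson correlation; the function name and the press-rate values are hypothetical.

```python
import math


def pearson_r(actual, decoded):
    """Pearson correlation between two equal-length sequences (assumed definition of R)."""
    n = len(actual)
    if n != len(decoded) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(actual) / n
    mean_d = sum(decoded) / n
    cov = sum((a - mean_a) * (d - mean_d) for a, d in zip(actual, decoded))
    ss_a = sum((a - mean_a) ** 2 for a in actual)
    ss_d = sum((d - mean_d) ** 2 for d in decoded)
    if ss_a == 0 or ss_d == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_a * ss_d)


# Hypothetical press rates (presses/min) per time window.
actual_rate = [1.0, 2.0, 3.0, 4.0]
perfect_decode = [2.0, 4.0, 6.0, 8.0]    # perfectly tracks the actual rate
inverted_decode = [8.0, 6.0, 4.0, 2.0]   # perfectly anti-correlated

r_perfect = pearson_r(actual_rate, perfect_decode)
r_inverted = pearson_r(actual_rate, inverted_decode)
```

A shuffled control would repeat the same calculation after permuting the decoded values, which should drive R toward 0.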
Extended Data Figure 2–3. Ensembles of DMS D1+ neurons are excited or inhibited by action initiation, action termination, and reward.
(a-c) Percent of all recorded coregistered neurons (Average 137 coregistered neurons/mouse, s.e.m. 37.73) significantly excited by action initiation, action termination, and reward. (a) Approximately 13% of coregistered D1+ neurons were excited by action initiation. 1-way ANOVA, F1.13, 3.63 = 0.14, P = 0.76. (b) Approximately 8 – 10% of coregistered D1+ neurons were excited by action termination. 1-way ANOVA, F1.62, 4.85 = 0.37, P = 0.67. (c) Approximately 12 – 13% of coregistered D1+ neurons were excited by reward. 1-way ANOVA, F1.46, 4.38 = 0.12, P = 0.83. (d-f) Percent of all recorded coregistered neurons significantly inhibited by action initiation, action termination, and reward. (d) Approximately 30 – 33% of coregistered D1+ neurons were inhibited by action initiation. 1-way ANOVA, F1.29, 3.86 = 0.14, P = 0.79. (e) Approximately 21 – 29% of coregistered D1+ neurons were inhibited by action termination. 1-way ANOVA, F1.17, 3.50 = 1.29, P = 0.34. (f) Approximately 25 – 30% of coregistered DMS D1+ neurons were inhibited by reward. 1-way ANOVA, F1.91, 5.76 = 2.52, P = 0.16. D1-cre: N = 4 (1 male). Data presented as mean ± s.e.m. Males = solid lines, Females = dashed lines. In no case did the proportion of excited or inhibited D1+ neurons change with training.
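The per-mouse averages in these legends are reported with a standard error of the mean (s.e.m.). As a reminder of the arithmetic, here is a minimal sketch using the usual definition (sample standard deviation divided by the square root of N). The per-mouse counts are hypothetical and do not reproduce the reported 137 and 37.73.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, s.e.m.) where s.e.m. = sample SD / sqrt(N)."""
    if len(values) < 2:
        raise ValueError("need at least two values for a sample standard deviation")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical coregistered-neuron counts for four mice.
example_mean, example_sem = mean_and_sem([10, 20, 30, 40])
print(example_mean, round(example_sem, 2))
```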
Extended Data Figure 2–4. Quantification of the modulation of DMS D1+ action-initiation excited neurons.
(a) Modulation across training of DMS D1+ early action-initiation excited neurons (N = 77 neurons/4 mice; average 19.25 neurons/mouse, s.e.m. = 5.36). Modulation index averaged across 0.5-s bins around action initiation. 2-way ANCOVA, Training: F1.77, 133.28 = 1.24, P = 0.29; Time bin: F1.79, 133.96 = 13.96, P < 0.001; Training × Time: F3.40, 254.94 = 2.94, P = 0.028. (b) Modulation across training of D1+ overtrain action-initiation excited neurons (N = 76 neurons/4 mice; average 19 neurons/mouse, s.e.m. 5.49). Modulation index around action initiation. 2-way ANCOVA, Training: F1.8, 79.90 = 7.18, P = 0.002; Time bin: F1.64, 65.47 = 1.50, P = 0.23; Training × Time: F4.28, 171.34 = 1.50, P = 0.20. Data presented as mean ± s.e.m.
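The modulation index is averaged across 0.5-s bins around the event before entering the ANCOVA. The sketch below shows one way such binning could be done. The −2 s to +2 s window, the half-open bin edges, the function name, and the sample values are assumptions made for illustration and are not specified in this legend.

```python
def bin_average(times, values, bin_width=0.5, start=-2.0, stop=2.0):
    """Average values in consecutive [start, start + bin_width) bins; empty bins give None."""
    n_bins = round((stop - start) / bin_width)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, v in zip(times, values):
        idx = int((t - start) // bin_width)
        if 0 <= idx < n_bins:  # samples outside the window are ignored
            sums[idx] += v
            counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical sample times (s, relative to action initiation) and index values.
sample_times = [-2.0, -1.75, -1.5, 1.75]
sample_values = [0.25, 0.75, 0.5, 1.0]
binned = bin_average(sample_times, sample_values)
```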
Extended Data Figure 2–5. DMS D1+ neurons encode action initiation with high fidelity.
(a-c) Fidelity with which D1+ early action-initiation excited neurons encode action initiation. (a) Distribution of the percentage of early action-initiation excited neurons as a function of the percentage of action-initiation events to which they respond for each training phase. (b) Cross-session correlation of the response distributions. 2-way ANOVA, Neuron activity distribution (v. shuffled): F1, 3 = 16.06, P = 0.03; Training session: F1, 3 = 0.25, P = 0.65; Distribution × Training: F1, 3 = 0.11, P = 0.76. Early action-initiation D1+ neurons tend to respond on more than half the action-initiation events across training and this is consistent across training. (c) Within-session correlation of the activity around action initiation of each early action-initiation excited neuron. 2-way ANCOVA, Neuron activity: F1, 74 = 27.35, P < 0.001; Training: F2, 148 = 5.84, P = 0.004; Activity × Time: F2, 148 = 6.28, P = 0.002. The activity of early action-initiation excited D1+ neurons around action initiation is correlated above shuffled control within a training session and becomes more correlated with training. (d-e) Fidelity with which D1+ overtrain action-initiation excited neurons encode action initiation. (d) Distribution of the percentage of overtrain action-initiation excited neurons as a function of the percentage of action-initiation events to which they respond for each training phase. (e) Cross-session correlation of the response distributions. 2-way ANOVA, Neuron activity distribution: F1, 3 = 19.36, P = 0.02; Training: F1, 3 = 0.27, P = 0.64; Distribution × Training: F1, 3 = 0.35, P = 0.60. Overtrain action-initiation D1+ neurons tend to respond on more than half the action-initiation events across training and this is consistent across training. (f) Within-session correlation of the activity around action initiation of each overtrain action-initiation excited neuron. 
2-way ANCOVA, Neuron activity: F1, 73 = 19.32, P < 0.001; Training: F2, 146 = 1.87, P = 0.16; Activity × Time: F2, 146 = 1.96, P = 0.15. The activity of overtrain action-initiation excited D1+ neurons around action initiation is correlated above shuffled control within a training session and becomes more correlated with training. D1-cre: N = 4 (1 male). Data presented as mean ± s.e.m. Males = closed circles, Females = open circles.
Extended Data Figure 2–6. An ensemble of DMS D1+ neurons is inhibited during action initiation across learning and as habits form.
(a) Percent of all recorded coregistered DMS D1+ neurons (Average 137 coregistered neurons/mouse, s.e.m. 37.73) significantly inhibited by action initiation. Approximately 30 – 32% of DMS D1+ neurons were inhibited around action initiation and this did not change with training. 1-way ANOVA, F1.29, 3.86 = 0.14, P = 0.79. (b) Percent of D1+ early action-initiation inhibited neurons that continued to be significantly inhibited by action initiation on the 4th and 8th training sessions. Approximately 52 – 55% of the early action-initiation inhibited ensemble continued to be inhibited around action initiation during the middle and overtraining phases of training. The proportion preserved did not change with training. 2-tailed t-test, t3 = 0.56, P = 0.62, 95% CI −19.03 – 13.37. (c) Percent of D1+ overtrain action-initiation inhibited neurons that were also significantly inhibited by action initiation on 1st and 4th training sessions. Approximately 51 – 59% of the overtraining action-initiation inhibited ensemble was also inhibited around action initiation during the preceding early and middle training phases. The proportion preserved did not change with training. 2-tailed t-test, t3 = 2.02, P = 0.14, 95% CI −4.52 – 20.15. (d-g) Activity and modulation across training of DMS D1+ early action-initiation inhibited neurons. Heat map of minimum to maximum deconvolved activity (sorted by total activity) (d), Z-scored activity (e), and area under the receiver operating characteristic curve (auROC) modulation index (f) of these cells around action initiation across training. (g) auROC modulation index averaged across 0.5-s bins around action initiation. 2-way ANCOVA, Training: F1.65, 246.27 = 1.46, P = 0.24; Time bin: F2.16, 321.48 = 2.80, P = 0.06; Training × Time: F4.70, 699.66 = 0.27, P = 0.92. Modulation of this early action-initiation inhibited ensemble did not significantly change with training. 
(h-k) Activity and modulation across training of DMS D1+ overtrain action-initiation inhibited neurons. Heat map (h), Z-scored activity (i), and auROC modulation index (j) of these cells around action initiation across training. (k) Modulation index around action initiation. 2-way ANCOVA, Training: F1.82, 347.95 = 7.25, P = 0.001; Time bin: F1.91, 365.08 = 2.37, P = 0.10; Training × Time: F4.35, 831.34 = 0.67, P = 0.62. This overtrain action-initiation inhibited ensemble became more inhibited by action initiation as training progressed. D1-cre: N = 4 (1 male). Data presented as mean ± s.e.m. Males = solid lines, Females = dashed lines.
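Panels (b) and (c) report the percent of an ensemble defined in one session that is classified the same way in another session. With coregistered cell identities, this reduces to a set overlap. The sketch below is an illustration under that assumption; the cell IDs and function name are hypothetical.

```python
def percent_preserved(reference_ids, other_ids):
    """Percent of the reference ensemble that is also present in the other session's ensemble."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference ensemble is empty")
    return 100.0 * len(reference & set(other_ids)) / len(reference)


# Hypothetical coregistered cell IDs classified as inhibited in two sessions.
early_inhibited = [1, 2, 3, 4]
middle_inhibited = [2, 4, 9]
preserved = percent_preserved(early_inhibited, middle_inhibited)
print(preserved)
```

Note that the measure is asymmetric: swapping which session defines the reference ensemble changes the denominator, which is why panels (b) and (c) can report different percentages for the same pair of sessions.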
Extended Data Figure 2–7. Ensembles of DMS D1+ neurons are modulated by action termination and reward during learning and as habits form.
(a) Percent of all recorded coregistered DMS D1+ neurons (Average 137 coregistered neurons/mouse, s.e.m. 37.73) significantly modulated (excited or inhibited, see Extended Data Figure 2–3) by action termination. Approximately 30 – 39% of D1+ neurons were modulated around action termination and this did not change with training. 1-way ANOVA, F1.53, 4.58 = 1.10, P = 0.38. (b) Percent of DMS D1+ early action-termination excited neurons (N = 53 neurons/4 mice; average 13.25 neurons/mouse, s.e.m. 5.73) that were then also significantly excited by action termination on the 4th (middle) and 8th (overtrain) training sessions. Approximately 30 – 37% of the early action-termination ensemble continued to be excited by action termination across training. The proportion preserved did not change with training. 2-tailed t-test, t3 = 0.85, P =0.46, 95% CI −20.34 – 35.07. (c) Percent of DMS D1+ overtrain action-termination excited neurons (N = 58 neurons/4 mice; average 14.5 neurons/mouse, s.e.m. 3.59) that were also significantly excited by action termination on the 1st (early) and middle training sessions. Approximately 59 – 76% of the overtraining action-termination ensemble was also excited by action termination during the preceding training phases. The proportion preserved did not significantly change with training. 2-tailed t-test, t3 = 2.52, P =0.09, 95% CI −4.48 to 38.31. (d-g) Activity and modulation across training of DMS D1+ early action-termination excited neurons. Heat map of minimum to maximum deconvolved activity (sorted by total activity) (d), Z-scored activity (e), and area under the receiver operating characteristic curve (auROC) modulation index (f) of these neurons around action termination across training. (g) auROC modulation index averaged across 0.5-s bins around action termination. 2-way ANCOVA, Training: F1.92, 97.81 = 1.08, P = 0.34; Time bin: F1.65, 84.19 = 0.48, P = 0.59; Training × Time: F4.20, 214.10 = 1.33, P = 0.26. 
Modulation of this early action-termination ensemble did not significantly change with training. (h-k) Activity and modulation across training of DMS D1+ overtrain action-termination excited neurons. Heat map (h), Z-scored activity (i), and auROC modulation index (j) of these cells around action termination across training. (k) Modulation index around action termination. 2-way ANOVA, Training: F1.77, 99.05 = 1.14, P = 0.32; Time bin: F1.78, 99.88 = 0.03, P = 0.96; Training × Time: F4.20, 235.08 = 0.71, P = 0.60. Modulation of the overtrain action-termination ensemble did not significantly change as training progressed. (l) Percent of coregistered D1+ neurons significantly modulated (excited or inhibited) by earned reward. Approximately 39 – 42% of D1+ neurons were modulated by earned reward and this did not change with training. Friedman test, χ2(2) = 1.20, P = 0.58. (m) Percent of D1+ early reward excited neurons (N = 73 neurons/4 mice; average 18.25 neurons/mouse, s.e.m. 7.36) that continued to be significantly excited by reward during the middle and overtraining phases. Approximately 26 – 34% of the early reward ensemble continued to be excited by reward across training. The proportion preserved did not change with training. 2-tailed t-test, t3 = 0.70, P = 0.53, 95% CI −40.49 – 25.89. (n) Percent of DMS D1+ overtrain reward excited neurons (N = 70 neurons/4 mice; average 17.5 neurons/mouse, s.e.m. 6.34) that were significantly excited by reward on the early and middle training sessions. Approximately 24 – 37% of the overtraining reward ensemble was also excited by reward during the preceding training phases. The proportion preserved did not change with training. 2-tailed t-test, t3 = 1.31, P = 0.28, 95% CI −18.55 – 44.33. (o-r) Activity and modulation across training of DMS D1+ early reward excited neurons. Heat map (o), Z-scored activity (p), and auROC modulation index (q) of these cells around reward across training. (r) Modulation index around reward. 
2-way ANCOVA, Training: F1.74, 123.26 = 0.82, P = 0.42; Time bin: F2.45, 175.73 = 6.07, P = 0.01; Training × Time: F4.82, 342.52 = 1.59, P = 0.16. This early reward ensemble was more excited after earned reward than before, suggesting modulation by reward experience. This did not significantly change as training progressed. (s-v) Activity and modulation across training of D1+ overtrain reward excited neurons. Heat map (s), Z-scored activity (t), and auROC modulation index (u) of these cells around reward across training. (v) Modulation index around reward. 2-way ANCOVA, Training: F1.77, 120.87 = 2.26, P = 0.12; Time bin: F2.36, 160.42 = 1.30, P = 0.28; Training × Time: F5.37, 365.26 = 0.28, P = 0.93. Modulation of the overtrain reward ensemble did not significantly change with training. D1-cre: N = 4 (1 male). Data presented as mean ± s.e.m. Males = solid lines, Females = dashed lines.
Extended Data Figure 2–8. A subensemble of stable DMS D1+ action-initiation excited neurons develops encoding of action termination and earned reward with training.
We identified an ensemble (N = 31 neurons/4 D1-cre mice, 1 male; average 7.75 neurons/mouse, s.e.m. = 2.56) of D1+ neurons that stably encoded action-initiation across all phases of training. (a) Modulation, averaged in 0.5-s bins around action initiation of these neurons. 2-way ANCOVA, Training: F1.70, 49.21 = 0.04, P = 0.94; Time bin: F1.73, 50.27 = 5.02, P = 0.01; Training × Time: F2.28, 123.97 = 2.78, P = 0.03. Stable action-initiation excited D1+ neurons are more activated prior to action initiation than after and this modulation improves with training. (b-c) Heat map of minimum to maximum deconvolved activity (sorted by total activity) (b) and Z-scored (c) activity around action termination across training of the D1+ stable action-initiation excited neurons. (d) Modulation, averaged in 0.5-s bins around action termination of these neurons. 2-way ANCOVA, Training: F1.87, 54.10 = 1.01, P = 0.37; Time bin: F2.00, 57.95 = 1.48, P = 0.23; Training × Time: F3.62, 104.97 = 1.98, P = 0.11. (e-f) Heat map (e) and Z-scored (f) activity around reward collection across training of D1+ stable action-initiation excited neurons. (g) Modulation, averaged in 0.5-s bins around earned reward of these neurons. 2-way ANCOVA, Training: F1.99, 57.57 = 2.61, P = 0.08; Time bin: F1.84, 53.36 = 0.80, P = 0.45; Training × Time: F3.87, 112.10 = 1.74, P = 0.15. Data presented as mean ± s.e.m. Males = solid lines, Females = dashed lines.
Extended Data Figure 3–1: Validation of chemogenetic approach to inhibit or activate DMS D1+ or A2A+ neurons.
Fluorescence-guided, whole-cell patch clamp recordings in current-clamp mode were used to validate the efficacy of chemogenetic manipulation of DMS D1+ and A2A+ neurons. (a) Representative immunofluorescent image of a biocytin-filled, mCherry-positive cell. (b) Representative recordings of action potentials in single cells before (Baseline) and after CNO (10 μM for hM3Dq and 100 μM for hM4Di) bath application. Current injection intensity applied to induce action potential firing was 200 (Control, D1+ hM3Dq, A2A+ hM3Dq) or 250 pA (all others), with the same intensity used for baseline and CNO recordings. Cell membrane potentials were held at −80 mV using negative current to ensure consistent baseline conditions across recordings. (c) Percent change in action potentials under CNO, relative to pre-CNO baseline. 1-way ANOVA, F4, 58 = 13.29, P < 0.0001. Control, N = 19 cells, 4 D1-cre mice (1 male); D1-cre, hM4Di: N = 8 cells, 4 mice (3 males), hM3Dq: N = 10 cells, 5 mice (3 males); A2A-cre, hM4Di: N = 8 cells, 4 mice (3 males), hM3Dq: N = 10 cells, 5 mice (3 males). Data presented as mean ± s.e.m. Males = closed circles, Females = open circles. Scale bar = 10 μm. *P < 0.05, **P < 0.01, ***P < 0.001, uncorrected. We were able to effectively chemogenetically inhibit and activate action potentials in both D1+ and A2A+ DMS neurons.
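Panel (c) expresses the CNO effect as a percent change in action potentials relative to the pre-CNO baseline. A minimal sketch of that calculation follows, assuming the conventional definition of percent change. The spike counts and function name are hypothetical.

```python
def percent_change(baseline_count: float, cno_count: float) -> float:
    """Percent change from baseline: 100 * (CNO - baseline) / baseline."""
    if baseline_count == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return 100.0 * (cno_count - baseline_count) / baseline_count


# Hypothetical action-potential counts during the current step.
inhibition_example = percent_change(baseline_count=10, cno_count=4)   # e.g. an hM4Di-like decrease
activation_example = percent_change(baseline_count=8, cno_count=12)   # e.g. an hM3Dq-like increase
print(inhibition_example, activation_example)
```

Negative values indicate fewer action potentials under CNO than at baseline, and positive values indicate more.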
Extended Data Figure 3–2. Food-port entries during training and test with DMS D1+ neuron chemogenetic manipulation.
(a-c) Chemogenetic inactivation of DMS D1+ neurons during learning. WT: N = 7 (2 males); D1-cre: N = 9 (5 males). (a) Training entry rate. 2-way ANOVA, Training: F2.35, 32.91 = 7.40, P = 0.001; Genotype: F1, 14 = 1.30, P = 0.27; Training × Genotype: F4, 56 = 0.46, P = 0.77. (b) Test entry rate. 2-way ANOVA, Value × Genotype: F1, 14 = 0.003, P = 0.96; Value: F1, 14 = 0.05, P = 0.83; Genotype: F1, 14 = 0.03, P = 0.86. (c) Training average presses/earned reward outcome. 2-way ANOVA, Training: F2.05, 28.68 = 32.52, P < 0.0001; Genotype: F1, 14 = 0.57, P = 0.46; Training × Genotype: F4, 56 = 0.77, P = 0.55. (d-f) Chemogenetic activation of DMS D1+ neurons during learning. WT: N = 6 (4 males); D1-cre: N = 6 (3 males). (d) Training entry rate. 2-way ANOVA, Training: F2.38, 23.79 = 12.75, P < 0.0001; Genotype: F1, 10 = 0.21, P = 0.74; Training × Genotype: F8, 80 = 0.74, P = 0.65. (e) Test entry rate. 2-way ANOVA, Value × Genotype: F1, 10 < 0.0001, P > 0.99; Value: F1, 10 = 0.89, P = 0.37; Genotype: F1, 10 = 0.51, P = 0.49. (f) Training average presses/earned reward outcome. 2-way ANOVA, Training: F4.03, 60.47 = 9.42, P < 0.0001; Genotype: F1, 15 = 0.0001, P = 0.99; Training × Genotype: F8, 120 = 0.60, P = 0.78. (g-i) Chemogenetic inactivation of DMS D1+ neurons at test of behavioral control strategy after learning. WT: N = 12 (7 males); D1-cre: N = 12 (5 males). (g) Training entry rate. 2-way ANOVA, Training: F3.01, 54.19 = 13.85, P < 0.0001; Genotype: F1, 18 < 0.0001, P > 0.99; Training × Genotype: F4, 72 = 0.34, P = 0.85. (h) Test entry rate. 2-way ANOVA, Value × Genotype: F1, 18 = 0.001, P = 0.97; Value: F1, 18 = 11.24, P = 0.004; Genotype: F1, 18 = 3.02, P = 0.10. (i) Training average presses/earned reward outcome. 2-way ANOVA, Training: F1.86, 44.59 = 96.52, P < 0.0001; Genotype: F2, 24 = 0.13, P = 0.88; Training × Genotype: F8, 96 = 0.13, P > 0.99. Data presented as mean ± s.e.m. Males = solid lines, Females = dashed lines. *P < 0.05. 
Neither chemogenetic inhibition nor activation of DMS D1+ neurons affected checks of the food-delivery port or altered the press-reward action-outcome relationship.
Extended Data Figure 4–1: Entries into the food-delivery port during training and test for DMS A2A+ imaging experiment: habitual subjects.
(a) Training entry rate. 1-way ANOVA, Training: F2.69, 8.06 = 1.93, P = 0.20. (b) Test entry rate. 2-tailed t-test, t3 = 1.87, P = 0.16, 95% CI −5.00 – 1.30. Data presented as mean ± s.e.m. Males = solid lines, Females = dashed lines.
Figure 4–2: DMS A2A+ neurons are modulated by actions and rewards during instrumental learning and overtraining in subjects that did not form habits.
(a) Representative image of cre-dependent jGCaMP7s expression in DMS A2A+ neurons. (b) Map of DMS GRIN lens placements for all subjects. (c) Representative dF/F and deconvolved calcium signal. (d) Procedure. RI, random-interval reinforcement schedule. (e) Training press rate. 2-way ANOVA, Training: F1.49, 4.48 = 5.58, P = 0.06. (f) Test press rate. 2-tailed t-test, t3 = 2.30, P = 0.11, 95% CI −31.60 – 5.10. (g) Devaluation index [(Devalued condition presses)/(Valued condition presses + Devalued presses)]. One-tailed Bayes factor, BF10 = 5.40. (h-j) Activity of DMS A2A+ neurons on the 1st (Early) session of RI training. (h) Percent of all recorded neurons (N = 693) significantly modulated by lever presses, food-delivery port checks, and reward. (i) Heat map of minimum to maximum deconvolved activity (sorted by total activity) of each DMS A2A+ neuron significantly modulated around lever-press action initiation (right), action termination (middle), or reward consumption (left). Above red line = excited, below = inhibited. (j) Z-scored activity of each population of modulated neurons. (k-m) Activity of DMS A2A+ neurons on the 4th (Middle) training session. (k) Percent of all recorded neurons (N = 779) significantly modulated by lever presses, food-delivery port checks, and reward. (l) Heat map of minimum to maximum deconvolved activity of each DMS A2A+ neuron significantly modulated around lever-press action initiation (right), action termination (middle), or reward (left). (m) Z-scored activity of each population of modulated neurons. (n-p) Activity of DMS A2A+ neurons on the 8th (End, Overtrain) training session. (n) Percent of all recorded neurons (N = 739) significantly modulated by lever presses, food-delivery port checks, and reward. (o) Heat map of minimum to maximum deconvolved activity of each DMS A2A+ neuron significantly modulated around lever-press action initiation (right), action termination (middle), or reward (left). 
(p) Z-scored activity of each population of modulated neurons. Data presented as mean ± s.e.m. A2A-cre: N = 4 (3 male). Males = closed circles/solid lines, Females = open circles/dashed lines. A2A+ neurons in subjects that did not form habits with overtraining are activated prior to action initiation and termination and a subset are also activated by earned reward experience.
Extended Data Figure 4–3: Percent of DMS A2A+ neurons activated or inhibited around actions and rewards.
(a) Percent of all recorded neurons (Early: N = 671, Middle: N = 697; End N = 658) classified (auROC values >95th percentile of the distribution of shuffled auROCs within 2 s before or after event) as significantly excited around initiating lever presses, terminating lever presses, or reward consumption. Action initiation excited more DMS A2A+ neurons than termination or reward. 2-way ANOVA, Event: F1.12, 3.36 = 23.83, P = 0.01; Training session: F1.32, 3.97 = 0.05, P = 0.54; Training × Event: F1.86, 5.59 = 0.29, P = 0.77. (b) Percent of all recorded neurons classified as significantly inhibited around initiating lever presses, terminating lever presses, or reward consumption. Similar proportions of the DMS A2A+ neurons were inhibited around each event type. 2-way ANOVA, Event: F1.90, 5.69 = 2.03, P = 0.23; Training session: F1.12, 3.36 = 0.05, P = 0.86; Training × Event: F1.45, 4.33 = 0.90, P = 0.44. Data presented as mean ± s.e.m. Males = solid lines, Females = dashed lines.
Extended Data Figure 5–1. Representative example of coregistration of DMS A2A+ neurons across training.
(a) Distribution of the distance between cell centroids (as a fraction of total cell diameter) of co-registered cell pairs in the 1st (early) and 4th (middle) training sessions. (b) Distribution of centroid distance of co-registered cell pairs in the 1st and 8th (overtrain) training sessions. (c) Distribution of centroid distance of co-registered cell pairs in the 4th and 8th training sessions.
Extended Data Figure 5–2. Instrumental behavior can be decoded from the activity of DMS A2A+ action-initiation neuronal ensemble.
(a-b) Decoding of behavioral events from the activity of DMS A2A+ early action-initiation excited neurons. (a) Behavior class (initiating lever press, terminating press, reward collection, non-reinforced food-port check) decoding accuracy compared to shuffled control. Line at 0.25 = chance. 3-way ANOVA, Neuron activity (v. shuffled): F1, 3 = 169.03, P < 0.001; Training session: F2, 6 = 5.55, P = 0.04; Behavior class: F3, 9 = 50.84, P < 0.001; Neuron activity × Training: F2, 6 = 5.27, P = 0.05; Neuron activity × Behavior Class: F3, 9 = 4.63, P = 0.03; Training × Behavior Class: F6, 18 = 1.79, P = 0.16; Neuron activity × Training × Behavior class: F6, 18 = 0.34, P = 0.91. (b) Lever-press rate decoding accuracy. R = correlation coefficient between actual and decoded press rate. 2-way ANOVA, Neuron activity: F1, 3 = 3.76, P = 0.14; Training: F1.33, 3.99 = 4.49, P = 0.19; Neuron activity × Training: F1.32, 3.97 = 2.52, P = 0.819. (c) Accuracy with which lever-press rate can be can be decoded from the activity of DMS A2A+ early action-initiation inhibited neurons. 2-way ANOVA, Neuron activity: F1, 3 = 7.44, P = 0.07; Training: F1.40, 4.21 = 0.17, P = 0.77; Neuron activity × Training: F1.41, 4.22 = 0.16, P = 0.79. (d-f) Decoding of behavioral events from the activity of DMS A2A+ overtrain action-initiation excited neurons. (d) Behavior class decoding accuracy compared to shuffled control. 3-way ANOVA, Neuron activity: F1, 3 = 1296.08, P < 0.001; Training session: F2, 6 = 2.75, P = 0.14; Behavior class: F3, 9 = 32.57, P < 0.001; Neuron activity × Training: F2, 6 = 1.99, P = 0.22; Neuron activity × Behavior Class: F3, 9 = 11.86, P = 0.002; Training × Behavior Class: F6, 18 = 2.36, P = 0.07; Neuron activity × Training × Behavior class: F6, 18 = 2.91, P = 0.04. (e) Lever-press rate decoding accuracy. 2-way ANOVA, Neuron activity: F1, 3 = 1.92, P = 0.26; Training: F1.66, 4.96 = 0.57, P = 0.57; Neuron activity × Training: F1.63, 4.88 = 0.54, P = 0.58. 
(f) Accuracy with which lever-press rate can be decoded from the activity of DMS A2A+ overtrain action-initiation inhibited neurons. 2-way ANOVA, Neuron activity: F1, 3 = 30.03, P = 0.01; Training: F1.52, 4.57 = 0.45, P = 0.61; Neuron activity × Training: F1.51, 4.54 = 0.48, P = 0.60. Data presented as mean ± s.e.m. Males = solid lines, Females = dashed lines. **P < 0.01, ***P < 0.001. Actions, as well as checking for and receiving reward, can be decoded from the population activity of both the DMS A2A+ neurons that are excited by action initiation early in training and those that are excited by action initiation after overtraining. Population decoding of action initiation improves with some training and then decreases with overtraining. Action rate can only be significantly decoded from overtrain action-initiation inhibited DMS A2A+ neurons, suggesting these neurons contribute to the decoding accuracy of the whole population of DMS A2A+ neurons.
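The comparison between decoding accuracy and a shuffled control, with chance at 0.25 for four behavior classes, can be illustrated with a toy sketch. This is not the authors' decoder: the nearest-centroid classifier, the synthetic "activity" vectors, the class names, and the number of shuffles are all assumptions chosen only to show why label shuffling gives a chance-level reference.

```python
import random

# The four behavior classes named in the legend (identifiers here are hypothetical).
CLASSES = ["press_initiation", "press_termination",
           "reward_collection", "unrewarded_port_check"]
CHANCE = 1.0 / len(CLASSES)  # balanced four-way classification: 0.25


def make_toy_trials(n_per_class, rng):
    """Synthetic population activity: one made-up unit per class is more active on that class."""
    trials = []
    for k, label in enumerate(CLASSES):
        for _ in range(n_per_class):
            x = [rng.gauss(5.0 if i == k else 0.0, 0.5) for i in range(len(CLASSES))]
            trials.append((x, label))
    return trials


def fit_centroids(train):
    """Mean activity vector per class label."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def accuracy(centroids, test):
    """Fraction of test trials assigned to the nearest class centroid correctly."""
    def predict(x):
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))
    return sum(predict(x) == y for x, y in test) / len(test)


def shuffled_accuracy(train, test, rng, n_shuffles=50):
    """Control: permute training labels, refit, and average accuracy over shuffles."""
    labels = [y for _, y in train]
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(labels)
        shuffled = [(x, y) for (x, _), y in zip(train, labels)]
        scores.append(accuracy(fit_centroids(shuffled), test))
    return sum(scores) / len(scores)


rng = random.Random(0)
train, test = make_toy_trials(40, rng), make_toy_trials(20, rng)
real = accuracy(fit_centroids(train), test)
control = shuffled_accuracy(train, test, rng)
print(f"chance={CHANCE:.2f} real={real:.2f} shuffled={control:.2f}")
```

Shuffling breaks the link between activity and behavior while preserving class counts, so the shuffled accuracy hovers near chance and any excess in the real data reflects information carried by the activity.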
Extended Data Figure 5–3. Ensembles of DMS A2A+ neurons are excited and inhibited by action initiation, action termination, and reward.
(a-c) Percent of all recorded coregistered neurons (Average 73.25 coregistered neurons/mouse, s.e.m. 17.09) significantly excited by action initiation, action termination, or reward. (a) Approximately 7 – 12% of coregistered DMS A2A+ neurons were excited by action initiation. 1-way ANOVA, F1.16, 3.48 = 1.79, P = 0.27. (b) Approximately 4 – 10% of coregistered DMS A2A+ neurons were excited by action termination. 1-way ANOVA, F1.48, 4.43 = 1.12, P = 0.38. (c) Only 3 – 5% of coregistered DMS A2A+ neurons were excited by reward. 1-way ANOVA, F1.89, 5.68 = 0.75, P = 0.51. (d-f) Percent of all recorded coregistered neurons significantly inhibited by action initiation, action termination, or reward. (d) Approximately 18–24% of coregistered DMS A2A+ neurons were inhibited by action initiation. 1-way ANOVA, F1.05, 3.16 = 1.14, P = 0.37. (e) Approximately 19 – 21% of coregistered DMS A2A+ neurons were inhibited by action termination. 1-way ANOVA, F1.02, 3.05 = 0.20, P = 0.69. (f) Approximately 19 – 27% of coregistered DMS A2A+ neurons were inhibited by reward. 1-way ANOVA, F1.76, 5.28 = 3.52, P = 0.11. Data presented as mean ± s.e.m. Males = solid lines, Females = dashed lines. In no case did the proportion of excited or inhibited A2A+ neurons change with training.
Extended Data Figure 5–4. Quantification of the modulation of DMS A2A+ action-initiation excited neurons.
(a) Modulation across training of DMS A2A+ early action-initiation excited neurons (N = 42 neurons/4 mice; average 16.8 neurons/mouse, s.e.m. = 5.61). Modulation index averaged across 0.5-s bins around action initiation. 2-way ANCOVA, Training: F1.74, 69.82 = 4.10, P = 0.03; Time bin: F1.70, 68.14 = 0.18, P = 0.80; Training × Time: F4.70, 187.84 = 0.90, P = 0.68. (b) Modulation across training of A2A+ overtrain action-initiation excited neurons (N = 25 neurons/4 mice; average 6.25 neurons/mouse, s.e.m. 2.69). Modulation index around action initiation. 2-way ANCOVA, Training: F1.75, 40.22 = 1.15, P = 0.32; Time bin: F1.74, 39.90 = 0.10, P = 0.88; Training × Time: F4.23, 97.45 = 0.64, P = 0.70. Data presented as mean ± s.e.m.
Extended Data Figure 5–5. Encoding of other task variables by A2A+ neurons that drop out of the early action-initiation ensemble or get incorporated into the overtrain action-initiation excited ensemble.
(a-b) Percentage of the neurons that drop out of the early action-initiation excited ensemble that are excited (a; 2-way ANOVA, Event × Training: F1.79, 5.37 = 7.05, P = 0.03; Event: F1.21, 3.63 = 30.74, P = 0.006; Training: F1.13, 3.40 = 14.46, P = 0.02) or inhibited (b; 2-way ANOVA, Event: F1.98, 5.94 = 1.64, P = 0.27; Training: F1.50, 4.50 = 2.77, P = 0.17; Event × Training: F1.46, 4.37 = 1.25, P = 0.35) by other events (terminating lever press, reward, food-delivery port check). By definition, these neurons stop being activated by action initiation with training. They do not begin to be activated by other task events with training. Instead, fewer of these neurons are activated by other task events and some of them become inhibited by task events. (c-d) Percentage of the neurons that are incorporated into the overtrain action-initiation excited ensemble that are excited (c; 2-way ANOVA, Training: F1.1, 3.48 = 11.06, P = 0.03; Event: F2.24, 6.71 = 3.67, P = 0.08; Event × Training: F1.54, 4.62 = 2.41, P = 0.19) or inhibited (d; 2-way ANOVA, Event: F1.16, 3.48 = 1.95, P = 0.25; Training: F1.09, 3.28 = 1.24, P = 0.35; Event × Training: F1.89, 5.68 = 0.83, P = 0.48) by other events (terminating lever press, reward, food-delivery port check). By definition, these neurons became activated by action initiation at overtraining. Earlier in training very few of these neurons were activated by other events. A small proportion of these neurons were inhibited by task events earlier in training and this proportion decreased to near 0 with training. Data presented as mean ± s.e.m. Males = solid lines, Females = dashed lines.
Extended Data Figure 5–6. Fidelity with which DMS A2A+ neurons encode action initiation.
(a-c) Fidelity with which A2A+ early action-initiation excited neurons encode action initiation. (a) Distribution of the percentage of early action-initiation excited neurons as a function of the percentage of action-initiation events to which they respond for each training phase. (b) Cross-session correlation of the response distributions. 2-way ANOVA, Neuron activity distribution (v. shuffled): F1, 3 = 0.03, P = 0.88; Training sessions: F1, 3 = 1.34, P = 0.33; Distribution × Training: F1, 3 = 2.99, P = 0.18. Early action-initiation A2A+ neurons tend to respond on less than half of the action-initiation events and this decreases with training and is not correlated across training sessions. (c) Within-session correlation of the activity around action initiation of each early action-initiation excited neuron. 2-way ANCOVA, Neuron activity: F1, 36 = 2.71, P = 0.11; Training: F2, 72= 2.47, P = 0.09; Activity × Time: F2, 72 = 2.45, P = 0.09. The activity of early action-initiation excited A2A+ neurons around action initiation is not significantly correlated within a training session. (d-e) Fidelity with which A2A+ overtrain action-initiation excited neurons encode action initiation. (d) Distribution of the percentage of overtrain action-initiation excited neurons as a function of the percentage of action-initiation events to which they respond for each training phase. (e) Cross-session correlation of the response distributions. 2-way ANOVA, Neuron activity distribution: F1, 2 = 32.64, P = 0.03; Training sessions: F1, 2 = 0.54, P = 0.54; Distribution × Training: F1, 2 = 0.45, P = 0.57. Overtrain action-initiation A2A+ neurons tend to respond on less than half the action-initiation events across training and this is consistent across training. (f) Within-session correlation of the activity around action initiation of each overtrain action-initiation excited neuron. 
2-way ANCOVA, Neuron activity: F1, 22.00= 0.40, P = 0.53; Training: F1.24, 27.30 = 0.63, P = 0.47; Activity × Time: F1.23, 27.09 = 0.65, P = 0.46. The activity of overtrain action-initiation excited A2A+ neurons around action initiation is not correlated above shuffled control within a training session. A2A-cre: N = 4 (1 male). Data presented as mean ± s.e.m. Males = closed circles, Females = open circles.
Extended Data Figure 5–7. The ensemble of DMS A2A+ neurons inhibited by action initiation shifts as habits form.
(a) Percent of all recorded coregistered DMS A2A+ neurons (Average 73.25 coregistered neurons/mouse, s.e.m. 17.09) significantly inhibited by action initiation. Approximately 18 – 24% of coregistered DMS A2A+ neurons were inhibited by action initiation and this did not significantly change across training. 1-way ANOVA, F1.05, 3.16 = 1.14, P = 0.37. (b) Percent of DMS A2A+ early action-initiation-inhibited neurons that were then also significantly inhibited by action initiation on the 4th (middle) and 8th (overtrain) training sessions. Approximately 27 – 41% of the early action-initiation inhibited ensemble continued to be inhibited around action initiation during the middle and overtraining phases of training. The proportion preserved did not change with training. 2-tailed t-test, t3 = 2.70, P = 0.07, 95% CI −30.41 – 2.52. (c) Percent of DMS A2A+ overtrain action-initiation-inhibited neurons that were significantly modulated by action initiation on the prior 1st (early) and 4th (middle) training sessions. Approximately 36% of the overtraining action-initiation inhibited ensemble was also inhibited around action initiation during the preceding early and middle training phases. The proportion preserved did not change with training. 2-tailed t-test, t3 = 0.01, P = 0.99, 95% CI −29.03 – 28.80. (d-g) Activity and modulation across training of DMS A2A+ early action-initiation-inhibited neurons. Heat map of minimum to maximum deconvolved activity (sorted by total activity) (d), Z-scored activity (e), and area under the receiver operating characteristic curve (auROC) modulation index (f) of these cells around action initiation across training. (g) auROC modulation index averaged across 0.5-s bins around action initiation. 2-way ANCOVA, Training: F1.38, 83.91 = 1.38, P = 0.26; Time bin: F2.48, 151.16 = 3.28, P = 0.02; Training × Time: F5.10, 311.30 = 3.80, P = 0.001.
This early action-initiation inhibited ensemble became less inhibited by action initiation as training progressed. (h-k) Activity and modulation across training of coregistered DMS A2A+ overtrain action-initiation-inhibited neurons. Heat map of minimum to maximum deconvolved activity (h), Z-scored activity (i), and area under the receiver operating characteristic curve (auROC) modulation index (j) of these cells around action initiation across training. (k) auROC modulation index averaged across 0.5-s bins around action initiation. 2-way ANCOVA, Training: F1.21, 53.15 = 5.67, P = 0.005; Time bin: F2.62, 115.18 = 2.16, P = 0.10; Training × Time: F4.48, 196.92 = 3.56, P = 0.006. This overtrain action-initiation inhibited ensemble became more inhibited prior to action initiation as training progressed. A2A-cre: N = 4 (1 male). Data presented as mean ± s.e.m. Males = solid lines, Females = dashed lines.
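For readers unfamiliar with auROC-based modulation measures, the following is a minimal sketch of how an auROC value can be computed for one neuron and one time bin. The choice of baseline window, the per-trial activity values, and the function name `auroc` are assumptions for illustration; the legend does not specify the authors' exact procedure or any rescaling applied to obtain the modulation index.

```python
def auroc(baseline, event):
    """Area under the ROC curve separating event-bin activity from baseline activity.

    Equivalent to the probability that a randomly drawn event-bin value exceeds a
    randomly drawn baseline value, with ties counted as one half.
    0.5 = no modulation, above 0.5 = excitation, below 0.5 = inhibition.
    """
    if not baseline or not event:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for e in event:
        for b in baseline:
            if e > b:
                wins += 1.0
            elif e == b:
                wins += 0.5
    return wins / (len(event) * len(baseline))


# Hypothetical per-trial deconvolved activity for one neuron (made-up numbers).
baseline_activity = [0.1, 0.2, 0.1, 0.3]
excited_bin = [0.5, 0.6, 0.4, 0.7]      # every value above baseline
inhibited_bin = [0.0, 0.05, 0.0, 0.02]  # every value below baseline

print(auroc(baseline_activity, excited_bin))        # 1.0
print(auroc(baseline_activity, inhibited_bin))      # 0.0
print(auroc(baseline_activity, baseline_activity))  # 0.5
```

Repeating this calculation for each 0.5-s bin around an event would give a time course of modulation comparable in spirit to the binned index described in the legend.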
Extended Data Figure 5–8. The ensembles of DMS A2A+ neurons encoding action termination and reward shift as habits form.
(a) Percent of all recorded coregistered DMS A2A+ neurons (Average 73.25 coregistered neurons/mouse, s.e.m. 17.09) significantly modulated (excited or inhibited) by action termination. Approximately 22 – 31% of DMS A2A+ neurons were modulated around action termination and this did not change with training. F1.00, 3.02 = 1.09, P = 0.37. (b) Percent of DMS A2A+ early action-termination excited neurons (N = 35 neurons/4 mice; average 8.75 neurons/mouse, s.e.m. 5.45) that were also significantly excited by action termination on the 4th (middle) and 8th (overtrain) training sessions. Only 12 – 22% of the early action-termination ensemble continued to be excited by action termination during the middle and overtraining phases of training. The proportion preserved did not change with training. Middle v. Overtrain t3 = 0.69, P =0.54. (c) Percent of DMS A2A+ overtrain action-termination excited (N = 13 neurons/4 mice; average 3.25 neurons/mouse, s.e.m. 1.18) neurons that were also significantly excited by action termination on the 1st (early) and 4th (middle) training sessions. Only 13 – 33% of the overtraining action-termination ensemble was also excited by action termination during the preceding early and middle training phases. The proportion preserved did not change with training. Early v. Middle t3 = 1.00, P =0.42. (d-g) Activity and modulation across training of DMS A2A+ early action-termination excited neurons. Heat map of minimum to maximum deconvolved activity (sorted by total activity) (d), Z-scored activity (e), and area under the receiver operating characteristic curve (auROC) modulation index (f) of these cells around action termination across training. (g) auROC modulation index averaged across 0.5-s bins around action termination. Training: F1.66, 54.75 = 3.77, P = 0.03; Time bin: F1.82, 60.19 = 1.17, P = 0.33; Training × Time: F4.19, 138.28 = 1.01, P = 0.41. 
This early action-termination ensemble became less modulated by action termination as training progressed. (h-k) Activity and modulation across training of DMS A2A+ overtrain action-termination excited neurons. Heat map of minimum to maximum deconvolved activity (h), Z-scored activity (i), and area under the receiver operating characteristic curve (auROC) modulation index (j) of these cells around action termination across training. (k) auROC modulation index averaged across 0.5-s bins around action termination. Training: F1.82, 19.98 = 3.20, P = 0.07; Time bin: F1.55, 17.10 = 2.28, P = 0.10; Training × Time: F3.25, 37.70 = 0.31, P = 0.83. This overtrain action-termination ensemble became slightly more modulated by action termination as training progressed. (l) Percent of coregistered DMS A2A+ neurons significantly modulated (excited or inhibited) by collection of the earned reward. Approximately 22 – 30% of DMS A2A+ neurons were modulated by reward and this did not significantly change with training. F1.32, 3.96 = 2.06, P = 0.23. (m) Percent of DMS A2A+ early reward excited neurons (N = 14 neurons/4 mice; average 3.50 neurons/mouse, s.e.m. 1.19) that were then significantly excited by reward on the 4th (middle) and 8th (overtrain) training sessions. Only 4 – 11% of the small early reward ensemble continued to be modulated by reward during the middle and overtraining phases of training. The proportion preserved did not change with training. Middle v. Overtrain 2-tailed Wilcoxon signed rank test, W = 1.00, P > 0.99. (n) Percent of DMS A2A+ overtrain reward excited neurons (N = 14 neurons/4 mice; average 3.50 neurons/mouse, s.e.m. 2.50) that were significantly excited by reward on the 1st (early) and 4th (middle) training sessions. Only 0 – 7% of the small overtraining reward ensemble was also modulated by reward during the preceding early and middle training phases. The proportion preserved did not change with training. Early v. 
Middle 2-tailed Wilcoxon signed rank test, W = −1.00, P > 0.99. (o-r) Activity and modulation across training of DMS A2A+ early reward excited neurons. Heat map of minimum to maximum deconvolved activity (o), Z-scored activity (p), and area under the receiver operating characteristic curve (auROC) modulation index (q) of these cells around reward across training. (r) auROC modulation index averaged across 0.5-s bins around reward. Training: F1.53, 18.36 = 0.492, P = 0.57; Time bin: F1.54, 18.49 = 0.64, P = 0.50; Training × Time: F2.58, 30.98 = 0.41, P = 0.71. (s-v) Activity and modulation across training of DMS A2A+ overtrain reward excited neurons. Heat map of minimum to maximum deconvolved activity (s), Z-scored activity (t), and area under the receiver operating characteristic curve (auROC) modulation index (u) of these cells around reward across training. (v) auROC modulation index averaged across 0.5-s bins around reward. Training: F1.51, 18.10 = 1.07, P = 0.36; Time bin: F1.98, 23.76 = 2.78, P = 0.08; Training × Time: F2.94, 35.24 = 0.65, P = 0.58. A2A-cre: N = 4 (1 male). Data presented as mean ± s.e.m. Males = solid lines, Females = dashed lines.
Figure 5–9: The ensemble of DMS A2A+ neurons that encodes action initiation is more stable if subjects remain goal-directed with overtraining.
(a) Representative recorded A2A+ neuron spatial footprints during the first (early, left), 4th (middle), and 8th (overtrain) random-interval training session. Colored, co-registered neurons; white, non-co-registered neurons. (b) Heat map of minimum to maximum deconvolved activity (sorted by total activity) of each coregistered DMS A2A+ neuron around lever-press action initiation. (c) Percent of all A2A+ neurons coregistered across training in subjects that did not form habits with overtraining. 1-way ANOVA, F1.32, 3.95 = 2.82, P = 0.17. (d) Behavior class (initiating lever press, terminating press, reward collection, non-reinforced food-port check) decoding accuracy from A2A+ coregistered neuron activity compared to shuffled control. Line at 0.25 = chance. 3-way ANOVA, Neuron activity (v. shuffled): F1, 3 = 548.47, P < 0.001; Training session: F2, 6 = 4.11, P = 0.08; Behavior class: F3, 9 = 40.4, P < 0.001; Neuron activity × Training: F2, 6 = 5.95, P = 0.038; Neuron activity × Behavior Class: F3, 9 = 16.24, P < 0.001; Training × Behavior Class: F6, 18 = 2.55, P = 0.02; Neuron activity × Training × Behavior class: F6, 18 = 5.11, P = 0.003. (e) Lever-press rate decoding accuracy from A2A+ coregistered neuron activity. R = correlation coefficient between actual and decoded press rate. 2-way ANOVA, Neuron activity: F1, 3 = 195.5, P = 0.0008; Training: F1.64, 4.92 = 0.23, P = 0.77; Neuron activity × Training: F1.62, 4.86 = 0.23, P = 0.76. (f-h) Percent of coregistered neurons (Average 106.5 coregistered neurons/mouse, s.e.m. 40.34) significantly modulated (f; 1-way ANOVA, F1.05, 3.16 = 8.14, P = 0.06), excited (g; 1-way ANOVA, F1.14, 3.43 = 1.72, P = 0.28), or inhibited (h; 1-way ANOVA, F1.90, 5.70 = 3.37, P = 0.11) around action initiation. (i-k) Activity and modulation across training of DMS A2A+ early action-initiation excited neurons (N = 47 neurons/4 mice; average 9.4 neurons/mouse, s.e.m. = 12.44). 
Heat map (i), Z-scored activity (j), and area under the receiver operating characteristic curve (auROC) modulation index (k) of early action-initiation excited neurons around action initiation across training. (l) Modulation index averaged across 0.5-s bins around action initiation. 2-way ANCOVA, Training: F1.82, 64.00 = 0.48, P = 0.62; Time bin: F2.18, 76.27 = 0.07, P = 0.94; Training × Time: F3.78, 132.35 = 1.61, P = 0.18. (m-n) Cross-session correlation of the activity around action initiation of each early action-initiation excited neuron (m; 2-way ANCOVA, Neuron activity: F1, 35 = 0.02, P = 0.90; Training: F1, 35 = 0.001, P = 0.98; Activity × Time: F1, 35 = 0.01, P = 0.91) or the population activity of these neurons (n; 2-way ANOVA, Neuron activity: F1, 2 = 31.38, P = 0.03; Training: F1, 2 = 1.25, P = 0.38; Activity × Time: F1, 2 = 1.21, P = 0.39). (o) Percent of A2A+ early action-initiation excited neurons that continued to be significantly excited by action initiation on the 4th and 8th training sessions. 2-tailed t-test, t2 = 1.28, P = 0.33, 95% CI −85.86 – 46.47. (p) Cross-session decoding accuracy of lever-press rate from the activity of A2A+ early action-initiation-excited neuron population activity on the 1st training session. Planned, Bonferroni corrected, 2-tailed t-tests, Early: t4 = 4.22, P = 0.04, 95% CI 0.03 – 0.85; Middle: t4 = 1.97, P = 0.36, 95% CI −0.21 – 0.61; Overtrain: t4 = 2.55, P = 0.19, 95% CI −0.15 – 0.67. (q-s) Activity and modulation across training of A2A+ overtrain action-initiation excited neurons (N = 49 neurons/4 mice; average 9.80 neurons/mouse, s.e.m. 11.69). Heat map (q), Z-scored activity (r), and auROC modulation index (s) of overtrain action-initiation excited neurons around action initiation across training. (t) Modulation index around action initiation. 2-way ANCOVA, Training: F1.65, 51.23 = 0.43, P = 0.43; Time bin: F2.16, 67.03 = 0.17, P = 0.86; Training × Time: F4.31, 133.64 = 0.59, P = 0.68.
(u-v) Cross-session correlation of the activity around action initiation of each overtrain action-initiation excited neuron (u; 2-way ANCOVA, Neuron activity: F1, 31 = 1.77, P = 0.19; Training: F1, 31 = 0.28, P = 0.60; Activity × Time: F1, 31 = 0.20, P = 0.66) or the population activity of these neurons (v; 2-way ANOVA, Neuron activity: F1, 3 = 2.37, P = 0.22; Training: F1, 3 = 1.20, P = 0.35; Activity × Time: F1, 3 = 1.13, P = 0.37). (w) Percent of A2A+ overtrain action-initiation excited neurons that were also significantly excited by action initiation on 1st and 4th training sessions. 2-tailed t-test, t3 = 0.82, P =0.47, 95% CI −32.59 – 55.21. (x) Cross-session decoding accuracy of lever-press rate from the activity of A2A+ overtrain action-initiation-excited neuron population activity on the 8th training session. Planned, Bonferroni corrected, 2-tailed t-tests, Early: t6 = 1.48, P = 0.57, 95% CI −0.16 to 0.41; Middle: t6 = 0.92, P > 0.99, 95% CI −0.21 – 0.36; Overtrain: t6 = 4.73, P = 0.01, 95% CI 0.12 – 0.69. A2A-cre: N = 4 (3 male). Data presented as mean ± s.e.m. Males = closed circles/solid lines, Females = open circles/dashed lines. *P < 0.05, **P < 0.01, ***P < 0.001.
Extended Data Figure 6–1. Food-port entries during training and test with DMS A2A+ neuron chemogenetic manipulation.
(a-c) Chemogenetic inactivation of DMS A2A+ neurons during learning. WT: N = 10 (1 male); A2A-cre: N = 7 (2 males). (a) Training entry rate. 2-way ANOVA, Training: F2.87, 43.12 = 3.50, P = 0.02; Genotype: F1, 15 = 6.63, P = 0.02; Training × Genotype: F4, 60 = 0.31, P = 0.87. (b) Test entry rate. 2-way ANOVA, Value × Genotype: F1, 15 = 1.08, P = 0.32; Value: F1, 15 = 4.06, P = 0.06; Genotype: F1, 15 = 0.03, P = 0.86. (c) Training average lever presses/earned reward outcome. 2-way ANOVA, Training: F2.38, 35.73 = 32.06, P < 0.0001; Genotype: F1, 15 = 0.08, P = 0.78; Training × Genotype: F4, 60 = 0.47, P = 0.76. (d-f) Chemogenetic activation of DMS A2A+ neurons during learning. WT: N = 11 (7 males); A2A-cre: N = 10 (6 males). (d) Training entry rate. 2-way ANOVA, Training: F4.22, 92.89 = 6.10, P = 0.0002; Genotype: F1, 22 = 0.00008, P = 0.98; Training × Genotype: F8, 176 = 0.54, P = 0.83. (e) Test entry rate. 2-way ANOVA, Value × Genotype: F1, 22 = 1.50, P = 0.23; Value: F1, 22 = 0.35, P = 0.56; Genotype: F1, 22 = 0.27, P = 0.61. (f) Training average lever presses/earned reward outcome. 2-way ANOVA, Training: F2.42, 5334 = 16.33, P < 0.0001; Genotype: F1, 22 = 0.35, P = 0.56; Training × Genotype: F8, 176 = 0.76, P = 0.64. (g-i) Chemogenetic inactivation of DMS A2A+ neurons during test of behavioral control after learning. WT: N = 16 (9 males); A2A-cre: N = 14 (8 males). (g) Training entry rate. 2-way ANOVA, Training: F1.93, 53.92 = 28.28, P < 0.0001; Genotype: F1, 28 = 0.23, P = 0.63; Training × Genotype: F4, 112 = 0.70, P = 0.59. (h) Test entry rate. 2-way ANOVA, Value × Genotype: F1, 28 = 0.47, P = 0.50; Value: F1, 28 = 1.82, P = 0.19; Genotype: F1, 28 = 4.08, P = 0.05. (i) Training average lever presses/earned reward outcome. 2-way ANOVA, Training: F2.19, 61.42 = 81.93, P < 0.0001; Genotype: F1, 28 = 1.55, P = 0.22; Training × Genotype: F4, 112 = 2.25, P = 0.07. Data presented as mean ± s.e.m. Males = solid lines, Females = dashed lines.
Neither chemogenetic inhibition nor activation of DMS A2A+ neurons affected checks of the food-delivery port or altered the press-reward action-outcome relationship.
Extended Data Figure 6–2. Press rate during devaluation test with DMS A2A+ neuron inhibition.
(a) Test press rate. 2-way ANOVA, Value × Genotype: F1, 28 = 6.46, P = 0.02; Value: F1, 28 = 28.86, P < 0.001; Genotype: F1, 28 = 5.15, P = 0.03. (b) Devaluation index. 2-tailed t-test, t28 = 1.36; P = 0.18, 95% CI −0.03 – 0.16. Data presented as mean ± s.e.m. Males = closed circles/solid lines, Females = open circles/dashed lines. **P < 0.01, ***P < 0.001.
Figure 1: DMS D1+ neurons are modulated by actions and rewards during instrumental learning and overtraining.
(a) Representative cre-dependent jGCaMP7s expression in DMS D1+ neurons (right) and maximum intensity projection of the field-of-view for one session (left). (b) Map of DMS GRIN lens placements. (c) Representative dF/F and deconvolved calcium signal. (d) Procedure. RI, random-interval reinforcement schedule. (e) Training press rate. 1-way ANOVA, Training: F1.81, 5.43 = 4.84, P = 0.06. (f) Test press rate. 2-tailed t-test, t3 = 1.98, P = 0.14, 95% confidence interval (CI) −0.85 – 3.65. (g) Devaluation index [(Devalued presses)/(Valued presses + Devalued presses)]. One-tailed Bayes factor, BF10 = 0.18. (h-j) Activity of DMS D1+ neurons on the 1st (Early) session of RI training. (h) Percent of all recorded neurons (N = 870) significantly modulated by lever presses, food-delivery port checks, and reward. (i) Heat map of minimum to maximum deconvolved activity (sorted by total activity) of each D1+ neuron significantly modulated around lever-press action initiation (left), action termination (middle), or reward consumption (right). Above red line = excited above cutoff criterion, below = inhibited. (j) Z-scored activity of each population of modulated neurons. (k-m) Activity of DMS D1+ neurons on the 4th (Middle) training session. (k) Percent of all recorded neurons (N = 999) significantly modulated by lever presses, food-delivery port checks, and reward. (l) Heat map of each D1+ neuron significantly modulated around lever-press action initiation, action termination, or reward consumption. (m) Z-scored activity of each population of modulated neurons. (n-p) Activity of DMS D1+ neurons on the 8th (End, Overtrain) training session. (n) Percent of all recorded neurons (N = 994) significantly modulated by lever presses, food-delivery port checks, and reward. (o) Heat map of each DMS D1+ neuron significantly modulated around lever-press action initiation, action termination, or reward consumption. (p) Z-scored activity of each population of modulated neurons.
D1-cre: N = 4 (1 male). Data presented as mean ± s.e.m. Males = closed circles/solid lines, Females = open circles/dashed lines.
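The devaluation index defined in panel (g) is a simple ratio, sketched below. The function name and the example press counts are hypothetical; only the formula comes from the legend.

```python
def devaluation_index(valued_presses: float, devalued_presses: float) -> float:
    """Devalued presses / (Valued presses + Devalued presses), as defined in the legend.

    0.5 means equal pressing in both conditions (insensitive to devaluation);
    values below 0.5 mean fewer presses in the devalued condition.
    """
    total = valued_presses + devalued_presses
    if total == 0:
        raise ValueError("index is undefined when no presses were made")
    return devalued_presses / total


# Made-up press counts for illustration only.
print(devaluation_index(valued_presses=30, devalued_presses=10))  # 0.25
print(devaluation_index(valued_presses=20, devalued_presses=20))  # 0.5
```

Under this definition, a subject whose pressing is unaffected by devaluation scores 0.5, whereas a subject that presses less when the outcome is devalued scores below 0.5.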
Figure 2: An ensemble of DMS D1+ neurons stably encodes action initiation throughout action-outcome learning and habit formation.
(a) Representative recorded D1+ neuron spatial footprints during the first (early, left), 4th (middle), and 8th (overtrain) random-interval training sessions. Colored, co-registered neurons; white, non-co-registered neurons. (b) Heat map of minimum to maximum deconvolved activity (sorted by total activity) of each coregistered DMS D1+ neuron around lever-press action initiation. (c) Percent of all neurons coregistered across training. 1-way ANOVA, F1.19, 3.58 = 1.62, P = 0.29. (d) Behavior class (initiating lever press, terminating press, reward collection, non-reinforced food-port check) decoding accuracy from D1+ coregistered neuron activity compared to shuffled control. Line at 0.25 = chance. 3-way ANOVA, Neuron activity (v. shuffled): F1, 3 = 1092.83, P < 0.001; Training session: F1.17, 3.51 = 2.40, P = 0.21; Behavior class: F1.76, 5.28 = 15.56, P = 0.007; Neuron activity × Training: F1.24, 3.72= 4.60, P = 0.10; Neuron activity × Behavior Class: F1.81 , 5.43 = 6.49, P = 0.04; Training × Behavior Class: F2.15, 6.46 = 3.25, P = 0.10; Neuron activity × Training × Behavior class: F1.95, 5.86 = 0.69, P = 0.54. (e) Lever-press rate decoding accuracy from D1+ coregistered neuron activity. R = correlation coefficient between actual and decoded press rate. 2-way ANOVA, Neuron activity: F1, 3 = 11.46, P = 0.04; Training: F1.22, 3.66 = 0.07, P = 0.85; Neuron activity × Training: F1.22, 3.66 = 0.07, P = 0.85. (f-h) Percent of coregistered neurons (Average 137 coregistered neurons/mouse, s.e.m. 37.73) significantly modulated (f; 1-way ANOVA, F1.12, 3.37 = 0.06, P = 0.85), excited (g; 1-way ANOVA, F1.13, 3.63 = 0.14, P = 0.76), or inhibited (h; 1-way ANOVA, F1.29, 3.86 = 0.14, P = 0.79) around action initiation. (i-k) Activity and modulation across training of DMS D1+ early action-initiation excited neurons (N = 77 neurons/4 mice; average 19.25 neurons/mouse, s.e.m. = 5.36). 
Heat map (i), Z-scored activity (j), and area under the receiver operating characteristic curve (auROC) modulation index (k) of early action-initiation excited neurons around action initiation across training. (l-m) Cross-session correlation of the activity around action initiation of each early action-initiation excited neuron (l; 2-way ANCOVA, Neuron activity: F1, 75 = 22.92, P < 0.001; Training: F1, 75 = 0.40, P = 0.53; Activity × Time: F1, 75 = 0.24, P = 0.63) or the population activity of these neurons (m; 2-way ANOVA, Neuron activity: F1, 3 = 39.10, P = 0.008; Training: F1, 3 = 0.70, P = 0.18; Activity × Time: F1, 3 = 0.16, P = 0.72). (n) Cross-session decoding accuracy of lever-press rate from the activity of D1+ early action-initiation-excited neuron population activity on the 1st training session. Planned, Bonferroni corrected, 2-tailed t-tests, Early: t6 = 7.94, P = 0.0006, 95% CI 0.23 – 0.55; Middle: t6 = 4.83, P = 0.009, 95% CI 0.08 – 0.40; Overtrain: t6 = 4.68, P = 0.01, 95% CI 0.07 – 0.39. (o) Percent of D1+ early action-initiation excited neurons that continued to be significantly excited by action initiation on the 4th and 8th training sessions. 2-tailed Wilcoxon signed rank test, W = −1.00, P > 0.99. (p-r) Activity and modulation across training of D1+ overtrain action-initiation excited neurons (N = 76 neurons/4 mice; average 19 neurons/mouse, s.e.m. 5.49). Heat map (p), Z-scored activity (q), and auROC modulation index (r) of overtrain action-initiation excited neurons around action initiation across training. (s-t) Cross-session correlation of the activity around action initiation of each overtrain action-initiation excited neuron (s; 2-way ANCOVA, Neuron activity: F1, 74 = 21.06, P < 0.001; Training: F1, 74 = 0.23, P = 0.64; Activity × Time: F1, 74 = 0.26, P = 0.61) or the population activity of these neurons (t; 2-way ANOVA, Neuron activity: F1, 3 = 8.25, P = 0.06; Training: F1, 3 = 1.76, P = 0.28; Activity × Time: F1, 3 = 2.29, P = 0.23). 
(u) Cross-session decoding accuracy of lever-press rate from the activity of D1+ overtrain action-initiation-excited neuron population activity on the 8th training session. Planned, Bonferroni corrected, 2-tailed t-tests, Early: t6 = 3.64, P = 0.005, 95% CI 0.12 to 0.47; Middle: t6 = 6.48, P = 0.002, 95% CI 0.17 – 0.53; Overtrain: t6 = 9.49, P = 0.002, 95% CI 0.33 – 0.69. (v) Percent of D1+ overtrain action-initiation excited neurons that were also significantly excited by action initiation on 1st and 4th training sessions. 2-tailed t-test, t3 = 2.02, P =0.14, 95% CI −4.52 – 20.15. (w) Modulation around action initiation, termination, and earned reward across training of DMS D1+ stable action-initiation excited neurons (N = 31 neurons/4 mice; average 7.75 neurons/mouse, s.e.m. = 2.56). (x) Percent of coregistered D1+ neurons stably excited by action initiation across training that were also significantly modulated by action termination and reward collection. 2-way ANOVA, Event: F1, 11 = 7.81, P = 0.06; Training: F1.02, 3.05 = 4.37, P = 0.13; Event × Time: F1.86, 5.59 = 1.76, P = 0.25. D1-cre: N = 4 (1 male). Data presented as mean ± s.e.m. Males = closed circles/solid lines, Females = open circles/dashed lines. *P < 0.05, **P < 0.01, ***P < 0.001.
Figure 3: DMS D1+ neurons drive action-outcome learning and goal-directed decision making.
(a-f) Chemogenetic inactivation of DMS D1+ neurons during learning. WT: N = 7 (2 males); D1-cre: N = 9 (5 males). (a) Representative immunofluorescent image of cre-dependent hM4Di expression in DMS. (b) Map of DMS cre-dependent hM4Di expression for all subjects. (c) Procedure. RI, random-interval reinforcement schedule; CNO, clozapine-N-oxide. (d) Training press rate. 2-way ANOVA, Training: F2.22, 31.01 = 27.65, P < 0.0001; Genotype: F1, 14 = 1.27, P = 0.28; Training × Genotype: F4, 56 = 1.24, P = 0.30. (e) Test press rate. 2-way ANOVA, Value × Genotype: F1, 14 = 14.30, P = 0.002; Value: F1, 14 = 0.49, P = 0.49; Genotype: F1, 14 = 0.00005, P = 0.99. (f) Devaluation index [(Devalued condition presses)/(Valued condition presses + Devalued presses)]. 2-tailed t-test, t14 = 4.01; P = 0.001, 95% CI −0.16 - −0.05. (g-l) Chemogenetic activation of DMS D1+ neurons during overtraining. WT: N = 6 (4 males); D1-cre: N = 6 (3 males). (g) Representative immunofluorescent image of cre-dependent hM3Dq expression in DMS. (h) Map of DMS cre-dependent hM3Dq expression. (i) Procedure. (j) Training press rate. 2-way ANOVA, Training: F2.44, 24.43 = 3.23, P = 0.048; Genotype: F1, 10 = 0.31, P = 0.59; Training × Genotype: F8, 80 = 0.68, P = 0.71. (k) Test press rate. 2-way ANOVA, Value × Genotype: F1, 10 = 3.37, P = 0.10; Value: F1, 10 = 0.56, P = 0.47; Genotype: F1, 10 = 0.30, P = 0.59. (l) Devaluation index. 2-tailed t-test, t10 = 3.07; P = 0.01, 95% CI −0.45 - −0.07. (m-r) Chemogenetic inactivation of DMS D1+ neurons during test of behavioral control strategy after learning. WT: N = 12 (7 males); D1-cre: N = 12 (5 males). (m) Representative immunofluorescent image of cre-dependent hM4Di expression in DMS. (n) Map of DMS cre-dependent hM4Di expression. (o) Procedure. (p) Training press rate. 2-way ANOVA, Training: F1.59, 28.63 = 68.95, P < 0.0001; Genotype: F1, 18 = 0.74, P = 0.40; Training × Genotype: F4, 72 = 0.44, P = 0.78. (q) Test press rate. 
2-way ANOVA, Value × Genotype: F1, 18 = 4.23, P = 0.05; Value: F1, 18 = 11.93, P = 0.003; Genotype: F1, 18 = 0.64, P = 0.44. (r) Devaluation index. 2-tailed t-test, t18 = 2.15; P = 0.045, 95% CI 0.003 – 0.20. Data presented as mean ± s.e.m. Males = closed circles/solid lines, Females = open circles/dashed lines. *P < 0.05, **P < 0.01, ***P < 0.001.
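The devaluation index defined in the legend above, (Devalued condition presses)/(Valued condition presses + Devalued presses), can be illustrated with a minimal sketch. The function name, argument names, and example press counts below are hypothetical and are not taken from the authors' analysis code or data. By construction, an index of 0.5 means equal pressing in both conditions, and values below 0.5 mean fewer presses in the devalued condition.

```python
def devaluation_index(valued_presses: float, devalued_presses: float) -> float:
    """Devalued presses as a fraction of total presses across both test conditions.

    Hypothetical helper: follows the formula stated in the figure legend,
    devalued / (valued + devalued).
    """
    total = valued_presses + devalued_presses
    if total == 0:
        # The index is undefined when the subject never pressed in either test.
        raise ValueError("no presses recorded in either condition")
    return devalued_presses / total


# Made-up press counts for illustration only.
sensitive = devaluation_index(valued_presses=30, devalued_presses=10)    # 0.25
insensitive = devaluation_index(valued_presses=20, devalued_presses=20)  # 0.5
print(sensitive, insensitive)
```

Because the index is a within-subject ratio, it removes overall differences in press rate between subjects before groups are compared.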
Figure 4. DMS A2A+ neurons are modulated by actions during instrumental learning and overtraining.
(a) Representative image of cre-dependent jGCaMP7s expression in DMS A2A+ neurons (right) and maximum intensity projection of the field-of-view for one session (left). (b) Map of DMS GRIN lens placements. (c) Representative dF/F and deconvolved calcium signal. (d) Procedure. RI, random-interval reinforcement schedule. (e) Training press rate. 2-way ANOVA, Training: F1.39, 4.18 = 3.56, P = 0.12. (f) Test press rate. 2-tailed t-test, t3 = 1.50, P = 0.23, 95% CI −1.86 – 5.16. (g) Devaluation index [(Devalued condition presses)/(Valued condition presses + Devalued presses)]. One-tailed Bayes factor, BF10 = 0.19. (h-j) Activity of DMS A2A+ neurons on the 1st (Early) session of RI training. (h) Percent of all recorded neurons (N = 671) significantly modulated by lever presses, food-delivery port checks, and reward. (i) Heat map of minimum to maximum deconvolved activity (sorted by total activity) of each DMS A2A+ neuron significantly modulated around lever-press action initiation (right), action termination (middle), or reward consumption (left). Above red line = excited, below = inhibited. (j) Z-scored activity of each population of modulated neurons. (k-m) Activity of DMS A2A+ neurons on the 4th (Middle) training session. (k) Percent of all recorded neurons (N = 697) significantly modulated by lever presses, food-delivery port checks, and reward. (l) Heat map of minimum to maximum deconvolved activity of each DMS A2A+ neuron significantly modulated around lever-press action initiation, action termination, or reward. (m) Z-scored activity of each population of modulated neurons. (n-p) Activity of DMS A2A+ neurons on the 8th (End, Overtrain) training session. (n) Percent of all recorded neurons (N = 658) significantly modulated by lever presses, food-delivery port checks, and reward. (o) Heat map of minimum to maximum deconvolved activity of each DMS A2A+ neuron significantly modulated around lever-press action initiation, action termination, or reward. 
(p) Z-scored activity of each population of modulated neurons. Data presented as mean ± s.e.m. A2A-cre: N = 4 (1 male). Males = closed circles/solid lines, Females = open circles/dashed lines.
Figure 5. The ensemble of DMS A2A+ neurons that encodes action initiation shifts as habits form.
(a) Representative recorded A2A+ neuron spatial footprints during the first (early, left), 4th (middle), and 8th (overtrain) random-interval training session. Colored, co-registered neurons; white, non-co-registered neurons. (b) Heat map of minimum to maximum deconvolved activity (sorted by total activity) of each coregistered DMS A2A+ neuron around lever-press action initiation. (c) Percent of all neurons coregistered across training. 1-way ANOVA, F1.18, 3.54 = 0.78, P = 0.45. (d) Behavior class (initiating lever press, terminating press, reward collection, non-reinforced food-port check) decoding accuracy from A2A+ coregistered neuron activity compared to shuffled control. Line at 0.25 = chance. 3-way ANOVA, Neuron activity (v. shuffled): F1, 3 = 1570.57, P < 0.001; Training session: F2, 6 = 5.14, P = 0.05; Behavior class: F3, 9 = 23.33, P < 0.001; Neuron activity × Training: F2, 6 = 3.72, P = 0.09; Neuron activity × Behavior Class: F3, 9 = 17.65, P < 0.001; Training × Behavior Class: F6, 18 = 2.00, P = 0.12; Neuron activity × Training × Behavior class: F6, 18 = 1.66, P = 0.19. (e) Lever-press rate decoding accuracy from A2A+ coregistered neuron activity. R = correlation coefficient between actual and decoded press rate. 2-way ANOVA, Neuron activity: F1, 3 = 72.91, P = 0.003; Training: F1.33, 4.00 = 1.91, P = 0.25; Neuron activity × Training: F1.35, 4.04 = 1.74, P = 0.27. (f-h) Percent of coregistered neurons (Average 73.25 coregistered neurons/mouse, s.e.m. 17.09) significantly modulated (f; 1-way ANOVA, F1.02, 3.05 = 2.09, P = 0.24), excited (g; 1-way ANOVA, F1.16, 3.48 = 1.79, P = 0.27), or inhibited (h; 1-way ANOVA, F1.05, 3.16 = 1.14, P = 0.37) around action initiation. (i-k) Activity and modulation across training of DMS A2A+ early action-initiation excited neurons (N = 42 neurons/4 mice; average 16.8 neurons/mouse, s.e.m. = 5.61). 
Heat map (i), Z-scored activity (j), and area under the receiver operating characteristic curve (auROC) modulation index (k) of early action-initiation excited neurons around action initiation across training. (l-m) Cross-session correlation of the activity around action initiation of each early action-initiation excited neuron (l; 2-way ANCOVA, Neuron activity: F1, 38 = 7.59, P = 0.009; Training: F1, 38 = 1.12, P = 0.30; Activity × Time: F1, 38 = 1.04, P = 0.32) or the population activity of these neurons (m; 2-way ANOVA, Neuron activity: F1, 3 = 3.07, P = 0.18; Training: F1, 3 = 0.68, P = 0.47; Activity × Time: F1, 3 = 0.64, P = 0.48). (n) Cross-session decoding accuracy of lever-press rate from the activity of A2A+ early action-initiation-excited neuron population activity on the 1st training session. Planned, Bonferroni corrected, 2-tailed t-tests, Early: t6 = 2.94, P = 0.08, 95% CI −0.03 – 0.55; Middle: t6 = 2.92, P = 0.08, 95% CI −0.03 – 0.55; Overtrain: t6 = 0.37, P > 0.9999, 95% CI −0.26 – 0.33. (o) Percent of A2A+ early action-initiation excited neurons that continued to be significantly excited by action initiation on the 4th and 8th training sessions. 2-tailed t-test, t3 = 0.37, P = 0.74, 95% CI −53.97 – 42.86. (p-r) Activity and modulation across training of A2A+ overtrain action-initiation excited neurons (N = 25 neurons/4 mice; average 6.25 neurons/mouse, s.e.m. 2.69). Heat map (p), Z-scored activity (q), and auROC modulation index (r) of overtrain action-initiation excited neurons around action initiation across training. 
(s-t) Cross-session correlation of the activity around action initiation of each overtrain action-initiation excited neuron (s; 2-way ANCOVA, Neuron activity: F1, 23 = 7.13, P = 0.01; Training: F1, 23 = 0.18, P = 0.68; Activity × Time: F1, 23 = 0.14, P = 0.72) or the population activity of these neurons (t; 2-way ANOVA, Neuron activity: F1, 3 = 2.01, P = 0.25; Training: F1, 3 = 0.20, P = 0.69; Activity × Time: F1, 3 = 0.31, P = 0.62). (u) Cross-session decoding accuracy of lever-press rate from the activity of A2A+ overtrain action-initiation-excited neuron population activity on the 8th training session. Planned, Bonferroni corrected, 2-tailed t-tests, Early: t6 = 0.61, P > 0.999, 95% CI −0.27 – 0.39; Middle: t6 = 2.63, P = 0.12, 95% CI −0.07 – 0.59; Overtrain: t6 = 2.86, P = 0.09, 95% CI −0.04 – 0.62. (v) Percent of A2A+ overtrain action-initiation excited neurons that were also significantly excited by action initiation on 1st and 4th training sessions. 2-tailed t-test, t3 = 0.13, P = 0.31, 95% CI −18.88 – 17.44. (w) Percent of all coregistered DMS D1+ and A2A+ neurons that are significantly modulated by action initiation across all phases of training (‘stable action-initiation ensemble’). 2-tailed t-test, t3 = 3.55, P = 0.01, 95% CI −7.34 - −1.35. (x) Modulation across training of A2A+ early action-initiation inhibited neurons. (y) Accuracy with which lever-press rate can be decoded from the activity of A2A+ early action-initiation inhibited neurons. 2-way ANOVA, Neuron activity: F1, 3 = 7.44, P = 0.07; Training: F1.40, 4.21 = 0.17, P = 0.77; Neuron activity × Training: F1.41, 4.22 = 0.16, P = 0.79. (z) Modulation across training of A2A+ overtrain action-initiation inhibited neurons. (aa) Accuracy with which lever-press rate can be decoded from the activity of DMS A2A+ overtrain action-initiation inhibited neurons. 
2-way ANOVA, Neuron activity: F1, 3 = 30.03, P = 0.01; Training: F1.52, 4.57 = 0.45, P = 0.61; Neuron activity × Training: F1.51, 4.54 = 0.48, P = 0.60. A2A-cre: N = 4 (1 male). Data presented as mean ± s.e.m. Males = closed circles/solid lines, Females = open circles/dashed lines. *P < 0.05, **P < 0.01.
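The legend above reports an auROC modulation index without defining it. As a hedged illustration only, the sketch below computes a generic area under the ROC curve that compares a neuron's activity in an event window against a baseline window. The choice of windows, the function name, and the sample values are assumptions, and the authors' exact procedure (including any rescaling of the index) is not specified in this section.

```python
from typing import Sequence


def auroc(baseline: Sequence[float], event: Sequence[float]) -> float:
    """Area under the ROC curve separating event-window from baseline samples.

    Computed as the probability that a randomly drawn event sample exceeds a
    randomly drawn baseline sample, with ties counted as one half.
    0.5 = no separation, above 0.5 = higher activity in the event window
    (excitation), below 0.5 = lower activity (inhibition).
    """
    if len(baseline) == 0 or len(event) == 0:
        raise ValueError("both windows need at least one sample")
    wins = 0.0
    for e in event:
        for b in baseline:
            if e > b:
                wins += 1.0
            elif e == b:
                wins += 0.5
    return wins / (len(event) * len(baseline))


# Made-up activity values for illustration only.
print(auroc(baseline=[0.1, 0.2, 0.3], event=[0.4, 0.5, 0.6]))  # 1.0
print(auroc(baseline=[1.0, 2.0], event=[1.0, 2.0]))            # 0.5
```

The pairwise comparison is quadratic in the number of samples, which is acceptable for short peri-event windows; a rank-based implementation would be preferable for long traces.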
Figure 6. DMS A2A+ neurons are necessary for action-outcome learning, but not goal-directed decision making.
(a-f) Chemogenetic inactivation of DMS A2A+ neurons during learning. WT: N = 10 (1 male); A2A-cre: N = 7 (2 males). (a) Representative immunofluorescent image of cre-dependent hM4Di expression in DMS. (b) Map of DMS cre-dependent hM4Di expression for all subjects. (c) Procedure. RI, random-interval reinforcement schedule; CNO, clozapine-N-oxide. (d) Training press rate. 2-way ANOVA, Training: F2.39, 35.84 = 30.54, P < 0.0001; Genotype: F1, 15 = 0.07, P = 0.79; Training × Genotype: F4, 60 = 0.05, P = 0.99. (e) Test press rate. 2-way ANOVA, Value × Genotype: F1, 15 = 9.67, P = 0.007; Value: F1, 15 = 0.02, P = 0.88; Genotype: F1, 15 = 0.96, P = 0.34. (f) Devaluation index [(Devalued condition presses)/(Valued condition presses + Devalued presses)]. 2-tailed t-test, t15 = 3.10; P = 0.007, 95% CI 0.057 – 0.31. (g-l) Chemogenetic activation of DMS A2A+ neurons during overtraining. WT: N = 12 (7 males); A2A-cre: N = 12 (6 males). (g) Representative immunofluorescent image of cre-dependent hM3Dq expression in DMS. (h) Map of DMS cre-dependent hM3Dq expression. (i) Procedure. (j) Training press rate. 2-way ANOVA, Training: F1.87, 41.22 = 15.50, P < 0.0001; Genotype: F1, 22 = 0.80, P = 0.38; Training × Genotype: F8, 176 = 0.25, P = 0.98. (k) Test press rate. 2-way ANOVA, Value × Genotype: F1, 22 = 0.03, P = 0.87; Value: F1, 22 = 0.20, P = 0.66; Genotype: F1, 22 = 1.30, P = 0.27. (l) Devaluation index. 2-tailed t-test, t22 = 0.89; P = 0.38, 95% CI −0.26 – 0.10. (m-r) Chemogenetic inactivation of DMS A2A+ neurons during test of behavioral control after learning. WT: N = 16 (9 males); A2A-cre: N = 14 (8 males). (m) Representative immunofluorescent image of cre-dependent hM4Di expression in DMS. (n) Map of DMS cre-dependent hM4Di expression. (o) Procedure. (p) Training press rate. 2-way ANOVA, Training: F1.78, 49.82 = 107.5, P < 0.0001; Genotype: F1, 28 = 1.30, P = 0.26; Training × Genotype: F4, 112 = 5.08, P = 0.008. (q) Test press rate normalized to pre-test training baseline. 
2-way ANOVA, Value × Genotype: F1, 28 = 0.04, P = 0.84; Value: F1, 28 = 47.56, P < 0.0001; Genotype: F1, 28 = 0.50, P = 0.49. (r) Devaluation index. 2-tailed t-test, t28 = 0.22; P = 0.83, 95% CI −0.09 to 0.11. Data presented as mean ± s.e.m. Males = closed circles/solid lines, Females = open circles/dashed lines. *P < 0.05, **P < 0.01, ***P < 0.001

