Front Neurosci. 2019 Feb 7:13:50.
doi: 10.3389/fnins.2019.00050. eCollection 2019.

Behavioral Paradigms to Probe Individual Mouse Differences in Value-Based Decision Making

Opeyemi O Alabi et al. Front Neurosci. 2019.

Abstract

Value-based decision making relies on distributed neural systems that weigh the benefits of actions against the cost required to obtain a given outcome. Perturbations of these systems are thought to underlie abnormalities in action selection seen across many neuropsychiatric disorders. Genetic tools in mice provide a promising opportunity to explore the cellular components of these systems and their molecular foundations. However, few tasks have been designed that robustly characterize how individual mice integrate differential reward benefits and cost in their selection of actions. Here we present a forced-choice, two-alternative task in which each option is associated with a specific reward outcome, and unique operant contingency. We employed global and individual trial measures to assess the choice patterns and behavioral flexibility of mice in response to differing "choice benefits" (modeled as varying reward magnitude ratios) and different modalities of "choice cost" (modeled as either increasing repetitive motor output to obtain reward or increased delay to reward delivery). We demonstrate that (1) mouse choice is highly sensitive to the relative benefit of outcomes; (2) choice costs are heavily discounted in environments with large discrepancies in relative reward; (3) divergent cost modalities are differentially integrated into action selection; (4) individual mouse sensitivity to reward benefit is correlated with sensitivity to reward costs. These paradigms reveal stable individual animal differences in value-based action selection, thereby providing a foundation for interrogating the neural circuit and molecular pathophysiology of goal-directed dysfunction.

Keywords: cost-benefit; decision-making; economic choice; flexibility; mouse; operant behavior; value.

Figures

Figure 1
Acquisition of value-based choice paradigm is accompanied by dynamic changes in reinforcement. (A) Schematic of trial structure showing that mice initiate trials via sustained magazine entry, respond via lever press during specified temporal window, and collect rewards from center magazine. (B) Block structure—trials with the same contingency occur consecutively until mice select the alternative with the large reward eight times in a proximal window of 10 trials. Dotted lines signify block switch and gray box denotes 10-trial moving window for triggering contingency switch. (C) Mice that were trained in a simple lever press-reward contingency were initiated into the reversal paradigm at one of three probabilities of reinforcement. On any given trial, one alternative resulted in reward and the other resulted in no reward (Prew = 1, n = 5; Prew = 0.7, n = 11; Prew = 0.4, n = 5). (D) Block length, the average number of trials until a contingency switch, decreased over the duration of the training period in a reward probability dependent fashion. (E) The overall probability that mice select the large reward increases over the duration of training in a reward probability dependent fashion. (F) Logistic regression modeled the effects of past reinforcers on subsequent choice in early (<1,000 trials, top) and late (>2,000 trials, bottom) periods of acquisition. We performed multiple t-tests comparing “Large Reward” and “No Reward” coefficients to assess which reward outcome types were reinforcing (significance indicated by asterisk, corrected for multiple comparisons using Holm-Sidak). In early acquisition, only the T−1 trial at Prew = 1 was positively reinforcing relative to no reward outcome. In later acquisition, the T−1 trial was significantly reinforcing for all probabilities tested. 
(G) The probability that animals stayed on a choice alternative after receiving a large reward (Pr(Reward)-Stay) increased over the course of training in a reward probability dependent manner. (H) There was a significant effect of acquisition day on the probability that animals stayed on a choice alternative after receiving no reward (Pr(No Reward)-Stay), however, pairwise comparisons revealed few differences in these values and we noted no consistent differences over the course of multiple days. (I) The relative action value, defined as the reinforcing property of large reward vs. no reward outcomes, increased as animals gain experience in this paradigm in a reward probability dependent fashion. All data analyzed by Repeated Measures (Day) Two-Way ANOVA.
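The block-switch rule in Figure 1B (a contingency switch once the large-reward alternative has been chosen eight times within a moving window of 10 trials) can be sketched as below. This is a minimal illustration, not the authors' task code: the function name `block_lengths` is hypothetical, and clearing the window after each switch is an assumption the caption does not state.

```python
from collections import deque


def block_lengths(chose_large, window=10, criterion=8):
    """Return the length of each completed block.

    chose_large: iterable of booleans, True when the mouse picked the
    alternative that currently carries the large reward.
    A block ends when `criterion` of the last `window` trials were
    large-reward choices (8 of 10 in the caption).
    """
    recent = deque(maxlen=window)  # moving window of recent choices
    lengths = []
    trials_in_block = 0
    for hit in chose_large:
        recent.append(bool(hit))
        trials_in_block += 1
        if sum(recent) >= criterion:
            lengths.append(trials_in_block)
            # Assumption: the window restarts after a contingency switch.
            recent.clear()
            trials_in_block = 0
    return lengths


if __name__ == "__main__":
    session = [True, True, True, True, False, False, True, True, True, True]
    print(block_lengths(session))
```

Averaging the returned lengths over a session gives the "block length" measure used in panel (D); shorter blocks mean the criterion was reached faster.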
Figure 2
Acquisition of value-based choice paradigm is accompanied by dynamic changes in motor efficiency and behavioral flexibility. (A) Choice patterns in the absence of benefit differences show that some mice continually sample the two available options while others develop significant choice bias. (B,C) As animals acquire the reversal task, they display increased motor efficiency in the execution of the task, including the speed with which choices are made [latency to choice (B)] and the speed with which trials are initiated [latency to initiate (C)]. (D) Adaptability, a measure of choice flexibility, shows a probability-dependent increase with training. (E) Both the relative action value (left) and the adaptability index (right) have significant linear relationships with overall task performance in an individual session (data from Days 8, 9, and 10 of n = 11 mice at Prew = 0.7). (F) Bivariate linear regression analysis of session performance against RAV and the Adaptability Index indicates that both of these variables are significant, with minor multicollinearity (VIF = 1.76). Together, they account for 83% of variability in session performance of mice.
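To put the reported VIF in context: in a regression with exactly two predictors, the variance inflation factor reduces to 1 / (1 − r²), where r is the Pearson correlation between the predictors. The sketch below applies that standard identity; the function names are hypothetical and no data from the paper is used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for either predictor in a two-predictor regression."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


def shared_variance_from_vif(vif):
    """Invert VIF = 1 / (1 - r^2) to recover r^2 between the predictors."""
    return 1.0 - 1.0 / vif


if __name__ == "__main__":
    # Squared correlation implied by the VIF of 1.76 quoted in the caption.
    print(round(shared_variance_from_vif(1.76), 3))
```

Under this identity, a VIF of 1.76 corresponds to the two predictors sharing roughly 43% of their variance, which is consistent with the caption's description of the multicollinearity as minor.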
Figure 3
Choice is strongly shaped by differentially rewarded outcomes. (A) Animals (n = 21) were tested at three reward magnitude contrasts and either high or low reinforcement probability regimes. (B,C) The average block length and the probability that animals select the large reward alternative over the course of a session are both sensitive to the relative contrast in reward as well as the probability of reinforcement. (D) While there was a significant interaction between reward contrast and probability for Large-Reward-Stay behavior, Tukey's multiple comparisons test revealed no pairwise differences for relative rewards within individual probabilities of reinforcement. (E) There was a significant main effect of reward contrast on Small-Reward-Stay behavior. (F) The relative action value exhibited significant main effects for both relative reward contrast and probability of reinforcement. (G) The latency to initiate trials, averaged by session, showed a significant effect for reward contrast and an interaction between reward contrast and probability. Pairwise differences obtained by Tukey's multiple comparisons test indicate no significant differences in the initiation times for mice at the higher probability of reinforcement. (H) The relative initiation latency demonstrates that at Prew = 0.7, mice more rapidly initiate trials following large than small rewards. This disparity is sensitive to relative reward magnitude contrast. (I) Behavioral flexibility after contingency switches is sensitive to relative reward magnitude contrasts as well as the probability of reinforcement. All data analyzed by Repeated Measures (Reward Ratio) Two-Way ANOVA.
Figure 4
Choice is largely influenced by the T−1 trial in a relative reward environment. (A–F) The average coefficients for the multivariate logistic regression model that describes previous choice and reward history (Prew = 0.7, n = 11; Prew = 0.4, n = 10). For each of the relative reward regimes tested, note that the T−1 trial is significant for large reward outcomes. For relative reward regimes with 10 μL (A,D) and 5 μL (B,E) small reward outcomes we note a significant T−1 coefficient. We detect no significant T−2 (or further) coefficients, indicating that mouse choice is largely dictated by the outcome of the T−1 trial.
Figure 5
Q-learning reinforcement learning model predicts choice behavior. In order to extract information on how mice update action values using reward prediction errors (α) and how sensitive mouse choice is to differences in action values (β), a reinforcement learning model was fit to the choice data of mice performing the relative reward paradigm. (A) Table summarizing model parameters in both reward probability environments. (B) The calculated Q-values of the two levers for an individual mouse in a single session [15 vs. 0 μL at Prew = 0.7]. As the animal performs reversals, we note an oscillation in the action value for both options. (C) The probability that the mouse selected alternative A on a given trial vs. predicted probabilities generated by the full model. This model is a significant predictor of choice behavior (R² = 0.43). Q-values and choice probabilities are calculated as a moving average of nine trials.
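The roles of α and β in the Figure 5 caption can be illustrated with a textbook two-option Q-learning model: α scales the reward prediction error when the chosen option's value is updated, and β sets how steeply the choice probability depends on the difference in action values. This is a generic sketch under stated assumptions, not the authors' fitted "full model", whose exact form is not given here; the function name, the zero initial values, and the softmax choice rule are all assumptions.

```python
import math


def q_learning_trace(choices, rewards, alpha, beta, q0=0.0):
    """Replay a choice/reward sequence through a two-option Q-learner.

    choices: sequence of 0 (alternative A) or 1 (alternative B).
    rewards: reward obtained on each trial (e.g. volume in microliters).
    Returns (p_a, q): the model's probability of choosing A before each
    trial, and the final action values [Q_A, Q_B].
    """
    q = [q0, q0]  # assumption: both action values start at q0
    p_a = []
    for choice, reward in zip(choices, rewards):
        # Softmax over two options, written as a logistic function of
        # the action value difference scaled by beta.
        p_a.append(1.0 / (1.0 + math.exp(-beta * (q[0] - q[1]))))
        # Reward prediction error update for the chosen option only.
        q[choice] += alpha * (reward - q[choice])
    return p_a, q


if __name__ == "__main__":
    p_a, q = q_learning_trace([0, 0, 1], [1.0, 1.0, 0.0], alpha=0.5, beta=2.0)
    print(p_a, q)
```

In a fitting procedure, α and β would be chosen to maximize the likelihood of the observed choices given the probabilities in `p_a`; the oscillating Q-values described in panel (B) arise because each reversal drives the previously rewarded option's value back down.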
Figure 6
“Trait-like” stability of reward sensitivity and flexibility measures. (A,B) Cross-session correlation of relative action value and adaptability index revealed a significant positive linear relationship between the values of mice relative to the population on Day 1 and the values of mice relative to the population on Days 2/3 (averaged) for both reward sensitivity and behavioral flexibility. (C,D) Cross-contingency correlations of relative action value and adaptability whereby z-scored values for RAV and adaptability index in the large disparity (15 vs. 0 μl) reward environment were correlated with values from the 3x (top) and 1.5x (bottom) relative reward environments. (C) We noted a significant correlation between the RAV for the large and 3x reward ratio and (E) a trend in the correlation of the large and 1.5x reward ratios. (D,F) We note no cross-contingency correlation for the adaptability index.
Figure 7
Effort costs alter the reinforcing properties of small reward alternatives. (A) Mice (n = 19) were tested at two reward magnitude contrasts across three different operant schedule contrasts. (B,C) Both the relative reward contrast as well as the effort schedule had a significant effect on the block length and the overall rate of selecting the large reward as well as a significant interaction between the effects of effort and the relative reward on both measures. (D) The imposition of high-effort costs on the large reward alternative did not have statistically significant effects on the reinforcing properties of that alternative. (E) Increased operant scheduling on the large reward alternative had a statistically significant effect on the reinforcing properties of the small reward choice. (F) We observed a significant interaction between reward contrast and effort, with increased effort costs exerting more dynamic effects in the small reward contrast environment. (G) Increased effort to reward had a small but significant effect on behavioral flexibility. Pairwise analysis indicates that behavioral flexibility was actually increased with the application of increased operant scheduling. All data analyzed by Repeated Measures (Reward Ratio, Effort) Two-Way ANOVA.
Figure 8
Delay costs primarily alter the reinforcing properties of large reward alternatives. (A) Mice (n = 21) were tested at two reward magnitude contrasts across four delays to reward delivery, applied exclusively to the large reward option. (B,C) Both the relative reward contrast as well as the delay to reward had a significant effect on the average block length and the probability mice chose the large reward over the course of a session. We observed a significant interaction between the effects of delay and reward contrast for block length, but not Pr(Large Reward). (D) The application of delay costs to the large reward alternative had a significant effect on win-stay behavior following large reward outcomes, where we observed an interaction between delay and reward contrast. (E) The addition of delay to large reward outcomes had a statistically significant effect on the reinforcing properties of small reward outcomes. Nevertheless, Tukey's multiple comparisons revealed no pairwise differences between values at three of the four delay regimes. (F) The relative reinforcing properties of large and small reward outcomes are sensitive to reward magnitude contrast as well as increasing temporal delay to reward delivery. (G) Increased temporal delay to reward had a significant effect on flexibility, with adaptability generally being higher with larger reward contrasts. All data analyzed by Repeated Measures (Reward Ratio, Delay) Two-Way ANOVA.
Figure 9
Sensitivity to reward benefits and costs are correlated. (A,C) The baseline sensitivity of mice to reward benefits (measured as the relative action value of 15 v 5 μL in FR2 v FR2 for effort (A, n = 19) and 0 s v 0 s for delay (C, n = 21)) was correlated with the averaged relative action values measured upon addition of effort and delay costs. We found a significant correlation between the sensitivity of animals to reward benefits with and without the addition of associated costs, consistent with “trait-like” expression of reward sensitivity. (B,D) To quantify the extent to which each cost modality altered mouse choice distributions we took the difference in relative action value of mice in baseline conditions and with application of operant costs (RAVcost − RAVbaseline). Increasingly negative values indicate larger choice disruption in the presence of costs. We observed a significant relationship in the sensitivity of mice to reward benefits and the sensitivity of mice to the addition of reward costs, relative to the population.
Figure 10
Cost-benefit correlations with small reward contrasts. (A,C) The sensitivity of mice to reward benefits was measured as the relative action value in baseline conditions for the effort (A, FR2 v FR2, n = 19) and delay (C, 0 s v 0 s, n = 21) experiments (Reward Contrast: 15 v 10 μL). These values were correlated with the averaged relative action values measured in mice with the addition of effort and delay costs. (A) At the low discrepancy in reward magnitude, there is a significant correlation between the sensitivity of animals to reward benefits, in environments with and without the addition of increased operant scheduling. (C) We note no cross-session correlation in reward sensitivity with the application of temporal delay costs in a reward environment with a small discrepancy in reward benefit. (B,D) We note a significant relationship in the sensitivity of mice to reward benefits and the sensitivity of mice to the addition of temporal delay (D) but not effort costs, relative to the population, in this reward environment. (E) No correlation exists between the sensitivity of mice to the two cost modalities tested.
