Curr Biol. 2019 Jun 17;29(12):2066-2074.e5.
doi: 10.1016/j.cub.2019.05.013. Epub 2019 May 30.

An Analysis of Decision under Risk in Rats

Christine M Constantinople et al. Curr Biol. 2019.

Abstract

In 1979, Daniel Kahneman and Amos Tversky published a ground-breaking paper titled "Prospect Theory: An Analysis of Decision under Risk," which presented a behavioral economic theory that accounted for the ways in which humans deviate from economists' normative workhorse model, Expected Utility Theory [1, 2]. For example, people exhibit probability distortion (they overweight low probabilities), loss aversion (losses loom larger than gains), and reference dependence (outcomes are evaluated as gains or losses relative to an internal reference point). We found that rats exhibited many of these same biases, using a task in which rats chose between guaranteed and probabilistic rewards. However, prospect theory assumes stable preferences in the absence of learning, an assumption at odds with alternative frameworks such as animal learning theory and reinforcement learning [3-7]. Rats also exhibited trial history effects, consistent with ongoing learning. A reinforcement learning model in which state-action values were updated by the subjective value of outcomes according to prospect theory reproduced rats' nonlinear utility and probability weighting functions and also captured trial-by-trial learning dynamics.

Keywords: computational model; decision-making; prospect theory; rat behavior; reinforcement learning; reward; subjective value.

Conflict of interest statement

Declaration of Interests: The authors declare no competing interests.

Figures

Figure 1. Rats choose between guaranteed and probabilistic rewards.
(A) Behavioral task and timing of task events: flashes cue reward probability (p) and click rates convey water volume (x) on each side. Safe and risky sides are not fixed. (B) Relationship between cues and reward probability/volume in one task version. Alternative versions produced similar results (Figure S2). There were four possible volumes (6, 12, 24, or 48μL), and the risky side offered reward probabilities between 0 and 1 in increments of 0.1. (C) One rat’s performance for each of the safe side volumes. Axes are probability and volume of risky options. (D) A behavioral model inferred the utility and probability weighting functions that best explained rats’ choices. We modeled the probability that the rat chose the right side by a logistic function whose argument was the difference between the subjective value of each option ($V_R - V_L$) plus a trial history-dependent term. Subjective utility was parameterized as: $u(x) = \begin{cases} (x - r)^{\alpha} & \text{if } x > r \\ -\kappa (r - x)^{\alpha} & \text{if } x < r \end{cases}$, where α is a free parameter, and x is reward volume. r is the reference point, which determines whether rewards are perceived as gains or losses. We first consider the case where r = 0, so $u(x) = x^{\alpha}$. The subjective probability of each option is computed by: $w(p) = e^{-\beta (-\ln p)^{\delta}}$, where β and δ are free parameters and p is the objective probability offered. Combining utility and probability yields the subjective value for each option: $V_R = u(x_R)\, w(p_R)$, $V_L = u(x_L)\, w(p_L)$. These were normalized by the max, and transformed into choice probabilities via a logistic function: $P(\text{Choose } R) = \iota + \frac{1 - 2\iota}{1 + e^{-\lambda (V_R - V_L) + \text{bias}}}$, where ι captures stimulus independent variability (lapse rate) and λ determines the sensitivity of choices to the difference in subjective value ($V_R - V_L$). The bias term was comprised of three possible parameters, depending on trial history: $\text{bias} = \begin{cases} \pm h_1 & \text{if } t-1 \text{ was safe L/R choice} \\ \pm h_2 & \text{if } t-1 \text{ was risky L/R reward} \\ \pm h_3 & \text{if } t-1 \text{ was risky L/R miss} \end{cases}$. (E) Model prediction for held-out data from one rat, averaged over 5 test sets. See also Figures S1, S2.
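The choice model in panel D can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' fitting code: the function names are hypothetical, the parameter values in the usage line are illustrative only, and because the legend says only that values were "normalized by the max", the default scale used here (the utility of a certain 48μL reward) is an assumption.

```python
import math


def utility(x, alpha, r=0.0, kappa=1.0):
    """Power utility around reference point r, as in the panel D legend."""
    if x >= r:
        return (x - r) ** alpha
    return -kappa * (r - x) ** alpha


def prob_weight(p, beta, delta):
    """w(p) = exp(-beta * (-ln p) ** delta); limits at p = 0 and p = 1 handled explicitly."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return math.exp(-beta * (-math.log(p)) ** delta)


def p_choose_right(x_r, p_r, x_l, p_l, alpha, beta, delta, lam, lapse,
                   bias=0.0, v_scale=None):
    """Probability of choosing the right side for one pair of offers (r = 0)."""
    if v_scale is None:
        # Assumption: normalise by the utility of the largest certain reward (48 uL).
        v_scale = utility(48.0, alpha)
    v_r = utility(x_r, alpha) * prob_weight(p_r, beta, delta) / v_scale
    v_l = utility(x_l, alpha) * prob_weight(p_l, beta, delta) / v_scale
    # Logistic with lapse rate; the bias enters the exponent as in the legend.
    return lapse + (1.0 - 2.0 * lapse) / (1.0 + math.exp(-lam * (v_r - v_l) + bias))


# Risky 48 uL at p = 0.5 on the right versus a certain 24 uL on the left.
print(p_choose_right(48, 0.5, 24, 1.0, alpha=0.6, beta=1.0, delta=0.8,
                     lam=5.0, lapse=0.05))
```

With beta = delta = 1 the weighting function reduces to w(p) = p, which is a quick sanity check on the implementation.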
Figure 2. Non-parametric analyses confirm nonlinear utility and probability weighting and reveal diverse risk attitudes.
(A) Model fits of subjective utility functions for each rat, normalized by the maximum volume (48μL). (B) Schematic linear utility function: the perceptual distance (or discriminability, d’) between 0μL and 24μL is the same as that between 24μL and 48μL. (C) Schematic concave utility function: 24μL and 48μL are less discriminable than 0μL and 24μL. (D) One rat’s performance on trials with guaranteed outcomes of 0μL vs. 24μL (green), or 24μL vs. 48μL (purple). A performance ratio on these trials (“d’ ratio”) less than 1 indicates diminishing sensitivity. (E) The concavity of the utility function (α) is significantly correlated with reduced discriminability of larger rewards. Pink circle is the rat from D. (F) Model fits of probability weighting functions. (G) Weights from a logistic regression parameterizing each probability match the probability weighting function for one rat. Error bars are s.e.m. for each regression coefficient. (H) Mean squared error between regression weights and parametric fits for each rat (mean MSE = 0.006, in units of probability). (I,J) To obtain “certainty equivalents,” we measured psychometric functions for each probability of receiving 48μL, and estimated the certain volume at which performance = 50%. (K) Measured (blue) and model-predicted (red) certainty equivalents from one rat indicate systematic undervaluing of the gamble, or risk aversion. Error bars for model predictions are 95% confidence intervals of parameters from 5-fold cross validation. Data are mean +/− s.e.m. for left-out test sets. (L) Distribution of CE areas computed using an analytic expression from model fits. Measured CEs were similar (Figure S3C). See also Figures S2, S3.
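The legend does not state the analytic expression used for the certainty equivalent (CE), so the following is only a sketch of the standard derivation under the r = 0 case from the Figure 1 legend. Setting the utility of the certain amount equal to the subjective value of the gamble, u(CE) = w(p) u(x), gives CE = x · w(p)^(1/α). The function names are hypothetical, and the "CE area" summary in panel L is not reproduced here because its definition is not given in this text.

```python
import math


def prob_weight(p, beta, delta):
    """w(p) = exp(-beta * (-ln p) ** delta), with p = 0 and p = 1 handled explicitly."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return math.exp(-beta * (-math.log(p)) ** delta)


def certainty_equivalent(p, x, alpha, beta, delta):
    """Certain volume whose utility equals w(p) * u(x), assuming u(x) = x ** alpha and r = 0."""
    return x * prob_weight(p, beta, delta) ** (1.0 / alpha)


# Compare the CE of a 48 uL gamble with its expected value across probabilities.
for p in (0.1, 0.5, 0.9):
    ce = certainty_equivalent(p, 48.0, alpha=0.6, beta=1.0, delta=1.0)
    print(p, round(ce, 2), "expected value:", p * 48.0)
```

When α = β = δ = 1 the CE equals the expected value p · x. With a concave utility (α < 1) and undistorted probabilities the CE falls below the expected value, which corresponds to the risk aversion described in panel K.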
Figure 3. Rats exhibit evidence of trial-by-trial learning.
(A) Probability weighting function (left) and utility function (right) for one rat from model fit to trials following reward (turquoise) or no reward (black). (B) CE areas predicted from model fits for all rats following rewarded and unrewarded trials. (C) ΔProbability of repeating left/right choices (relative to mean probability of repeating), following each reward. Points above the dashed line indicate an increased probability of repeating (“stay”); those below indicate a decreased probability (“switch”). Black curve is average +/− s.e.m. across rats. (D) A separate cohort of 3 rats was trained with doubled water volumes. They exhibited lose-switch biases following 12 and 24μL. (E) Win-stay/lose-switch biases for one rat separated by reward history two trials back. (F) Schematic illustrating that with concave utility, rewards should be more (less) discriminable when the reference point is high (low). (G) Psychometric performance from one rat when the inferred reference point was low (black) or high (blue). Red curve is ideal performance. (H) Value function with the median parameters across rats indicates loss aversion (median α=0.6, κ=1.7). See also Figure S3.
Figure 4. Integrating Prospect Theory and reinforcement learning captures nonlinear subjective functions and learning.
(A) Prospect Theory model predictions for each rat, without the trial history parameters (h1-h3, see Figure 1 legend), do not account for win-stay/lose-switch trial history effects. Inclusion of these parameters accounts for these effects. (B) Prospect Theory model fit to simulated choices from a basic reinforcement learning agent yields linear utility and probability weighting functions over a range of generative learning rates (0.2, 0.4, 0.6, 0.8, 1.0, overlaid). (C) Schematic of model incorporating Prospect Theory and reinforcement learning. (D) The hybrid model described in panel C accounts for win-stay/lose-switch effects. (E) The model recovers nonlinear utility and probability weighting functions. (F) Model comparison when the error term used in the model was the subjective value (as shown in panel C), or the expected value (probability × reward). Red arrow is mean ΔAIC. (G) Binned values of rats’ lose-switch biases (measured from the data) plotted against the best-fit learning rate, αlearn. Pearson’s correlation coefficient is −0.37 across rats. See also Figure S4.
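The contrast in panel F can be illustrated with a generic delta-rule update. This is one possible reading of the legend, not the authors' model: the state and action structure of the hybrid model is not described in this text, so the sketch tracks a single running value estimate, and all function names, the initial value q0, and the parameter values are hypothetical. The only elements taken from the legend are the learning rate (αlearn) and the two candidate targets: subjective value u(x) · w(p), or expected value (probability × reward).

```python
import math


def subjective_value(x, p, alpha, beta, delta):
    """u(x) * w(p) with u(x) = x ** alpha (r = 0) and w(p) = exp(-beta * (-ln p) ** delta)."""
    if p <= 0.0:
        w = 0.0
    elif p >= 1.0:
        w = 1.0
    else:
        w = math.exp(-beta * (-math.log(p)) ** delta)
    return (x ** alpha) * w


def delta_rule(q, target, alpha_learn):
    """Move the value estimate q a fraction alpha_learn of the way toward the target."""
    return q + alpha_learn * (target - q)


def run_updates(offers, alpha_learn, alpha, beta, delta, use_subjective=True, q0=0.0):
    """Apply the delta rule over (volume, probability) offers and return the value trace."""
    q = q0
    history = []
    for x, p in offers:
        if use_subjective:
            target = subjective_value(x, p, alpha, beta, delta)
        else:
            target = p * x  # expected value: probability x reward
        q = delta_rule(q, target, alpha_learn)
        history.append(q)
    return history


offers = [(48.0, 0.5), (24.0, 1.0), (12.0, 0.9)]
print(run_updates(offers, alpha_learn=0.4, alpha=0.6, beta=1.0, delta=0.8))
print(run_updates(offers, alpha_learn=0.4, alpha=0.6, beta=1.0, delta=0.8,
                  use_subjective=False))
```

With a learning rate of 1 the estimate jumps straight to the most recent target, while smaller rates average over recent trials, which is the kind of trial history dependence the figure describes.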

References

    1. Kahneman D, and Tversky A (1979). Prospect Theory: An Analysis of Decision under Risk. Econometrica 47, 263.
    2. Tversky A, and Kahneman D (1992). Advances in prospect theory: Cumulative representation of uncertainty. J. Risk Uncertain. 5, 297–323.
    3. Bush RR, and Mosteller F. A Mathematical Model for Simple Learning. Springer Series in Statistics, 221–234. Available at: 10.1007/978-0-387-44956-2_12.
    4. Bush RR, and Mosteller F. A Model for Stimulus Generalization and Discrimination. Springer Series in Statistics, 235–250. Available at: 10.1007/978-0-387-44956-2_13.
    5. Rescorla RA, and Wagner AR (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In Classical Conditioning II: Current Theory and Research (New York: Appleton-Century-Crofts).
