Nat Commun. 2018 Oct 29;9(1):4503. doi: 10.1038/s41467-018-06781-2.

Reference-point centering and range-adaptation enhance human reinforcement learning at the cost of irrational preferences


Sophie Bavard et al. Nat Commun. 2018.

Abstract

In economics and perceptual decision-making, contextual effects are well documented: decision weights are adjusted as a function of the distribution of stimuli. Yet, in the reinforcement learning literature, whether and how contextual information pertaining to decision states is integrated into learning algorithms has received comparably little attention. Here, we investigate reinforcement learning behavior and its computational substrates in a task in which we orthogonally manipulate outcome valence and magnitude, resulting in systematic variations in state-values. Model comparison indicates that subjects' behavior is best accounted for by an algorithm that includes both reference-point dependence and range adaptation, two crucial features of state-dependent valuation. In addition, we find that state-dependent outcome valuation progressively emerges, is favored by increasing outcome information, and is correlated with explicit understanding of the task structure. Finally, our data clearly show that, while being locally adaptive (for instance in negative-valence and small-magnitude contexts), state-dependent valuation comes at the cost of seemingly irrational choices when options are extrapolated out of their original contexts.
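The abstract's central contrast is between absolute outcome valuation and state-dependent valuation built from reference-point centering and range adaptation. The Python sketch below is an illustration under our own simplifying assumptions (the variable names and the use of a single learned context value as both reference point and range proxy are not taken from the paper); it is not the authors' fitted model, but it shows how such a normalization can be layered onto a standard delta-rule update.

```python
# Illustrative sketch only: a delta-rule learner whose outcomes are
# rescaled by a learned context value, in the spirit of reference-point
# centering and range adaptation. Names and the exact normalization are
# assumptions, not the paper's equations.

def update(q, v_context, reward, alpha_q=0.3, alpha_v=0.3):
    """One trial of context-normalized learning for a single option.

    q         : current value estimate of the chosen option
    v_context : running estimate of the context (state) value, used here
                as both reference point and range proxy
    reward    : absolute outcome on this trial (e.g., +1.0, +0.1, 0, -0.1, -1.0)
    """
    # Track the context value with its own delta rule (reference point).
    v_context += alpha_v * (reward - v_context)

    # Rescale the outcome by the context value so favorable and
    # unfavorable options span a similar range across contexts.
    denom = abs(v_context) if abs(v_context) > 1e-6 else 1.0
    relative_reward = reward / denom

    # Standard delta-rule update on the normalized outcome.
    q += alpha_q * (relative_reward - q)
    return q, v_context
```

In the paper's HYBRID model a subject-specific weight ω arbitrates between relative and absolute valuation; in a sketch like the one above this would amount to learning from something like `w * relative_reward + (1 - w) * reward` rather than from the normalized outcome alone.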


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Experimental design and normalization process. a Learning task with four different contexts: reward/big, reward/small, loss/small, and loss/big. Each symbol is associated with a probability (P) of gaining or losing a given amount of money, or magnitude (M). M varies as a function of the choice context (reward seeking: +1.0€ or +0.1€; loss avoidance: −1.0€ or −0.1€; small magnitude: +0.1€ or −0.1€; big magnitude: +1.0€ or −1.0€). b The graph schematizes the transition from absolute value encoding (where values are negative in the loss-avoidance contexts and smaller in the small-magnitude contexts) to relative value encoding (complete adaptation, as in the RELATIVE model), where favorable and unfavorable options have similar values in all contexts, thanks to both reference-point and range adaptation.
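To make panel b concrete, the snippet below illustrates fully adapted normalization under our own simplifying assumption (centering on the known context minimum and dividing by the known context range, rather than the fitted RELATIVE model): outcomes from the four contexts collapse onto a common 0-to-1 scale.

```python
# Hypothetical illustration of complete range adaptation across the four
# contexts; the reference point and range are taken as the known context
# bounds, a simplification rather than the paper's learned quantities.

contexts = {
    "reward/big":   (0.0,  1.0),   # outcomes: 0 or +1.0 euro
    "reward/small": (0.0,  0.1),   # outcomes: 0 or +0.1 euro
    "loss/small":   (-0.1, 0.0),   # outcomes: -0.1 or 0 euro
    "loss/big":     (-1.0, 0.0),   # outcomes: -1.0 or 0 euro
}

for name, (low, high) in contexts.items():
    span = high - low
    # After centering on the context minimum and dividing by the range,
    # the unfavorable outcome maps to 0 and the favorable outcome to 1
    # in every context.
    normalized = [(low - low) / span, (high - low) / span]
    print(f"{name:13s} absolute {low:+.1f}/{high:+.1f} -> relative {normalized}")
```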
Fig. 2
Behavioral results and model simulations. a Correct choice rate during the learning sessions. b Difference in correct choice rate between big-magnitude and small-magnitude contexts during the learning sessions. c and d Choice rates in the transfer test. Colored bars represent the actual data. Big black (RELATIVE), white (ABSOLUTE), and gray (HYBRID) dots represent the model-predicted choice rates. Small light gray dots above and below the bars represent individual subjects (N = 60). White stars indicate a significant difference compared to zero. Error bars represent s.e.m. **P < 0.01, t-test. Green arrows indicate significant differences between actual and predicted choices at P < 0.001, t-test.
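The model-predicted choice rates shown as black, white, and gray dots come from simulating the fitted models. Mapping simulated option values onto choice probabilities is commonly done with a softmax (logistic) choice rule; the helper below is an illustrative sketch under that assumption, with an arbitrary inverse temperature rather than a fitted parameter.

```python
import math

def softmax_choice_prob(q_chosen, q_unchosen, beta=5.0):
    """Probability of picking the first option under a softmax choice rule.

    beta is an illustrative inverse-temperature value, not a parameter
    taken from the paper's model fits.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_chosen - q_unchosen)))
```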
Fig. 3
Transfer test behavioral results and model simulations. Color map of pairwise choice rates during the transfer test for each symbol when compared to each of the seven other symbols, denoted generically as 'option 1' and 'option 2'. Comparisons between identical symbols are undefined (black squares). a Experimental data, b ABSOLUTE model, c RELATIVE model, and d HYBRID model.
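The colored maps in panels a-d are matrices of pairwise choice rates. As a sketch of how such a matrix can be assembled from transfer-test choices (the tuple layout of `choices` and the option indexing are assumptions, not the paper's data format), one could write:

```python
import numpy as np

def pairwise_choice_matrix(choices, n_options=8):
    """Build an n_options x n_options matrix of pairwise choice rates.

    `choices` is an iterable of (option_1, option_2, chose_option_1) tuples
    from the transfer test. The diagonal stays undefined (NaN), mirroring
    the black squares in the figure.
    """
    counts = np.zeros((n_options, n_options))
    chosen = np.zeros((n_options, n_options))
    for o1, o2, picked_first in choices:
        counts[o1, o2] += 1
        chosen[o1, o2] += picked_first
    # Pairs never presented (and the diagonal) end up as NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        rate = chosen / counts
    np.fill_diagonal(rate, np.nan)
    return rate
```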
Fig. 4
Computational properties and behavioral correlates of value normalization. a Likelihood difference (from model fitting) between the RELATIVE and the ABSOLUTE models over the 80 trials of the task sessions for both experiments (N = 60). A negative likelihood difference means that the ABSOLUTE model is the best-fitting model for that trial; a positive likelihood difference means that the RELATIVE model is the best-fitting model for that trial. Green dots: likelihood difference significantly different from 0 (P < 0.05, t-test). b Likelihood difference between the RELATIVE and the ABSOLUTE models over the first part of the task (first 40 trials) and the last part (last 40 trials) for both experiments. c Likelihood difference between the RELATIVE and the ABSOLUTE models for the two experiments. A negative likelihood difference means that the ABSOLUTE model is the best-fitting model for the experiment; a positive likelihood difference means that the RELATIVE model is the best-fitting model for the experiment. d Subject-specific free parameter weight (ω) comparison for the two experiments. e Subject-specific free parameter weight (ω) as a function of correct debriefing for the two questions ("fixed pairs" and "number of pairs"). f Debriefing as a function of the weight parameter. Small light gray dots above and below the bars in a–f represent individual subjects (N = 60). g and h Correct choice rate as a function of subjects' weight parameter in the learning sessions and the transfer test for both Experiments 1 and 2. One dot corresponds to one participant (N = 60); green lines represent linear regression fits. Error bars represent s.e.m. ***P < 0.001, **P < 0.01, *P < 0.05, t-test.
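Panels a-c plot differences in model fit per trial, per task half, and per experiment. Under the common convention of comparing the log-probability each fitted model assigns to the subject's actual choices (an assumption about the fitting pipeline, which is not detailed on this page), the quantity can be computed as in the sketch below; positive values favor the RELATIVE model and negative values favor the ABSOLUTE model, matching the sign convention in the caption.

```python
import numpy as np

def trialwise_loglik_difference(p_relative, p_absolute):
    """Per-trial log-likelihood difference (RELATIVE minus ABSOLUTE).

    p_relative, p_absolute: arrays giving the probability each fitted model
    assigns to the subject's actual choice on every trial. How these
    probabilities are obtained (the fitting procedure) is assumed, not
    taken from the paper.
    """
    p_relative = np.asarray(p_relative, dtype=float)
    p_absolute = np.asarray(p_absolute, dtype=float)
    return np.log(p_relative) - np.log(p_absolute)
```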

