Nat Commun. 2018 Oct 29;9(1):4503. doi: 10.1038/s41467-018-06781-2.

Reference-point centering and range-adaptation enhance human reinforcement learning at the cost of irrational preferences


Sophie Bavard et al. Nat Commun. 2018.

Abstract

In economics and perceptual decision-making, contextual effects are well documented: decision weights are adjusted as a function of the distribution of stimuli. Yet, in the reinforcement learning literature, whether and how contextual information pertaining to decision states is integrated into learning algorithms has received comparably little attention. Here, we investigate reinforcement learning behavior and its computational substrates in a task in which we orthogonally manipulate outcome valence and magnitude, resulting in systematic variations in state-values. Model comparison indicates that subjects' behavior is best accounted for by an algorithm that includes both reference-point dependence and range adaptation, two crucial features of state-dependent valuation. In addition, we find that state-dependent outcome valuation progressively emerges, is favored by increasing outcome information, and is correlated with explicit understanding of the task structure. Finally, our data clearly show that, while being locally adaptive (for instance in negative-valence and small-magnitude contexts), state-dependent valuation comes at the cost of seemingly irrational choices when options are extrapolated out of their original contexts.
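The abstract's central contrast is between absolute outcome valuation and state-dependent valuation built from reference-point centering and range adaptation. The Python sketch below is an illustration under our own simplifying assumptions (the variable names and the use of a single learned context value as both reference point and range proxy are not taken from the paper); it is not the authors' fitted model, but it shows how such a normalization can be layered onto a standard delta-rule update.

```python
# Illustrative sketch only: a delta-rule learner whose outcomes are
# rescaled by a learned context value, in the spirit of reference-point
# centering and range adaptation. Names and the exact normalization are
# assumptions, not the paper's equations.

def update(q, v_context, reward, alpha_q=0.3, alpha_v=0.3):
    """One trial of context-normalized learning for a single option.

    q         : current value estimate of the chosen option
    v_context : running estimate of the context (state) value, used here
                as both reference point and range proxy
    reward    : absolute outcome on this trial (e.g., +1.0, +0.1, 0, -0.1, -1.0)
    """
    # Track the context value with its own delta rule (reference point).
    v_context += alpha_v * (reward - v_context)

    # Rescale the outcome by the context value so favorable and
    # unfavorable options span a similar range across contexts.
    denom = abs(v_context) if abs(v_context) > 1e-6 else 1.0
    relative_reward = reward / denom

    # Standard delta-rule update on the normalized outcome.
    q += alpha_q * (relative_reward - q)
    return q, v_context
```

In the paper's HYBRID model a subject-specific weight ω arbitrates between relative and absolute valuation; in a sketch like the one above this would amount to learning from something like `w * relative_reward + (1 - w) * reward` rather than from the normalized outcome alone.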


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Experimental design and normalization process. a Learning task with four different contexts: reward/big, reward/small, loss/small, and loss/big. Each symbol is associated with a probability (P) of gaining or losing a given amount of money, or magnitude (M). M varies as a function of the choice context (reward seeking: +1.0€ or +0.1€; loss avoidance: −1.0€ or −0.1€; small magnitude: +0.1€ or −0.1€; big magnitude: +1.0€ or −1.0€). b The graph schematizes the transition from absolute value encoding (where values are negative in the loss-avoidance contexts and smaller in the small-magnitude contexts) to relative value encoding (complete adaptation, as in the RELATIVE model), where favorable and unfavorable options have similar values in all contexts, thanks to both reference-point and range adaptation.
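To make panel b concrete, the snippet below illustrates fully adapted normalization under our own simplifying assumption (centering on the known context minimum and dividing by the known context range, rather than the fitted RELATIVE model): outcomes from the four contexts collapse onto a common 0-to-1 scale.

```python
# Hypothetical illustration of complete range adaptation across the four
# contexts; the reference point and range are taken as the known context
# bounds, a simplification rather than the paper's learned quantities.

contexts = {
    "reward/big":   (0.0,  1.0),   # outcomes: 0 or +1.0 euro
    "reward/small": (0.0,  0.1),   # outcomes: 0 or +0.1 euro
    "loss/small":   (-0.1, 0.0),   # outcomes: -0.1 or 0 euro
    "loss/big":     (-1.0, 0.0),   # outcomes: -1.0 or 0 euro
}

for name, (low, high) in contexts.items():
    span = high - low
    # After centering on the context minimum and dividing by the range,
    # the unfavorable outcome maps to 0 and the favorable outcome to 1
    # in every context.
    normalized = [(low - low) / span, (high - low) / span]
    print(f"{name:13s} absolute {low:+.1f}/{high:+.1f} -> relative {normalized}")
```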
Fig. 2
Behavioral results and model simulations. a Correct choice rate during the learning sessions. b Difference in correct choice rate between big-magnitude and small-magnitude contexts during the learning sessions. c and d Choice rates in the transfer test. Colored bars represent the actual data. Big black (RELATIVE), white (ABSOLUTE), and gray (HYBRID) dots represent the model-predicted choice rates. Small light gray dots above and below the bars represent individual subjects (N = 60). White stars indicate a significant difference compared to zero. Error bars represent s.e.m. **P < 0.01, t-test. Green arrows indicate significant differences between actual and predicted choices at P < 0.001, t-test.
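The model-predicted choice rates shown as black, white, and gray dots come from simulating the fitted models. Mapping simulated option values onto choice probabilities is commonly done with a softmax (logistic) choice rule; the helper below is an illustrative sketch under that assumption, with an arbitrary inverse temperature rather than a fitted parameter.

```python
import math

def softmax_choice_prob(q_chosen, q_unchosen, beta=5.0):
    """Probability of picking the first option under a softmax choice rule.

    beta is an illustrative inverse-temperature value, not a parameter
    taken from the paper's model fits.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_chosen - q_unchosen)))
```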
Fig. 3
Transfer test behavioral results and model simulations. Color map of pairwise choice rates during the transfer test for each symbol when compared to each of the seven other symbols, denoted generically as 'option 1' and 'option 2'. Comparisons between identical symbols are undefined (black squares). a Experimental data, b ABSOLUTE model, c RELATIVE model, and d HYBRID model.
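The colored maps in panels a-d are matrices of pairwise choice rates. As a sketch of how such a matrix can be assembled from transfer-test choices (the tuple layout of `choices` and the option indexing are assumptions, not the paper's data format), one could write:

```python
import numpy as np

def pairwise_choice_matrix(choices, n_options=8):
    """Build an n_options x n_options matrix of pairwise choice rates.

    `choices` is an iterable of (option_1, option_2, chose_option_1) tuples
    from the transfer test. The diagonal stays undefined (NaN), mirroring
    the black squares in the figure.
    """
    counts = np.zeros((n_options, n_options))
    chosen = np.zeros((n_options, n_options))
    for o1, o2, picked_first in choices:
        counts[o1, o2] += 1
        chosen[o1, o2] += picked_first
    # Pairs never presented (and the diagonal) end up as NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        rate = chosen / counts
    np.fill_diagonal(rate, np.nan)
    return rate
```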
Fig. 4
Computational properties and behavioral correlates of value normalization. a Likelihood difference (from model fitting) between the RELATIVE and the ABSOLUTE models over the 80 trials of the task sessions for both experiments (N = 60). A negative likelihood difference means that the ABSOLUTE model is the best-fitting model for that trial; a positive likelihood difference means that the RELATIVE model is the best-fitting model for that trial. Green dots: likelihood difference significantly different from 0 (P < 0.05, t-test). b Likelihood difference between the RELATIVE and the ABSOLUTE models over the first part of the task (first 40 trials) and the last part (last 40 trials) for both experiments. c Likelihood difference between the RELATIVE and the ABSOLUTE models for the two experiments. A negative likelihood difference means that the ABSOLUTE model is the best-fitting model for the experiment; a positive likelihood difference means that the RELATIVE model is the best-fitting model for the experiment. d Subject-specific free parameter weight (ω) comparison for the two experiments. e Subject-specific free parameter weight (ω) as a function of correct debriefing for the two questions ("fixed pairs" and "number of pairs"). f Debriefing as a function of the weight parameter. Small light gray dots above and below the bars in a–f represent individual subjects (N = 60). g and h Correct choice rate as a function of subjects' weight parameter in the learning sessions and the transfer test for both Experiments 1 and 2. One dot corresponds to one participant (N = 60); green lines represent linear regression fits. Error bars represent s.e.m. ***P < 0.001, **P < 0.01, *P < 0.05, t-test.
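Panels a-c plot differences in model fit per trial, per task half, and per experiment. Under the common convention of comparing the log-probability each fitted model assigns to the subject's actual choices (an assumption about the fitting pipeline, which is not detailed on this page), the quantity can be computed as in the sketch below; positive values favor the RELATIVE model and negative values favor the ABSOLUTE model, matching the sign convention in the caption.

```python
import numpy as np

def trialwise_loglik_difference(p_relative, p_absolute):
    """Per-trial log-likelihood difference (RELATIVE minus ABSOLUTE).

    p_relative, p_absolute: arrays giving the probability each fitted model
    assigns to the subject's actual choice on every trial. How these
    probabilities are obtained (the fitting procedure) is assumed, not
    taken from the paper.
    """
    p_relative = np.asarray(p_relative, dtype=float)
    p_absolute = np.asarray(p_absolute, dtype=float)
    return np.log(p_relative) - np.log(p_absolute)
```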

