Learning to minimize efforts versus maximizing rewards: computational principles and neural correlates

Vasilisa Skvortsova et al. J Neurosci. 2014 Nov 19;34(47):15621–30. doi: 10.1523/JNEUROSCI.1350-14.2014.

Abstract

The mechanisms of reward maximization have been extensively studied at both the computational and neural levels. By contrast, little is known about how the brain learns to choose the options that minimize action cost. In principle, the brain could have evolved a general mechanism that applies the same learning rule to the different dimensions of choice options. To test this hypothesis, we scanned healthy human volunteers while they performed a probabilistic instrumental learning task that varied in both the physical effort and the monetary outcome associated with choice options. Behavioral data showed that the same computational rule, using prediction errors to update expectations, could account for both reward maximization and effort minimization. However, these learning-related variables were encoded in partially dissociable brain areas. In line with previous findings, the ventromedial prefrontal cortex was found to positively represent expected and actual rewards, regardless of effort. A separate network, encompassing the anterior insula, the dorsal anterior cingulate, and the posterior parietal cortex, correlated positively with expected and actual efforts. These findings suggest that the same computational rule is applied by distinct brain systems, depending on the choice dimension (cost or benefit) that has to be learned.

Keywords: computational modeling; effort; reinforcement learning; reward; ventromedial prefrontal cortex.


Figures

Figure 1.
Behavioral task and results. A, Trial structure. Each trial started with a fixation cross followed by one of four abstract visual cues. The subject then had to make a choice by slightly squeezing the left or right hand grip. Each choice was associated with two outcomes: a monetary reward and a physical effort. Rewards were represented by a coin (10 or 20¢) that the subject received after exerting the required amount of effort, indicated by the height of the horizontal bar in the thermometer. The low and high bars corresponded respectively to 20% and 80% of a subject's maximal force. Effort could only start once the command GO! appeared on the screen. The subject had to squeeze the handgrip until the mercury reached the horizontal bar. In the illustrated example, the subject made a left-hand choice and produced an 80% force to win 20¢. The last screen informed the subject about the gain added to cumulative payoff. B, Probabilistic contingencies. There were four different contingency sets cued by four different symbols in the task. With cues A and B, reward probabilities (orange bars) differed between left and right (75%/25% and 25%/75%, respectively, chance of big reward), while effort probabilities (blue bars) were identical (100%/100% and 0%/0%, respectively, chance of big effort). The opposite was true for cues C and D: left and right options differed in effort probability (75%/25% and 25%/75%, respectively) but not in reward probability (100%/100% and 0%/0%, respectively). The illustration only applies to one task session. Contingencies were fully counterbalanced across the four sessions. C, Learning curves. Circles represent, trial by trial, the percentage of correct responses averaged across hands, sessions, and subjects for reward learning (left, cues A and B) and effort learning (right, cues C and D). Shaded intervals are intersubject SEM. 
Lines show the learning curves generated by the best computational model (QL with linear discount and different learning rates for reward and effort) identified by Bayesian model selection. D, Model fit. Scatter plots show intersubject correlations between estimated and observed responses for reward learning (left) and effort learning (right). Each dot represents one subject. Shaded areas indicate 95% confidence intervals on linear regression estimates.
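The winning model in the legend above (Q-learning with a linear effort discount and separate learning rates for reward and effort) can be sketched as follows. This is a minimal illustration, not the authors' code: the parameter names (`alpha_r`, `alpha_e`, `k`, `beta`) and the softmax choice rule are assumptions consistent with standard reinforcement-learning formulations, and the specific parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed free parameters (values are illustrative, not fitted estimates):
alpha_r, alpha_e = 0.3, 0.2   # separate learning rates for reward and effort
k, beta = 0.5, 5.0            # linear effort weight, softmax inverse temperature

Q_r = np.zeros(2)             # expected reward for left/right option
Q_e = np.zeros(2)             # expected effort for left/right option

def choose():
    """Softmax choice on net value V = Q_r - k * Q_e (linear discount)."""
    v = Q_r - k * Q_e
    p = np.exp(beta * v) / np.exp(beta * v).sum()
    return rng.choice(2, p=p)

def update(a, reward, effort):
    """Delta-rule updates: the same prediction-error rule is applied
    to each dimension, but with a dimension-specific learning rate."""
    Q_r[a] += alpha_r * (reward - Q_r[a])   # reward prediction error
    Q_e[a] += alpha_e * (effort - Q_e[a])   # effort prediction error

# One example trial: choose an option, observe both outcomes, update.
a = choose()
update(a, reward=0.20, effort=0.8)   # 20 cents, 80% of maximal force
```

The key design point matching the paper's conclusion is that the update rule is identical across dimensions; only the learning rate differs, while reward and effort expectations are tracked separately and combined linearly at the time of choice.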
Figure 2.
Neural underpinnings of effort and reward learning. A, B, Statistical parametric maps show brain regions where activity at cue onset significantly correlated with expected reward (A) and with the difference between expected effort and reward (B) in a random-effects group analysis (p < 0.05, FWE cluster corrected). Axial and sagittal slices were taken at global maxima of interest indicated by red pointers on glass brains, and were superimposed on structural scans. [x y z] coordinates of the maxima refer to the Montreal Neurological Institute space. Plots show regression estimates for reward (orange) and effort (blue) predictions and prediction errors in each ROI. No statistical test was performed on the β-estimates of predictions, as they served to identify the ROIs. p values were obtained using paired two-tailed t tests. Error bars indicate intersubject SEM. ns, Nonsignificant.

