Additively Combining Utilities and Beliefs: Research Gaps and Algorithmic Developments

Anush Ghambaryan¹,², Boris Gutkin¹,², Vasily Klucharev¹, Etienne Koechlin²

Affiliations

¹ Centre for Cognition and Decision Making, HSE University, Moscow, Russia.
² Ecole Normale Supérieure, PSL Research University, Paris, France.

PMID: 34658760
PMCID: PMC8517513
DOI: 10.3389/fnins.2021.704728

Anush Ghambaryan et al. Front Neurosci. 2021 Oct 1;15:704728. doi: 10.3389/fnins.2021.704728. eCollection 2021.

Abstract

Value-based decision making in complex environments, such as those with an uncertain and volatile mapping of reward probabilities onto options, may engender computational strategies that are not necessarily optimal under normative frameworks but that may ensure effective learning and behavioral flexibility when neural computational resources are limited. In this article, we review one such suboptimal strategy: additively combining the reward magnitude and reward probability attributes of options in value-based decision making. In addition, we present the computational intricacies of a recently developed model (the MIX model) that provides an algorithmic implementation of the additive strategy in sequential decision making with two options. We also discuss the opportunities it offers, as well as its conceptual, inferential, and generalization issues. Finally, we suggest future studies that would reveal the potential of the MIX model and support its further development as a general model of value-based choice.
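
To make the contrast concrete, the following is a minimal sketch of a multiplicative (expected-value) combination next to an additive combination of reward magnitude and reward probability. It is not the MIX model itself: the weight `omega`, the divisive normalization of rewards across the two options, the softmax rule, and all function names are assumptions introduced here for illustration only.

```python
import math


def multiplicative_values(rewards, probs):
    """Normative expected value: magnitude times probability, per option."""
    return [r * p for r, p in zip(rewards, probs)]


def additive_values(rewards, probs, omega):
    """Additive combination (assumed form): omega weights the probability
    term, (1 - omega) weights the normalized reward term."""
    total = sum(rewards)
    # Assumption: rewards are normalized by their sum across options.
    normalized = [r / total for r in rewards]
    return [omega * p + (1.0 - omega) * u for p, u in zip(probs, normalized)]


def softmax_choice_prob(v1, v2, beta):
    """Probability of choosing option 1 under a two-option softmax
    with inverse temperature beta (an assumed choice rule)."""
    return 1.0 / (1.0 + math.exp(-beta * (v1 - v2)))


# Hypothetical trial: option 1 offers 10 with probability 0.2,
# option 2 offers 2 with probability 0.8.
rewards, probs = [10, 2], [0.2, 0.8]
ev = multiplicative_values(rewards, probs)
add = additive_values(rewards, probs, omega=0.8)
p_ev = softmax_choice_prob(ev[0], ev[1], beta=5.0)
p_add = softmax_choice_prob(add[0], add[1], beta=5.0)
```

In this hypothetical trial the expected-value rule favors option 1, whereas the additive rule with a large weight on probability favors option 2, so the two strategies can predict opposite choices for the same offer.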

Keywords: MIX model; additive strategy; normalized utility; one-armed bandit task; state belief; uncertain and volatile environment; value-based decision making.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**FIGURE 1**
Probability of choosing an option according to the classical economics view **(leftmost panel)**, the behavioral economics view **(middle panel)**, and the recently developed MIX model **(rightmost panel)**.

**FIGURE 2**
**(A)** Trial structure. In each trial, subjects see two shapes (options), a diamond and a square, each proposing a reward in euros randomly chosen from the set {2, 4, 6, 8, 10}. After making a choice, subjects see only the chosen option on the screen, followed by the outcome of the choice displayed in the center of the screen. The average duration of a trial was 4.15 s. Once the two available options appeared on the screen, subjects had 1.5 s to think and respond by pressing one of two instructed keyboard buttons: the left button to choose the option on the left side of the screen and the right button to choose the option on the right side. The outcome of the trial was displayed for 1.0 s. The delay before the outcome display was 0.1–0.2 s. The inter-trial delay was 0.4–0.6 s. **(B)** Experimental design. The outcome was either zero or equal to the proposed reward (shown on the first screen of each trial), with a probability that subjects were not informed about but could infer from experience. By design, reward frequencies of 20% and 80% were assigned to the two options and were switched between them after a random number of trials (16, 20, 24, or 28). Subjects were not informed about the switches but could detect them during the experiment from feedback (outcomes). Each subject went through 19 switches of reward frequencies, which divided the task into 20 episodes (series of trials within which the reward frequencies do not change).
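
The design described in this caption can be sketched as a short simulation. This is an illustrative reconstruction from the caption only, not the authors' task code: it assumes that the two proposed rewards are drawn independently and uniformly, that the first option starts with the 80% frequency, and that the hypothetical `choose` callback stands in for a subject. Timing parameters are omitted.

```python
import random

REWARDS = (2, 4, 6, 8, 10)         # proposed rewards in euros
EPISODE_LENGTHS = (16, 20, 24, 28)  # possible numbers of trials per episode
N_EPISODES = 20                     # 19 switches of reward frequencies


def simulate_task(choose, seed=0):
    """Simulate one session; `choose(offers)` returns 0 or 1."""
    rng = random.Random(seed)
    freqs = [0.8, 0.2]  # assumption: option 0 starts as the 80% option
    trials = []
    for episode in range(N_EPISODES):
        if episode > 0:
            freqs.reverse()  # unannounced switch of reward frequencies
        for _ in range(rng.choice(EPISODE_LENGTHS)):
            # Assumption: each option's proposed reward is drawn independently.
            offers = (rng.choice(REWARDS), rng.choice(REWARDS))
            choice = choose(offers)
            rewarded = rng.random() < freqs[choice]
            outcome = offers[choice] if rewarded else 0
            trials.append({
                "episode": episode,
                "offers": offers,
                "choice": choice,
                "outcome": outcome,
            })
    return trials


# Hypothetical subject that always picks the larger proposed reward.
trials = simulate_task(lambda offers: 0 if offers[0] >= offers[1] else 1)
```

Any learning model can be plugged in through `choose`, provided it keeps its own state between calls and is updated from the recorded outcomes.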

**FIGURE 3**
Scheme of the computational algorithm of the MIX model.

**FIGURE 4**
Research directions for loss aversion as an exemplar behavioral variation compared with choices when outcomes are presented in the gain domain. Directions (a) and (b) are presented in orange and blue, respectively.
