Sci Adv. 2023 Oct 20;9(42):eadi2704. doi: 10.1126/sciadv.adi2704. Epub 2023 Oct 20.

Computational mechanisms underlying latent value updating of unchosen actions


Ido Ben-Artzi et al.

Abstract

Current studies suggest that individuals estimate the value of their choices based on observed feedback. Here, we ask whether individuals also update the value of their unchosen actions, even when the associated feedback remains unknown. One hundred seventy-eight individuals completed a multi-armed bandit task, making choices to gain rewards. We found robust evidence suggesting latent value updating of unchosen actions based on the chosen action's outcome. Computational modeling results suggested that this effect is mainly explained by a value updating mechanism whereby individuals integrate the outcome history for choosing an option with that of rejecting the alternative. Properties of the deliberation (i.e., duration/difficulty) did not moderate the latent value updating of unchosen actions, suggesting that memory traces generated during deliberation might take a smaller role in this specific phenomenon than previously thought. We discuss the mechanisms facilitating credit assignment to unchosen actions and their implications for human decision-making.
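The latent updating described in the abstract can be illustrated with a minimal sketch: the chosen action's value moves toward the observed outcome via a standard delta rule, while the unchosen action's value is nudged in the opposite direction even though its own outcome was never observed. The learning rates `alpha` and `kappa` below are illustrative choices, not the paper's fitted parameters.

```python
import numpy as np

def update_values(q, chosen, unchosen, reward, alpha=0.3, kappa=0.3):
    """One trial of latent value updating (illustrative sketch).

    The chosen card's value moves toward the observed reward; the unchosen
    card's value is updated inversely, despite its feedback being unknown.
    alpha and kappa are hypothetical learning rates.
    """
    q = q.copy()
    pe = reward - q[chosen]    # prediction error for the chosen card
    q[chosen] += alpha * pe    # standard delta-rule update
    q[unchosen] -= kappa * pe  # inverse (latent) update of the unchosen card
    return q
```

With this rule, a rewarded choice lowers the value of the rejected alternative, matching the reported reduction in re-selecting a previously unchosen card after rewarded trials.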


Figures

Fig. 1. Inverse value updating of unchosen actions.
(A) Illustration of a trial sequence. Participants completed a four-armed bandit task. In each trial, two of the four cards were randomly offered by the computer for selection. We examined trials where the unchosen card in trial n was reoffered at trial n + 1 alongside a card that was not offered in trial n. This allowed us to examine whether the outcome associated with the chosen card in trial n influenced the probability that the participant would select the previously unchosen card at trial n + 1. For example, as illustrated in this panel, we ask whether the reward delivered at trial n (as a result of choosing the dark card) influenced the probability of selecting the unchosen card (orange) when it was offered alongside a third card (blue). (B) Card selection led to a binary outcome determined by slowly drifting probabilities. We used randomly drifting reward probabilities to ensure continued learning. The reward probabilities of each card were independent (mean shared variance = 5.3%). (C) Probability of choosing a previously unchosen action as a function of the outcome in the previous trial. Results indicated that the probability of choosing a previously unchosen card was reduced after rewarded trials compared to unrewarded trials. This was true for both win blocks (where outcomes included winning/not winning a play pound coin) and loss blocks (where outcomes included not winning/losing a play pound coin). (D) The posterior distributions for the influence of previous outcome (top) and the interaction with condition (bottom) on choosing the previously unchosen card in a logistic regression (the blue dotted line indicates the null point, and the gray horizontal line indicates HDI95%). Overall, results indicate an inverted influence of the previous outcome on the chances of selecting an unchosen action, regardless of win/loss conditions.
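The slowly drifting reward probabilities in panel (B) can be generated as independent bounded random walks. The sketch below assumes a Gaussian step clipped to [0, 1]; the step size, starting range, and trial count are illustrative, not the study's exact schedule.

```python
import numpy as np

def drifting_probabilities(n_trials=200, n_arms=4, sd=0.05, seed=0):
    """Slowly drifting reward probabilities for a bandit task (sketch).

    Each arm's reward probability follows an independent Gaussian random
    walk, clipped to [0, 1] so it remains a valid probability.
    """
    rng = np.random.default_rng(seed)
    p = np.empty((n_trials, n_arms))
    p[0] = rng.uniform(0.25, 0.75, n_arms)  # independent starting points
    for t in range(1, n_trials):
        p[t] = np.clip(p[t - 1] + rng.normal(0, sd, n_arms), 0.0, 1.0)
    return p
```

Because each arm drifts independently, the pairwise shared variance between arms stays low, which is the property the caption reports (mean shared variance = 5.3%).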
Fig. 2. Moderation of deliberation duration and difficulty on inverse value updating of unchosen actions.
(A) Hierarchical Bayesian logistic regression showed no moderating effect of the absolute difference in expected values between the two offered cards on the tendency to assign value to an unchosen action. (B) Higher RTs were assumed to indicate increased deliberation but had no moderating effect on the tendency to assign value to unchosen actions. (C) Posterior distribution showing no evidence supporting the moderation of value updating for unchosen actions by deliberation difficulty (blue line indicating the null point; probability of direction = 65%; gray line indicating HDI95%). (D) Posterior distribution depicting a lack of evidence for the interaction between RT and previous outcome on the tendency to choose a previously unselected card (blue line indicating the null point; probability of direction = 55%; gray line indicating HDI95%).
Fig. 3. Parameter recovery.
The top rows in each panel present population parameter recovery, including the posterior parameter distribution (pink) and the blue dashed line indicating the value of the true latent population parameter. The bottom rows refer to individual parameter recovery, showing a strong correlation between simulated individual parameters and recovered ones. (A) The three parameters of the “double updating with two prediction errors” model, (B) the “double updating with one prediction error” model, and (C) the “select-reject” model. Overall, we found good parameter recovery for all parameters and models.
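The logic of parameter recovery can be sketched with a much simpler setup than the paper's hierarchical models: simulate choices from an agent with a known learning rate, then refit that parameter from the simulated data and check it comes back close to the truth. The two-armed softmax Q-learner and grid-search MLE below are illustrative stand-ins for the authors' models and Bayesian estimation.

```python
import numpy as np

def simulate_agent(alpha, beta=3.0, n_trials=300, seed=0):
    """Simulate a two-armed softmax Q-learner with learning rate alpha (sketch)."""
    rng = np.random.default_rng(seed)
    p_reward = np.array([0.7, 0.3])  # fixed reward probabilities (illustrative)
    q = np.zeros(2)
    choices, rewards = [], []
    for _ in range(n_trials):
        p1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))  # softmax over two arms
        c = int(rng.random() < p1)
        r = float(rng.random() < p_reward[c])
        q[c] += alpha * (r - q[c])
        choices.append(c)
        rewards.append(r)
    return np.array(choices), np.array(rewards)

def neg_log_lik(alpha, choices, rewards, beta=3.0):
    """Negative log-likelihood of observed choices under a candidate alpha."""
    q = np.zeros(2)
    nll = 0.0
    for c, r in zip(choices, rewards):
        p1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))
        nll -= np.log((p1 if c == 1 else 1.0 - p1) + 1e-12)
        q[c] += alpha * (r - q[c])
    return nll

def recover(choices, rewards):
    """Grid-search MLE over alpha -- a crude stand-in for hierarchical Bayesian fitting."""
    grid = np.linspace(0.01, 0.99, 99)
    return grid[np.argmin([neg_log_lik(a, choices, rewards) for a in grid])]
```

Repeating this over many simulated agents and correlating true with recovered parameters is what the bottom rows of each panel report.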
Fig. 4. Simulated effects for computational models.
(A) The main regression signatures found in the empirical data. For each model (B to E), we simulated data from 175 agents using empirical parameters (sampled from the population marginal posterior distributions). We then used the simulated data to examine the ability of each model to reproduce the main regression signatures we found in the empirical data. (Left and middle columns) The effect of previous outcome on the probability of choosing a previously unchosen action and the corresponding posterior distribution (estimates are presented on the log-odds scale). (Right column) The moderation of choice difficulty in the previous trial (indicated by absolute difference between the expected values of the two offers) on the effect of previous outcome on the probability of choosing a previously unchosen action. Overall, we found that the baseline model was unable to reproduce the effect of latent updating for unchosen actions. All other models were able to produce this effect in the same direction as the empirical data, with the select-reject model showing the closest effect to the empirical one. PE, prediction error.
Fig. 5. Results of the select-reject value learning model.
(A) Population-level posterior distribution of the ω parameter in a hierarchical model. For ω = 1, individuals consider the value of a card based only on the reward history when that card was selected. For ω = 0, individuals consider the value of a card based only on the reward history when the alternative was rejected. The posterior distribution suggests that participants weigh both the history of when a card was chosen and the history of when the alternative was rejected in their decisions, with greater emphasis on the former. The posterior high-density interval (gray horizontal line) is clearly below one, suggesting that individuals considered the value of actions not only based on trials where a card was chosen but also, to a lesser degree, based on trials where the alternative was rejected (i.e., 0.5 < ω < 1). (B) Association between median posterior ω parameter estimates for each individual and the model-independent effect estimated for each individual using empirical data (i.e., βprevious outcome). The positive association demonstrates the model’s ability to capture individual differences. (C) Association between individuals’ ω parameter estimates and their mean choice accuracy. The correlation shows that higher ω values were associated with better performance accuracy in the current bandit task.
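One plausible reading of the select-reject model in this caption keeps two value traces per card: one updated on trials where the card was chosen, one updated on trials where the card was the rejected alternative, with ω weighting the two at decision time. The update equations below are an assumption inferred from the caption, not the authors' published equations, and `alpha` is an illustrative learning rate.

```python
import numpy as np

def select_reject_trial(q_sel, q_rej, chosen, unchosen, reward, alpha=0.3):
    """One trial of a select-reject style update (assumed reading of the model).

    q_sel tracks outcome history when a card was chosen; q_rej tracks outcome
    history when the card was the rejected alternative.
    """
    q_sel, q_rej = q_sel.copy(), q_rej.copy()
    q_sel[chosen] += alpha * (reward - q_sel[chosen])      # choosing this card yielded `reward`
    q_rej[unchosen] += alpha * (reward - q_rej[unchosen])  # rejecting this card yielded `reward`
    return q_sel, q_rej

def card_value(q_sel, q_rej, card, omega=0.8):
    """Combine the two histories: omega = 1 uses only the selection history,
    omega = 0 only the rejection history (sign-flipped, since a high reward
    after rejecting a card argues against choosing it)."""
    return omega * q_sel[card] - (1 - omega) * q_rej[card]
```

Under this sketch, 0.5 < ω < 1 means selection history dominates but rejection history still contributes, which is the pattern the posterior in panel (A) supports.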

