Sci Adv. 2023 Oct 20;9(42):eadi2704. doi: 10.1126/sciadv.adi2704. Epub 2023 Oct 20.

Computational mechanisms underlying latent value updating of unchosen actions


Ido Ben-Artzi et al.

Abstract

Current studies suggest that individuals estimate the value of their choices based on observed feedback. Here, we ask whether individuals also update the value of their unchosen actions, even when the associated feedback remains unknown. One hundred seventy-eight individuals completed a multi-armed bandit task, making choices to gain rewards. We found robust evidence suggesting latent value updating of unchosen actions based on the chosen action's outcome. Computational modeling results suggested that this effect is mainly explained by a value updating mechanism whereby individuals integrate the outcome history for choosing an option with that of rejecting the alternative. Properties of the deliberation (i.e., duration/difficulty) did not moderate the latent value updating of unchosen actions, suggesting that memory traces generated during deliberation might take a smaller role in this specific phenomenon than previously thought. We discuss the mechanisms facilitating credit assignment to unchosen actions and their implications for human decision-making.
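The latent updating described in the abstract can be illustrated with a minimal sketch: the chosen action's value moves toward the observed outcome via a standard delta rule, while the unchosen action's value is nudged in the opposite direction even though its own outcome was never observed. The learning rates `alpha` and `kappa` below are illustrative choices, not the paper's fitted parameters.

```python
import numpy as np

def update_values(q, chosen, unchosen, reward, alpha=0.3, kappa=0.3):
    """One trial of latent value updating (illustrative sketch).

    The chosen card's value moves toward the observed reward; the unchosen
    card's value is updated inversely, despite its feedback being unknown.
    alpha and kappa are hypothetical learning rates.
    """
    q = q.copy()
    pe = reward - q[chosen]    # prediction error for the chosen card
    q[chosen] += alpha * pe    # standard delta-rule update
    q[unchosen] -= kappa * pe  # inverse (latent) update of the unchosen card
    return q
```

With this rule, a rewarded choice lowers the value of the rejected alternative, matching the reported reduction in re-selecting a previously unchosen card after rewarded trials.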


Figures

Fig. 1. Inverse value updating of unchosen actions.
(A) Illustration of a trial sequence. Participants completed a four-armed bandit task. In each trial, two of the four cards were randomly offered by the computer for selection. We examined trials where the unchosen card in trial n was reoffered at trial n + 1 alongside a card that was not offered in trial n. This allowed us to examine whether the outcome associated with the chosen card in trial n influenced the probability that the participant would select the previously unchosen card at trial n + 1. For example, as illustrated in this panel, we ask whether the reward delivered at trial n (as a result of choosing the dark card) influenced the probability of selecting the unchosen card (orange) when it was offered alongside a third card (blue). (B) Card selection led to a binary outcome determined by slowly drifting probabilities. We used randomly drifting reward probabilities to ensure continued learning. The reward probabilities of each card were independent (mean shared variance = 5.3%). (C) Probability of choosing a previously unchosen action as a function of the outcome in the previous trial. Results indicated that the probability of choosing a previously unchosen card was reduced after rewarded trials compared to unrewarded trials. This was true for both win blocks (where outcomes included winning/not winning a play pound coin) and loss blocks (where outcomes included not winning/losing a play pound coin). (D) The posterior distributions for the influence of previous outcome (top) and the interaction with condition (bottom) on choosing the previously unchosen card in a logistic regression (the blue dotted line indicates the null point, and the gray horizontal line indicates HDI95%). Overall, results indicate an inverted influence of the previous outcome on the chances of selecting an unchosen action, regardless of win/loss conditions.
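The slowly drifting reward probabilities in panel (B) can be generated as independent bounded random walks. The sketch below assumes a Gaussian step clipped to [0, 1]; the step size, starting range, and trial count are illustrative, not the study's exact schedule.

```python
import numpy as np

def drifting_probabilities(n_trials=200, n_arms=4, sd=0.05, seed=0):
    """Slowly drifting reward probabilities for a bandit task (sketch).

    Each arm's reward probability follows an independent Gaussian random
    walk, clipped to [0, 1] so it remains a valid probability.
    """
    rng = np.random.default_rng(seed)
    p = np.empty((n_trials, n_arms))
    p[0] = rng.uniform(0.25, 0.75, n_arms)  # independent starting points
    for t in range(1, n_trials):
        p[t] = np.clip(p[t - 1] + rng.normal(0, sd, n_arms), 0.0, 1.0)
    return p
```

Because each arm drifts independently, the pairwise shared variance between arms stays low, which is the property the caption reports (mean shared variance = 5.3%).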
Fig. 2. Moderation of deliberation duration and difficulty on inverse value updating of unchosen actions.
(A) Hierarchical Bayesian logistic regression showed no moderating effect of the absolute difference in expected values between the two offered cards on the tendency to assign value to an unchosen action. (B) Higher RTs were assumed to indicate increased deliberation but had no moderating effect on the tendency to assign value to unchosen actions. (C) Posterior distribution showing no evidence supporting the moderation of value updating for unchosen actions by deliberation difficulty (blue line indicating the null point; probability of direction = 65%; gray line indicating HDI95%). (D) Posterior distribution depicting a lack of evidence for the interaction between RT and previous outcome on the tendency to choose a previously unselected card (blue line indicating the null point; probability of direction = 55%; gray line indicating HDI95%).
Fig. 3. Parameter recovery.
The top rows in each panel present population parameter recovery, including the posterior parameter distribution (pink) and the blue dashed line indicating the value of the true latent population parameter. The bottom rows refer to individual parameter recovery, showing a strong correlation between simulated individual parameters and recovered ones. (A) The three parameters of the “double updating with two prediction errors” model, (B) the “double updating with one prediction error” model, and (C) the “select-reject” model. Overall, we found good parameter recovery for all parameters and models.
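The logic of parameter recovery can be sketched with a much simpler setup than the paper's hierarchical models: simulate choices from an agent with a known learning rate, then refit that parameter from the simulated data and check it comes back close to the truth. The two-armed softmax Q-learner and grid-search MLE below are illustrative stand-ins for the authors' models and Bayesian estimation.

```python
import numpy as np

def simulate_agent(alpha, beta=3.0, n_trials=300, seed=0):
    """Simulate a two-armed softmax Q-learner with learning rate alpha (sketch)."""
    rng = np.random.default_rng(seed)
    p_reward = np.array([0.7, 0.3])  # fixed reward probabilities (illustrative)
    q = np.zeros(2)
    choices, rewards = [], []
    for _ in range(n_trials):
        p1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))  # softmax over two arms
        c = int(rng.random() < p1)
        r = float(rng.random() < p_reward[c])
        q[c] += alpha * (r - q[c])
        choices.append(c)
        rewards.append(r)
    return np.array(choices), np.array(rewards)

def neg_log_lik(alpha, choices, rewards, beta=3.0):
    """Negative log-likelihood of observed choices under a candidate alpha."""
    q = np.zeros(2)
    nll = 0.0
    for c, r in zip(choices, rewards):
        p1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))
        nll -= np.log((p1 if c == 1 else 1.0 - p1) + 1e-12)
        q[c] += alpha * (r - q[c])
    return nll

def recover(choices, rewards):
    """Grid-search MLE over alpha -- a crude stand-in for hierarchical Bayesian fitting."""
    grid = np.linspace(0.01, 0.99, 99)
    return grid[np.argmin([neg_log_lik(a, choices, rewards) for a in grid])]
```

Repeating this over many simulated agents and correlating true with recovered parameters is what the bottom rows of each panel report.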
Fig. 4. Simulated effects for computational models.
(A) The main regression signatures found in the empirical data. For each model (B to E), we simulated data from 175 agents using empirical parameters (sampled from the population marginal posterior distributions). We then used the simulated data to examine the ability of each model to reproduce the main regression signatures we found in the empirical data. (Left and middle columns) The effect of previous outcome on the probability of choosing a previously unchosen action and the corresponding posterior distribution (estimates are presented on the log-odds scale). (Right column) The moderation of choice difficulty in the previous trial (indicated by absolute difference between the expected values of the two offers) on the effect of previous outcome on the probability of choosing a previously unchosen action. Overall, we found that the baseline model was unable to reproduce the effect of latent updating for unchosen actions. All other models were able to produce this effect in the same direction as the empirical data, with the select-reject model showing the closest effect to the empirical one. PE, prediction error.
Fig. 5. Results of the select-reject value learning model.
(A) Population-level posterior distribution of the ω parameter in a hierarchical model. For ω = 1, individuals consider the value of a card based only on the reward history when that card was selected. For ω = 0, individuals consider the value of a card based only on the reward history when the alternative was rejected. The posterior distribution suggests that participants weigh both the history of when a card was chosen and the history of when the alternative was rejected in their decisions, with greater emphasis on the former. The posterior high-density interval (gray horizontal line) is clearly below one, suggesting that individuals considered the value of actions not only based on trials where a card was chosen but also, to a lesser degree, based on trials where the alternative was rejected (i.e., 0.5 < ω < 1). (B) Association between median posterior ω parameter estimates for each individual and the model-independent effect estimated for each individual using empirical data (i.e., βprevious outcome). The positive association demonstrates the model’s ability to capture individual differences. (C) Association between individuals’ ω parameter estimates and their mean choice accuracy. The correlation shows that higher ω values were associated with better performance accuracy in the current bandit task.
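One plausible reading of the select-reject model in this caption keeps two value traces per card: one updated on trials where the card was chosen, one updated on trials where the card was the rejected alternative, with ω weighting the two at decision time. The update equations below are an assumption inferred from the caption, not the authors' published equations, and `alpha` is an illustrative learning rate.

```python
import numpy as np

def select_reject_trial(q_sel, q_rej, chosen, unchosen, reward, alpha=0.3):
    """One trial of a select-reject style update (assumed reading of the model).

    q_sel tracks outcome history when a card was chosen; q_rej tracks outcome
    history when the card was the rejected alternative.
    """
    q_sel, q_rej = q_sel.copy(), q_rej.copy()
    q_sel[chosen] += alpha * (reward - q_sel[chosen])      # choosing this card yielded `reward`
    q_rej[unchosen] += alpha * (reward - q_rej[unchosen])  # rejecting this card yielded `reward`
    return q_sel, q_rej

def card_value(q_sel, q_rej, card, omega=0.8):
    """Combine the two histories: omega = 1 uses only the selection history,
    omega = 0 only the rejection history (sign-flipped, since a high reward
    after rejecting a card argues against choosing it)."""
    return omega * q_sel[card] - (1 - omega) * q_rej[card]
```

Under this sketch, 0.5 < ω < 1 means selection history dominates but rejection history still contributes, which is the pattern the posterior in panel (A) supports.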

