Model-based prioritization for acquiring protection

Sarah M Tashjian et al. PLoS Comput Biol. 2022 Dec 19;18(12):e1010805. doi: 10.1371/journal.pcbi.1010805. eCollection 2022 Dec.

Abstract

Protection often involves the capacity to prospectively plan the actions needed to mitigate harm. The computational architecture of decisions involving protection remains unclear, as does whether these decisions differ from other beneficial prospective actions such as reward acquisition. Here we compare protection acquisition to reward acquisition and punishment avoidance to examine overlapping and distinct features across the three action types. Protection acquisition is positively valenced, similar to reward: for both protection and reward, the more the actor gains, the greater the benefit. However, reward and protection occur in different contexts, with protection existing in aversive contexts. Punishment avoidance also occurs in aversive contexts, but differs from protection because punishment is negatively valenced and motivates avoidance. Across three independent studies (total N = 600) we applied computational modeling to examine model-based reinforcement learning for protection, reward, and punishment in humans. Decisions motivated by acquiring protection evoked a higher degree of model-based control than acquiring reward or avoiding punishment, with no significant differences in learning rate. The context-valence asymmetry characteristic of protection increased deployment of flexible decision strategies, suggesting that model-based control depends on the context in which outcomes are encountered as well as the valence of the outcome.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Study structure.
(a) Protection acquisition shares positive valence features with appetitive reward and negative context features with aversive punishment. The context-valence asymmetry of protection acquisition was hypothesized to be reflected in distinct engagement of decision control systems compared with stimuli in consistently appetitive or aversive domains. All studies included the (b) protection acquisition task variant and a comparison task variant: (c) reward acquisition in Study 1, (d) direct reward acquisition in Study 2, and (e) punishment avoidance in Study 3. Study 1 compared protection and reward (b versus c) using abbreviated task versions comprising 100 non-practice trials. Study 2 compared protection and direct reward (b versus d) using longer task versions comprising 200 non-practice trials. Study 3 compared protection and punishment (b versus e) using the longer task versions comprising 200 non-practice trials. Deterministic transition structures are depicted with blue and orange arrows and indicate that the same first-stage state always leads to the same second-stage state. At the start of each trial, subjects saw the stakes amplifier, which showed “x1” for low-stake trials or “x5” for high-stake trials. Low-stakes results ranged from 0 to 9 units, whereas high-stakes results ranged from 0 to 45 units. The stakes amplifier was applied to the punishment/reward available on that trial as well as to the final result received. Next, subjects saw one of two pairs of first-stage dwellings (e.g., trees or houses). After subjects chose between the left and right dwelling depicted, they transitioned to the second-stage creature (e.g., gnomes or elves). Second-stage creatures delivered outcomes in the form of shields (protection), sacks (reward), coins (direct reward), or flames (punishment). At the second stage, subjects received outcomes ranging between 0 and 9 according to a drifting outcome rate. Outcomes changed slowly over the course of the task according to independent Gaussian random walks (σ = 2) with reflecting bounds at 0 and 9 to encourage learning throughout. Outcomes were multiplied by stakes and presented as final results applied to the maximum reward/penalty available on each trial. For example, in panel (b), subjects visited the low-payoff second-stage gnome. This gnome delivered two shields. When two shields were delivered on a low-stakes trial, which had the threat of 9 dragon flames, the end result was 7 flames (9 minus 2). When two shields were delivered on a high-stakes trial, which had a threat of 45 dragon flames (9 flames multiplied by the stakes amplifier of 5), the end result was 35 flames (45 minus 10, where 10 is the 2 shields multiplied by the stakes amplifier of 5).
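A minimal sketch (Python, illustrative only) of the drifting outcome rates and the stakes arithmetic described in the caption; the function names and the exact reflection scheme are assumptions, not the authors' task code.

```python
import numpy as np

def drift_outcome_rate(rate, sigma=2.0, lo=0.0, hi=9.0, rng=None):
    """One step of a Gaussian random walk (sigma = 2) with reflecting
    bounds at 0 and 9, as described for the second-stage outcome rates."""
    rng = rng or np.random.default_rng()
    new = rate + rng.normal(0.0, sigma)
    # Reflect off the bounds until the rate falls back inside [lo, hi].
    while new < lo or new > hi:
        if new < lo:
            new = 2 * lo - new
        elif new > hi:
            new = 2 * hi - new
    return new

def final_result(outcome_units, stakes, max_units=9, protection=True):
    """Apply the stakes amplifier (x1 or x5) and return the end result.
    For protection, delivered shields offset the amplified threat;
    otherwise the amplified outcome is returned directly."""
    if protection:
        return max_units * stakes - outcome_units * stakes
    return outcome_units * stakes

# Worked example from the caption: 2 shields on a high-stakes (x5) trial
# with a 45-flame threat leave 45 - 10 = 35 flames.
print(final_result(2, 5))  # 35
print(final_result(2, 1))  # 7  (low-stakes: 9 - 2)
```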
Fig 2
Fig 2. Model-Based Control and Learning Rate Results.
a-b. Raincloud plots depicting model-based control weighting (ω) and learning rate (α) by study and task variant. ω was significantly higher for the protection variants compared to all other task variants. α did not significantly differ across task variants. Far right legend indicates task variants across all studies: Study 1 = Reward and Protection 1, Study 2 = Direct Reward and Protection 2, Study 3 = Punishment and Protection 3. c-d. Scatterplots and linear regression lines depicting positive associations of both ω and α with corrected reward rate by study and task variant. Higher ω and α were significantly associated with higher corrected reward rate for all task variants. e. Mixed-effects model parameters testing contributions of the first-stage state, prior trial outcome, and the interaction between first-stage state and prior trial outcome to stay probabilities. Effects from both model-free and model-based contributions were observed. MF = model-free control; MB = model-based control. f-h. Mixed-effects models testing model-based (different first-stage state) and model-free (same first-stage state) contributions to stay probabilities (likelihood of repeating the same second-stage state). Increased model-based contributions were revealed on the protection task variants compared with all other task variants.
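For readers unfamiliar with the ω and α parameters, the following is a minimal sketch of the standard hybrid model-based/model-free weighting commonly used in two-step task analyses; the paper's full model likely includes additional terms (e.g., softmax temperature, perseveration), so treat this as illustrative rather than the authors' exact specification.

```python
def q_model_free(q_old, outcome, alpha):
    """Temporal-difference update of a model-free value with
    learning rate alpha: nudge the old estimate toward the outcome."""
    return q_old + alpha * (outcome - q_old)

def q_net(q_mb, q_mf, omega):
    """Hybrid valuation: omega weights model-based against model-free
    values. Higher omega means more model-based control."""
    return omega * q_mb + (1.0 - omega) * q_mf

# With the deterministic transitions in Fig 1, the model-based value of a
# first-stage choice is simply the learned value of the second-stage state
# it always leads to.
```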
Fig 3
Fig 3. Metacognitive and predictive bias results.
a. Histograms depicting Certainty ratings by study and task variant. Certainty was rated with respect to how sure subjects were that they had selected the first-stage state that would lead to the optimal outcomes. Certainty ratings were made on a scale of 0–9, from not at all certain to very certain. b. Histograms depicting Outcome Estimates by study and task variant. Outcome Estimates were provided with respect to how many outcome units subjects thought they would receive at the second stage. Outcome Estimates were made on a scale of 0–9 outcome units (i.e., subjects who rated a 2 thought they would receive 2 shields/sacks/coins/flames, respectively). c. Mixed-effects model parameters testing metacognitive and predictive bias by modeling actual outcome received as a function of Certainty and Outcome Estimates, respectively. d. Metacognitive bias boxplots by study and task variant. Metacognitive bias was calculated by extracting random slope coefficients from the model of outcome predicting Certainty. Significant differences were identified only in Study 3, with reduced bias for the protection acquisition variant compared to the punishment avoidance variant. e. Predictive bias boxplots by study and task variant. Predictive bias was calculated by extracting random slope coefficients from the model of outcome predicting Outcome Estimates. Significantly reduced bias was revealed for the protection acquisition variants compared to the reward acquisition and punishment avoidance variants, but not compared to the direct reward variant. f. Model parameters for metacognitive and predictive bias coefficients regressed against model-based control weighting (ω) and learning rate (α) parameters for each task variant. Far right legends indicate task variants across all studies: Study 1 = Reward and Protection 1, Study 2 = Direct Reward and Protection 2, Study 3 = Punishment and Protection 3.
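A sketch, under stated assumptions, of how per-subject bias indices can be extracted as random slopes from a mixed-effects model using statsmodels; the data file, column names ("subject", "outcome", "certainty"), and the exact formula are hypothetical stand-ins for the paper's specification.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format trial data with one row per trial.
df = pd.read_csv("trials.csv")  # columns: subject, outcome, certainty

# Mixed-effects model with a by-subject random slope for outcome;
# the subject-level slope deviations index metacognitive bias,
# analogous to the caption's description.
fit = smf.mixedlm("certainty ~ outcome", df,
                  groups=df["subject"],
                  re_formula="~outcome").fit()

# The random-effects dict maps each subject to intercept/slope deviations.
bias = {subj: re["outcome"] for subj, re in fit.random_effects.items()}
```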
Fig 4
Fig 4. Anxiety and model-based weighting (ω) estimated separately for each Study.
Model-based prioritization was observed for protection compared with punishment avoidance (Study 3) for individuals with higher anxiety scores, and for protection compared with reward acquisition for individuals with lower anxiety scores (Studies 1 and 2). The dashed grey line represents the sample average score on the State-Trait Anxiety Inventory (STAI) Trait Anxiety subscale.
