Elife. 2025 Mar 21:13:RP92892.
doi: 10.7554/eLife.92892.

The neural correlates of novelty and variability in human decision-making under an active inference framework

Shuo Zhang et al. Elife. 2025.

Abstract

Active inference integrates perception, decision-making, and learning into a united theoretical framework, providing an efficient way to trade off exploration and exploitation by minimizing (expected) free energy. In this study, we asked how the brain represents values and uncertainties (novelty and variability), and resolves these uncertainties under the active inference framework in the exploration-exploitation trade-off. Twenty-five participants performed a contextual two-armed bandit task, with electroencephalogram (EEG) recordings. By comparing the model evidence for active inference and reinforcement learning models of choice behavior, we show that active inference better explains human decision-making under novelty and variability, which entails exploration or information seeking. The EEG sensor-level results show that the activity in the frontal, central, and parietal regions is associated with novelty, while the activity in the frontal and central brain regions is associated with variability. The EEG source-level results indicate that the expected free energy is encoded in the frontal pole and middle frontal gyrus and uncertainties are encoded in different brain regions but with overlap. Our study dissociates the expected free energy and uncertainties in active inference theory and their neural correlates, speaking to the construct validity of active inference in characterizing cognitive processes of human decisions. It provides behavioral and neural evidence of active inference in decision processes and insights into the neural mechanism of human decisions under uncertainties.

Keywords: active inference; neuroscience; exploration-exploitation trade-off; uncertainty.

Conflict of interest statement

SZ, YT, QL, HW: No competing interests declared.

Figures

Figure 1. Active inference.
(a) Qualitatively, agents receive observations from the environment and use these observations to optimize Bayesian beliefs under an internal cognitive (a.k.a., world or generative) model of the environment. Agents then actively sample the environment's states through action, choosing actions that would place them in more favorable states. The environment changes its state according to the agents’ policies (action sequences) and transition functions, and the agents then receive new observations from the environment. (b) From a quantitative perspective, agents optimize the Bayesian beliefs under an internal cognitive (a.k.a., world or generative) model of the environment by minimizing the variational free energy. Agents then select policies that minimize the expected free energy, namely, the surprise expected in the future under a particular policy.
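As a hedged sketch only, the two quantities named in panel (b) are commonly written in the active inference literature in the textbook form below. The notation is an assumption introduced here for illustration: q denotes the agent's approximate posterior beliefs, p the generative model (with p(o) in the risk term read as the prior preference over outcomes), s hidden states, o observations, and π a policy. The paper's own formulation, which separates novelty and variability terms, may differ in detail.

```latex
% Variational free energy for a current observation o
% (standard form; notation assumed, not taken from the caption)
F = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
  = D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] - \ln p(o)

% Expected free energy of a policy \pi
% (standard risk + ambiguity decomposition)
G(\pi) = \underbrace{D_{\mathrm{KL}}\big[q(o \mid \pi)\,\|\,p(o)\big]}_{\text{risk}}
       + \underbrace{\mathbb{E}_{q(s \mid \pi)}\big[\mathrm{H}[p(o \mid s)]\big]}_{\text{ambiguity}}
```

Because the KL term in F is non-negative, F is an upper bound on the surprise −ln p(o), which is why minimizing it tightens the agent's beliefs around the observations.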
Figure 2. The contextual two-armed bandit task.
(a) In this task, agents make two choices in each trial. The first choice is between “Stay” and “Cue”. The “Stay” option yields nothing, while the “Cue” option yields a –1 reward and reveals the context of the “Risky” option in the current trial. The second choice is between “Safe” and “Risky”. The “Safe” option always yields a +6 reward, and the “Risky” option yields a probabilistic reward ranging from 0 to +12, depending on the current context (context 1 or context 2). (b) The four policies in this task are “Cue” and “Safe”, “Stay” and “Safe”, “Cue” and “Risky”, and “Stay” and “Risky”. (c) The likelihood matrix maps from 8 hidden states (columns) to 7 observations (rows).
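The trial structure in panel (a) can be sketched as a small simulation. This is a minimal illustration, not the authors' task code: the caption gives the +6 “Safe” reward, the –1 “Cue” cost, and the 0 to +12 range of the “Risky” reward, but the binary 0/+12 payoff, the per-context probabilities in `P_HIGH`, the equal context prior, and the function name `run_trial` are all assumptions made here.

```python
import random

SAFE_REWARD = 6   # from the caption: "Safe" always yields +6
CUE_COST = -1     # from the caption: "Cue" yields a -1 reward
RISKY_HIGH = 12   # top of the "Risky" reward range given in the caption
RISKY_LOW = 0     # bottom of the "Risky" reward range

# ASSUMPTION: "Risky" pays either 0 or +12, with these hypothetical
# probabilities of the high payoff in each context (not given in the caption).
P_HIGH = {1: 0.9, 2: 0.1}


def run_trial(first_choice, choose_second, rng):
    """Simulate one trial and return (context, second_choice, total_reward).

    first_choice: "Stay" or "Cue".
    choose_second: function taking the revealed context (1, 2, or None
        when the agent stayed) and returning "Safe" or "Risky".
    rng: a random.Random instance, passed in for reproducibility.
    """
    # ASSUMPTION: both contexts are equally likely on every trial.
    context = rng.choice([1, 2])

    if first_choice == "Cue":
        reward = CUE_COST
        revealed = context  # the cue reveals the current context
    else:
        reward = 0
        revealed = None     # staying reveals nothing

    second = choose_second(revealed)
    if second == "Safe":
        reward += SAFE_REWARD
    else:
        high = rng.random() < P_HIGH[context]
        reward += RISKY_HIGH if high else RISKY_LOW
    return context, second, reward


if __name__ == "__main__":
    rng = random.Random(0)

    def informed(revealed):
        # Take the risky option only when the cue showed the high-reward context.
        return "Risky" if revealed == 1 else "Safe"

    total = sum(run_trial("Cue", informed, rng)[2] for _ in range(60))
    print("total reward over 60 cued trials:", total)
```

The four policies in panel (b) correspond to the four combinations of `first_choice` and the option returned by `choose_second`.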
Figure 3. The simulation experiment results.
This figure demonstrates how an agent selects actions and updates beliefs over 60 trials in the active inference framework. The first two panels (a, b) display the agent’s policy and depict how the policy probabilities are updated (choosing between the stay or cue option in the first choice, and selecting between the safe or risky option in the second choice). The scatter plot indicates the agent’s actions, with green representing the cue option when the context of the risky path is “Context 1” (high-reward context), orange representing the cue option when the context of the risky path is “Context 2” (low-reward context), purple representing the stay option when the agent is uncertain about the context of the risky path, and blue indicating the safe-risky choice. The shaded region represents the agent’s confidence, with darker shaded regions indicating greater confidence. The third panel (c) displays the rewards obtained by the agent in each trial. The fourth panel (d) shows the prediction error of the agent in each trial, which decreases over time. Finally, the fifth panel (e) illustrates the agent’s expected rewards for the “Risky Path” in the two contexts.
Figure 4. The experiment task and behavioral result.
(a) The five stages of the experiment, which include the “You can ask” stage to prompt the participants to decide whether to request information from the Ranger, the “First choice” stage to decide whether to ask the ranger for information, the “First result” stage to display the result of the “First choice” stage, the “Second choice” stage to choose between left and right paths under different uncertainties and the “Second result” stage to show the result of the “Second choice” stage. The error bars show the 95% confidence interval. (b) The number of times each option was selected. The error bar indicates the variance among participants. (c) The Bayesian information criterion of active inference, model-free reinforcement learning, and model-based reinforcement learning.
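Panel (c) compares models by their Bayesian information criterion (BIC). As a minimal sketch of how such a comparison works, the snippet below uses the standard definition BIC = k ln(n) − 2 ln(L̂), where lower values indicate a better trade-off between fit and complexity. The model names, log-likelihoods, and parameter counts are hypothetical placeholders, not values from the study.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Standard Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def best_model(fits, n_obs):
    """Return (name, bic) of the model with the lowest BIC.

    fits maps a model name to (maximized log-likelihood, number of free parameters).
    """
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in fits.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


if __name__ == "__main__":
    # HYPOTHETICAL placeholder fits for one participant (illustration only).
    fits = {
        "model_a": (-70.0, 4),
        "model_b": (-75.0, 2),
    }
    print(best_model(fits, n_obs=120))
```

Note that the penalty term grows with the number of free parameters, so a model with a slightly better likelihood can still lose the comparison if it needs many more parameters.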
Figure 5. The comparison between the active inference model and the behavioral data in (a) the “First choice” stage, and the “Second choice” stage; (b) context unknown, (c) “Context 1”, and (d) “Context 2”.
The bar graphs show participants’ behavior data in each trial, and the height shows the proportion of participants who chose a certain option in each trial. The scatter plots show the model’s fitting results for the two choices of the participants. The closer the point is to the bar graph on both sides, the higher the fitting accuracy. The line graphs show the trend of the model fitting accuracies with the trials.
Figure 6. EEG results at the sensor level.
(a) The electrode distribution. (b) The signal amplitude of different brain regions in the first and second half of the experiment in the “Second choice” stage. The error bar indicates the amplitude variance in each region. The right panel shows the visualization of the evoked data and spectrum data. (c) The signal amplitude of different brain areas in the “Second choice” stage where participants know the context or do not know the context of the right path. The error bar indicates the amplitude variance in each region. The error bars show the 95% confidence interval. The right panel shows the visualization of the evoked data and spectrum data. FL: frontal-left; FR: frontal-right; C: central; PL: parietal-left; PR: parietal-right.
Figure 7. The source estimation results of expected free energy and active inference in the “First choice” stage.
(a) The regression intensity (β) of expected free energy. The right panel indicates the regression intensity between the frontal pole (1, right half) and the expected free energy. The green-shaded regions indicate p<0.05 after false discovery rate (FDR) correction (the average t-value during these significant periods equals −3.228). (b) The regression intensity (β) of the value of reducing variability. The right panel indicates the regression intensity between the medial orbitofrontal cortex (5, left half) and the value of reducing variability. The green-shaded regions indicate p<0.05 after FDR correction (the average t-value during these significant periods equals −3.081). The black lines indicate the average intensities, and the gray-shaded regions indicate the ranges of variations (the 95% confidence interval). The gray lines indicate p<0.05 before FDR.
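The captions report significance "after false discovery rate (FDR) correction". A common way to apply such a correction is the Benjamini-Hochberg step-up procedure, sketched below in plain Python. Whether the authors used exactly this variant is an assumption here, and the function name and example p-values are illustrative only.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Benjamini-Hochberg step-up procedure: sort the m p-values, find the
    largest rank k with p_(k) <= (k / m) * alpha, and reject the k
    hypotheses with the smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank (1-based) whose p-value falls under its threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Illustrative p-values, e.g. one per time point of a regression time course.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
    print(benjamini_hochberg(p, alpha=0.05))
```

In the example, several p-values fall below 0.05 on their own but only the two smallest survive the correction, which mirrors the distinction the captions draw between "before FDR" and "after FDR" markers.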
Figure 8. The source estimation results of reducing variability and reducing novelty in the two result stages.
(a) The regression intensity (β) of reducing variability in the “First result” stage. The right panel indicates the regression intensity between the medial orbitofrontal cortex (5, left half) and reducing variability. The green-shaded regions indicate p<0.05 after false discovery rate (FDR) correction (the average t-value during these significant periods equals −3.001). (b) The regression intensity (β) of reducing novelty in the “Second result” stage. The right panel indicates the regression intensity between the precentral gyrus (15, right half) and reducing novelty. The green-shaded regions indicate p<0.05 after FDR correction (the average t-value during these significant periods equals 3.278). The black lines indicate the average intensities, and the gray-shaded regions indicate the ranges of variations (the 95% confidence interval). The gray lines indicate p<0.05 before FDR.
Figure 9. The source estimation results of expected free energy and the value of reducing novelty in the “Second choice” stage.
(a) The regression intensity (β) of expected free energy. The right panel indicates the regression intensity between the rostral middle frontal gyrus (1, left half) and expected free energy, the black line indicates the average intensity of this region, and the gray-shaded region indicates the range of variation. The yellow-shaded regions indicate p<0.001 after false discovery rate (FDR) (the average t-value during these significant periods equals −4.819) and the gray lines indicate p<0.001 before FDR. (b) The regression intensity (β) of the value of reducing novelty. The right panel indicates the regression intensity between the rostral middle frontal gyrus (6, left half) and the value of reducing novelty, the black line indicates the average intensity of this region, and the gray-shaded region indicates the range of variation (the 95% confidence interval). The green-shaded regions indicate p<0.05 after FDR (the average t-value during these significant periods equals −3.067) and the gray lines indicate p<0.05 before FDR.
Appendix 1—figure 1. The simulation experiment results.
This figure demonstrates how an agent selects actions and updates beliefs over 60 trials in the active inference framework. The first two panels (a, b) display the agent’s policy and depict how the policy probabilities are updated (choosing between the stay or cue option in the first choice, and selecting between the safe or risky option in the second choice). The scatter plot indicates the agent’s actions, with green representing the cue option when the context of the risky path is “Context 1” (high-reward context), orange representing the cue option when the context of the risky path is “Context 2” (low-reward context), purple representing the stay option when the agent is uncertain about the context of the risky path, and blue indicating the safe-risky choice. The shaded region represents the agent’s confidence, with darker shaded regions indicating greater confidence. The third panel (c) displays the rewards obtained by the agent in each trial. The fourth panel (d) shows the prediction error of the agent in each trial. Finally, the fifth panel (e) illustrates the agent’s expected rewards for the “Risky Path” in the two contexts.
Appendix 1—figure 2. The simulation experiment results.
This figure demonstrates how an agent selects actions and updates beliefs over 60 trials in the active inference framework. The first two panels (a, b) display the agent’s policy and depict how the policy probabilities are updated (choosing between the stay or cue option in the first choice, and selecting between the safe or risky option in the second choice). The scatter plot indicates the agent’s actions, with green representing the cue option when the context of the risky path is “Context 1” (high-reward context), orange representing the cue option when the context of the risky path is “Context 2” (low-reward context), purple representing the stay option when the agent is uncertain about the context of the risky path, and blue indicating the safe-risky choice. The shaded region represents the agent’s confidence, with darker shaded regions indicating greater confidence. The third panel (c) displays the rewards obtained by the agent in each trial. The fourth panel (d) shows the prediction error of the agent in each trial. Finally, the fifth panel (e) illustrates the agent’s expected rewards for the “Risky Path” in the two contexts.
Appendix 1—figure 3. Model recovery results.
Appendix 1—figure 4. The source estimation results of extrinsic value in the two choosing stages.
(a) The regression intensity (β) of extrinsic value in the “First choice” stage. The right panel indicates the regression intensity between the middle temporal gyrus (6, right half) and extrinsic value. The green-shaded regions indicate p<0.05 after false discovery rate (FDR) correction (the average t-value during these significant periods equals 3.673). (b) The regression intensity (β) of extrinsic value in the “Second choice” stage. The right panel indicates the regression intensity between the rostral middle frontal gyrus (6, left half) and extrinsic value. The yellow-shaded regions indicate p<0.001 after FDR correction (the average t-value during these significant periods equals 4.740). The black lines indicate the average intensities, and the gray-shaded regions indicate the ranges of variations. The gray lines indicate p<0.05 before FDR.
Appendix 1—figure 5. The source estimation results of extrinsic value and prediction error in the “Second result” stage.
(a) The regression intensity (β) of extrinsic value. The right panel indicates the regression intensity between the lateral occipital cortex (3, right half) and extrinsic value. The green-shaded regions indicate p<0.05 after false discovery rate (FDR) correction (the average t-value during these significant periods equals 2.875). (b) The regression intensity (β) of prediction error. The right panel indicates the regression intensity between the lateral occipital cortex (3, right half) and prediction error. The green-shaded regions indicate p<0.05 after FDR correction (the average t-value during these significant periods equals –2.716). The black lines indicate the average intensities, and the gray-shaded regions indicate the ranges of variations. The gray lines indicate p<0.05 before FDR.
Appendix 1—figure 6. The source estimation results of ambiguity in the “Second choice” stage.
The right panel indicates the regression intensity between the frontal pole (1, left half) and ambiguity. The black line indicates the average intensities, and the gray-shaded regions indicate the ranges of variations. The gray lines indicate p<0.05 before FDR.

Update of

  • doi: 10.1101/2023.09.18.558250
  • doi: 10.7554/eLife.92892.1
  • doi: 10.7554/eLife.92892.2
  • doi: 10.7554/eLife.92892.3

