Sci Adv. 2024 Nov;10(44):eadl3931. doi: 10.1126/sciadv.adl3931. Epub 2024 Oct 30.

Computation noise promotes zero-shot adaptation to uncertainty during decision-making in artificial neural networks


Charles Findling et al. Sci Adv. 2024 Nov.

Abstract

Random noise in information processing systems is widely seen as detrimental to function. But despite the large trial-to-trial variability of neural activity, humans show a remarkable adaptability to conditions with uncertainty during goal-directed behavior. The origin of this cognitive ability, constitutive of general intelligence, remains elusive. Here, we show that moderate levels of computation noise in artificial neural networks promote zero-shot generalization for decision-making under uncertainty. Unlike networks featuring noise-free computations, but like human participants tested on similar decision problems (ranging from probabilistic reasoning to reversal learning), noisy networks exhibit behavioral hallmarks of optimal inference in uncertain conditions entirely unseen during training. Computation noise enables this cognitive ability jointly through "structural" regularization of network weights during training and "functional" regularization by shaping the stochastic dynamics of network activity after training. Together, these findings indicate that human cognition may ride on neural variability to support adaptive decisions under uncertainty without extensive experience or engineered sophistication.


Figures

Fig. 1. RNNs and training/testing regimes.
The decision-making RNN is fed with input x_t, which is combined with the previous recurrent activity z_(t−1) and passed through a nonlinear activation function σ_z(·) to obtain the updated recurrent activity z_t. The output (decision policy) y_t of the RNN is obtained from z_t and passed through a softmax function σ_y(·) to choose an action. (A) RNN with exact (noise-free) computations. (B) RNN with noisy computations: the recurrent updates are corrupted by zero-mean, normally distributed noise of SD σ. (C) Training and testing regimes. The weights of the decision-making RNN are trained using backpropagation on a task A. The trained weights are then frozen, and the RNN is tested either on task A or on a variant of task A (task A*) with an added source of uncertainty. (D) Studied task environments. Task environment 1: The RNN is presented with a single cue (task A, associative learning) or multiple cues (task A*, cue combination) and then chooses between two actions. The RNN obtains a positive or negative outcome that depends on the probabilistic association between the presented cue(s) and the chosen action. Task environment 2: The RNN is repeatedly presented with a slot machine with two arms to choose from. The RNN receives a reward as a function of the reward probability associated with the chosen arm (task A, fixed reward probabilities; task A*, probabilities reversing within a single game).
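As a concrete illustration of the update in (B), here is a minimal Python sketch, assuming a tanh recurrent activation and a softmax readout; the weight names (W_in, W_rec, W_out) and the default noise level are hypothetical, not taken from the paper's code.

    import numpy as np

    rng = np.random.default_rng(0)

    def noisy_rnn_step(x_t, z_prev, W_in, W_rec, W_out, sigma=0.5):
        # Recurrent update: combine input with previous recurrent activity
        pre = W_in @ x_t + W_rec @ z_prev
        # Corrupt the update with zero-mean Gaussian noise of SD sigma;
        # sigma = 0 recovers the exact (noise-free) network of (A)
        z_t = np.tanh(pre + sigma * rng.standard_normal(pre.shape))
        # Softmax readout of the recurrent activity yields the decision
        # policy y_t over the available actions
        logits = W_out @ z_t
        y_t = np.exp(logits - logits.max())
        return z_t, y_t / y_t.sum()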
Fig. 2. Zero-shot performance in the weather prediction task.
(A) Description of task environment 1. Left: In task A, the agent is presented with five samples of the same cue among eight possible cues, each of which probabilistically predicts the rewarded action in the current trial. In task A*, the agent predicts the rewarded action based on sequences of samples of different cues. Middle: Fraction of trials for which each action is rewarded, for each of the eight cues. Right: Fraction of trials for which action 2 is chosen in response to each cue after training in task A, for exact and noisy RNNs. (B) Left: Fraction of correct stimulus-response associations in task A (dashed lines) and fraction of correct predictions in task A* (solid lines) in response to sequences of five cues, for RNNs trained in task A and tested with increasing amounts of computation noise (x axis). The gray line corresponds to the fraction of correct predictions in task A* if only the first cue is taken into account. Right: Fraction of correct predictions in task A* for increasing numbers of cues, for RNNs trained and tested with computation noise (light blue), RNNs trained with computation noise but tested with the noise knocked out (KO; dark blue), and exact RNNs. (C) Psychophysical kernels for RNNs with computation noise (blue, σ = 1) and exact RNNs (gray) for sequences of 5, 10, and 15 cues. Dashed lines indicate flat (ideal) integration kernels. Inset: Relation between the objective (x axis) and subjective (y axis) reliabilities of individual cues in the decision process. LogLR, log-likelihood ratio.
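The flat (ideal) integration kernel in (C) corresponds to an ideal observer that weighs every cue equally. Under the standard assumption of equal priors over the two actions (a textbook formulation, not quoted from the paper), the log posterior ratio after T cues is the sum of the per-cue log-likelihood ratios:

    \log \frac{p(a=2 \mid c_1,\ldots,c_T)}{p(a=1 \mid c_1,\ldots,c_T)}
      = \sum_{t=1}^{T} \log \frac{p(c_t \mid a=2)}{p(c_t \mid a=1)}

Each cue thus contributes its logLR additively, so the ideal psychophysical kernel is flat across cue positions t = 1, ..., T.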
Fig. 3. Activity patterns in recurrent networks featuring computation noise during the weather prediction task.
(A) Temporal cross-correlation matrices for RNNs with computation noise (σ = 1), between the ideal log posterior (x axis) and PC1 activity (y axis) (left), and between the ideal log-likelihood (x axis) and PC2 activity (y axis) (right). RNNs with computation noise encode the ideal log posterior and log-likelihood with near-zero lag across the sequence. (B) Variability scaling with cue uncertainty. Output activity (light line, mean; dark line, SD) associated with the eight cues for RNNs trained and tested with computation noise. The SD of output activity increased by 255% from the most reliable to the least reliable cue, whereas the mean decreased by only 6%. Reliability is expressed in log-likelihood ratio (logLR) units, i.e., the magnitude of the log-likelihood ratio associated with each cue regarding the rewarded action (1 or 2).
Fig. 4. Zero-shot performance in the reversal learning task.
(A) Description of task environment 2. Left: Task A with fixed reward schedules. The most rewarded arm varies randomly across games, such that the agent needs to learn which arm is most rewarded in each game. Right: The reversal learning task A*. The reward probabilities associated with the two arms reverse in the middle of the game, such that the agent needs to switch away from a previously reinforced action. (B) Effect of computation noise on performance. Left: Performance (proportion correct) as a function of the regularization parameter for different RNN types, trained on task A and tested on task A*. RNNs with moderate levels of computation noise (σ ~ 0.5) adapt their behavior efficiently to the reversal. Exact RNNs correspond to computation noise = 0 (leftmost point on x axis). Right: Reversal curves for noisy RNNs (blue, σ = 0.5) and exact RNNs (gray). (C) Reversal curves of RNNs trained with computation noise and tested with the noise either present (light blue) or knocked out (KO; dark blue). Inset: KO effect on the reversal time constant. Knocking out computation noise leads to slower reversals in response to changes in reward probabilities. (D) Mean trajectories of activity patterns in the two-dimensional space predicting the action plan and the previous reward from recurrent activity (in arbitrary units). ***P < 0.001.
Fig. 5. Adaptation to volatile schedules and emergent noise structure.
(A) Description of the volatile bandit task A*. Left: Stable condition with fixed reward probabilities. Right: Volatile condition with reward probabilities reversing every 25 trials. (B) Performance (proportion correct) achieved by RNNs with (blue, σ = 0.5) and without (gray) computation noise in the stable (left) and volatile (right) conditions. RNNs with computation noise substantially outperform exact RNNs in the volatile condition. (C) Best-fitting learning rates for the different types of RNNs in the stable and volatile conditions. Unlike RNNs without computation noise, RNNs with computation noise adapt their learning rate to the volatility. (D) Description of the restless bandit task A*. Left: The reward probabilities associated with the two arms drift randomly over the course of the game (200 trials). Right: Bayesian model comparison between Q-learning RL models with no computation noise, white computation noise, and Weber-structured computation noise, fitted to simulated actions from the RNNs with computation noise (σ = 0.5). The behavior of RNNs with computation noise is better fitted by a Q-learning RL model featuring a Weber-like noise structure (see the sketch after this caption). (E) Relationship between the quantity of update in the RNNs with computation noise and the noise corrupting the update. To obtain the two dimensions, we projected the quantity of update and the computation noise in the recurrent activity onto the decision axis. As predicted by a Weber-like noise structure, the population-level computation noise scales with the quantity of update in the network. Inset: Same relationship for RNNs trained without computation noise but tested with computation noise. ***P < 0.001; n.s., nonsignificant.
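A minimal Python sketch of a Weber-structured noisy Q-learning update of the kind referenced in (D); the learning rate alpha and Weber fraction zeta are illustrative values, and the paper's exact parameterization may differ.

    import numpy as np

    rng = np.random.default_rng(0)

    def weber_noisy_q_update(q, action, reward, alpha=0.3, zeta=0.5):
        # Exact Q-learning update: step toward the obtained reward
        delta = reward - q[action]      # reward prediction error
        update = alpha * delta
        # Weber-structured noise: the SD of the noise scales with the
        # magnitude of the update, so larger value updates are also
        # more variable
        q[action] += update + zeta * abs(update) * rng.standard_normal()
        return q

With zeta = 0 this reduces to exact Q-learning; a "white" noise variant would instead add noise of fixed SD, independent of the update magnitude.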
Fig. 6. Moderate levels of computation noise improve performance in human participants.
(A) Description of the experimental paradigm. Example drifts in the magnitude of rewards that can be obtained from the two arms. Rewards were sampled from probability distributions whose means drifted independently across trials. Thick lines represent the drifting means of the two distributions; thin lines correspond to the reward samples that can be obtained if the corresponding arm is chosen in each trial. (B) Bayesian model selection results for n = 198 human participants. Consistent with the noisy RNNs and previously published results, human behavior was best explained by a Q-learning RL model corrupted with Weber-structured noise at each update step. (C) Excess points earned relative to chance level by human participants (n = 198, in yellow, binned in deciles) and by a noisy RL algorithm corrupted by Weber-structured additive noise (in cyan). If computation noise only impeded human learning, then human performance would decrease with the level of computation noise, as it does for the noisy RL algorithm; instead, moderate levels of computation noise improve performance in the tested participants. Error bars and shaded areas represent SEM.

