Adv Neural Inf Process Syst. 2008;21:1873–1880.

Sequential effects: Superstition or rational behavior?


Angela J Yu et al. Adv Neural Inf Process Syst. 2008.

Abstract

In a variety of behavioral tasks, subjects exhibit an automatic and apparently suboptimal sequential effect: they respond more rapidly and accurately to a stimulus if it reinforces a local pattern in stimulus history, such as a string of repetitions or alternations, compared to when it violates such a pattern. This is often the case even if the local trends arise by chance in the context of a randomized design, such that stimulus history has no real predictive power. In this work, we use a normative Bayesian framework to examine the hypothesis that such idiosyncrasies may reflect the inadvertent engagement of mechanisms critical for adapting to a changing environment. We show that prior belief in non-stationarity can induce experimentally observed sequential effects in an otherwise Bayes-optimal algorithm. The Bayesian algorithm is shown to be well approximated by linear-exponential filtering of past observations, a feature also apparent in the behavioral data. We derive an explicit relationship between the parameters and computations of the exact Bayesian algorithm and those of the approximate linear-exponential filter. Since the latter is equivalent to a leaky-integration process, a commonly used model of neuronal dynamics underlying perceptual decision-making and trial-to-trial dependencies, our model provides a principled account of why such dynamics are useful. We also show that parameter-tuning of the leaky-integration process is possible, using stochastic gradient descent based only on the noisy binary inputs. This is a proof of concept that not only can neurons implement near-optimal prediction based on standard neuronal dynamics, but that they can also learn to tune the processing parameters without explicitly representing probabilities.
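As a concrete illustration of the linear-exponential filtering described above, the sketch below predicts the next binary stimulus by geometrically discounting past observations. The β = .57 value mirrors the exponential fit reported in Figure 3; pulling the estimate toward a neutral .5 base rate is a simplifying assumption, not the paper's exact parameterization.

```python
import numpy as np

def exp_filter_predict(history, beta=0.57, base=0.5):
    """Predict the next binary stimulus by geometrically discounting
    past observations (the most recent trial weighted beta, the one
    before it beta**2, and so on), pulled toward a neutral base rate."""
    if not history:
        return base
    w = beta ** np.arange(len(history), 0, -1)   # oldest trial -> smallest weight
    return float((base + w @ np.asarray(history, float)) / (1.0 + w.sum()))
```

A run of repetitions (coded 1) pushes the prediction above the base rate, and a run of alternations (coded 0) pushes it below, matching the qualitative sequential effect in the RT data.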


Figures

Figure 1
Bayesian modeling of sequential effects. (a) Median reaction time (RT) from Cho et al. (2002) is affected by the recent history of stimuli; subjects were required to discriminate a small “o” from a large “O” using button presses. Along the abscissa are all possible four-trial sub-sequences, in terms of repetitions (R) and alternations (A). Each sequence, read from top to bottom, proceeds from the earliest stimulus toward the present stimulus. As the effects were symmetric across the two stimulus types, A and B, each bin contains data from a pair of conditions (e.g. RRAR can be AAABB or BBBAA). RT is fastest when a pattern is reinforced (RRR followed by R, or AAA followed by A); it is slowest when an “established” pattern is violated (RRR followed by A, or AAA followed by R). (b) Assuming RT decreases with predicted stimulus probability (i.e. RT increases with 1 − P(x_t | x_{t−1}), where x_t is the stimulus actually seen), the FBM would predict much weaker sequential effects in the second half (blue: 720 simulated trials) than in the first half (red: 840 trials). (c) The DBM predicts persistently strong sequential effects in both the first half (red: 840 trials) and the second half (blue: 720 trials). Inset shows the prior over γ used; the same prior was also used for the FBM in (b). α = .77. (d) Sequential effects in the behavioral data were equally strong in the first half (red: 7 blocks of 120 trials each) and the second half (blue: 6 blocks of 120 trials each). The green dashed line shows a linear transformation from the DBM prediction in probability space of (c) into RT space. The fit is very good given the error bars (SEM) in the data.
Figure 2
Bayesian inference assuming fixed and changing Bernoulli parameters. (a) Graphical model for the FBM. γ ∈ [0, 1], x_t ∈ {0, 1}. The numbers in circles show example values for the variables. (b) Graphical model for the DBM: p(γ_t | γ_{t−1}) = α δ(γ_t − γ_{t−1}) + (1 − α) p_0(γ_t), where we assume the prior p_0 to be a Beta distribution. The numbers in circles show example values for the variables. (c) Grayscale shows the evolution of the posterior probability mass over γ for the FBM (darker colors indicate greater concentration of mass), given a sequence of truly random (P(x_t) = .5) binary data (blue dots). The mean of the distribution, in cyan, is also the predicted stimulus probability: P(x_t = 1 | x_{t−1}) = 〈γ | x_{t−1}〉. (d) Evolution of the posterior probability mass for the DBM (grayscale) and the predictive probability P(x_t = 1 | x_{t−1}) (cyan); they fluctuate perpetually with transient runs of repetitions or alternations.
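A minimal grid-based sketch of the DBM update in (b), assuming the Beta(1.6, 1.3) prior and α = .77 used in the figures; the 100-point discretization of γ is an arbitrary implementation choice.

```python
import numpy as np

grid = np.linspace(0.005, 0.995, 100)        # discretized values of gamma
p0 = grid**0.6 * (1 - grid)**0.3             # Beta(1.6, 1.3) prior, unnormalized
p0 /= p0.sum()

def dbm_step(posterior, x, alpha=0.77):
    """One DBM trial: mix the posterior with the prior (gamma persists
    with probability alpha, else is redrawn from p0), emit the
    predictive probability, then condition on the observation x."""
    mixed = alpha * posterior + (1 - alpha) * p0   # p(gamma_t | x_{1:t-1})
    p_next = float(mixed @ grid)                   # P(x_t = 1 | x_{1:t-1})
    new_post = (grid if x == 1 else 1 - grid) * mixed
    return new_post / new_post.sum(), p_next

# A short run of repetitions (x = 1) drives the predictive probability up.
post = p0.copy()
for _ in range(5):
    post, p_next = dbm_step(post, x=1)
```

Because the prior is mixed back in on every trial, the posterior never fully concentrates, which is what sustains the sequential effects in Figure 1c.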
Figure 3
Exponential discounting is a good descriptive and normative model. (a) For each of the six subjects, we regressed RT on repetition trials against past observations, RT ≈ C + b_1 x_{t−1} + b_2 x_{t−2} + …, where x_τ is assigned 0 if trial τ was a repetition and 1 if it was an alternation, the idea being that recent repetition trials should increase the expectation of a repetition and decrease RT, while recent alternations should decrease the expectation of a repetition and increase RT on a repetition trial. Separately, we also regressed RTs on alternation trials against past observations (assigning 0 to alternation trials and 1 to repetitions). The two sets of coefficients did not differ significantly and were averaged together (red: average across subjects; error bars: SEM). The blue line shows the best exponential fit to these coefficients. (b) We regressed P_t, obtained from exact Bayesian DBM inference, against past observations and obtained a set of average coefficients (red); blue is the best exponential fit. (c) For different values of α, we repeated the process in (b) and obtained the best exponential decay parameter β (blue). The optimal β closely tracks the 2/3 rule for a large range of values of α. β is .57 in (a), so α = .77 was used to generate (b). (d) Both the optimal exponential fit (red) and the 2/3 rule (blue) approximate the true Bayesian P_t well (the green dashed line shows a perfect match). α = .77. For smaller values of α, the fit is even better; for larger α, the exponential approximation deteriorates (not shown). (e) For repetition trials, the greater the predicted probability of seeing a repetition (x_t = 1), the faster the RT, whether trials are categorized by Bayesian predictive probabilities (red: α = .77, p_0 = Beta(1.6, 1.3)) or by linear-exponential filtering (blue). For alternation trials, RTs increase with increasing predicted probability of seeing a repetition.
Inset: for the biases b ∈ [.2, .8], the log prior ratio (shift in the initial starting point, and therefore change in the distance to decision boundary) is approximately linear.
Figure 4
Meta-learning about the rate of change. (a) Graphical model for exact Bayesian learning. Numbers are example values for the variables. (b) Mean of the posterior p(α | x_t) as a function of timesteps, averaged over 30 sessions of simulated data, each set generated from a different true value of α (see legend; color-coded dashed lines indicate the true α). Inset shows the prior over α, p(α) = Beta(17, 3). The time course of learning is not especially sensitive to the exact form of the prior (not shown). (c) Stochastic gradient descent with a learning rate of .01 produces estimates of α (thick lines; width denotes SEM) that converge to the true values of α (dashed lines). The initial estimate of α, before seeing any data, is .9. Learning is based on 50 sessions of 5000 trials for each value of α. (d) Marginal posterior distributions over α (top panel) and γ_t (bottom panel) on a sample run, with probability mass color-coded: brighter colors indicate more mass.
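A proof-of-concept in the spirit of panel (c), not the paper's exact update rule: here the leak of a leaky integrator is tuned online by stochastic gradient descent on squared prediction error, using only the noisy binary inputs. The data generator, the Beta(1.6, 1.3) prior, and the .01 learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_trials(alpha_true, n):
    """Simulate DBM-style data: gamma is redrawn from a Beta(1.6, 1.3)
    prior with probability 1 - alpha on each trial."""
    gamma, xs = rng.beta(1.6, 1.3), []
    for _ in range(n):
        if rng.random() > alpha_true:
            gamma = rng.beta(1.6, 1.3)
        xs.append(int(rng.random() < gamma))
    return xs

beta, s, g = 0.9, 0.5, 0.0          # leak, integrator state, ds/dbeta
errs = []
for x in make_trials(alpha_true=0.75, n=5000):
    p = s                            # predict the next stimulus from the state
    errs.append((x - p) ** 2)
    # gradient step on squared error: beta += lr * (x - p) * dp/dbeta
    beta = float(np.clip(beta + 0.01 * (x - p) * g, 0.01, 0.99))
    g = s - x + beta * g             # chain rule through s <- beta*s + (1-beta)*x
    s = beta * s + (1 - beta) * x
```

The point, as in the abstract, is that the tuning signal requires nothing beyond the binary inputs and the integrator's own state, with no explicit representation of probabilities.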

References

    1. Skinner BF. J. Exp. Psychol. 1948;38:168–172.
    2. Ecott CL, Critchfield TS. J. App. Beh. Analysis. 2004;37:249–265.
    3. Laming DRJ. Information Theory of Choice-Reaction Times. London: Academic Press; 1968.
    4. Soetens E, Boer LC, Hueting JE. JEP: HPP. 1985;11:598–616.
    5. Cho R, et al. Cognitive, Affective, & Behavioral Neurosci. 2002;2:283–299.
