On the origins of suboptimality in human probabilistic inference

Luigi Acerbi et al. PLoS Comput Biol. 2014 Jun 19;10(6):e1003661.
doi: 10.1371/journal.pcbi.1003661
Abstract

Humans have been shown to combine noisy sensory information with previous experience (priors), in qualitative and sometimes quantitative agreement with the statistically optimal predictions of Bayesian integration. However, when the prior distribution becomes more complex than a simple Gaussian, such as skewed or bimodal, training takes much longer and performance appears suboptimal. It is unclear whether such suboptimality arises from an imprecise internal representation of the complex prior, or from additional constraints in performing probabilistic computations on complex distributions, even when accurately represented. Here we probe the sources of suboptimality in probabilistic inference using a novel estimation task in which subjects are exposed to an explicitly provided distribution, thereby removing the need to remember the prior. Subjects had to estimate the location of a target given a noisy cue and a visual representation of the prior probability density over locations, which changed on each trial. Different classes of priors were examined (Gaussian, unimodal, bimodal). Subjects' performance was in qualitative agreement with the predictions of Bayesian Decision Theory, although generally suboptimal. The degree of suboptimality was modulated by statistical features of the priors but was largely independent of the class of the prior and the level of noise in the cue, suggesting that suboptimality in dealing with complex statistical features, such as bimodality, may be due to a problem of acquiring the priors rather than computing with them. We performed a factorial model comparison across a large set of Bayesian observer models to identify additional sources of noise and suboptimality. Our analysis rejects several models of stochastic behavior, including probability matching and sample-averaging strategies. Instead we show that subjects' response variability was mainly driven by a combination of a noisy estimation of the parameters of the priors and variability in the decision process, which we represent as a noisy or stochastic posterior.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Experimental procedure.
a: Setup. Subjects held the handle of a robotic manipulandum. The visual scene from a CRT monitor, including a cursor that tracked the hand position, was projected into the plane of the hand via a mirror. b: Screen setup. The screen showed a home position (grey circle), the cursor (red circle), shown here at the start of a trial, a line of potential targets (dots) and a visual cue (yellow dot). The task consisted of locating the true target among the array of potential targets, given the position of the noisy cue. The coordinate axis was not displayed on screen, and the target line is shaded here only for visualization purposes. c: Generative model of the task. On each trial the position of the hidden target was drawn from a discrete representation of the trial-dependent prior, whose shape was chosen randomly from a session-dependent class of distributions. The vertical distance of the cue from the target line was either ‘short’ or ‘long’, with equal probability. The horizontal position of the cue depended on the target position and on the cue distance. The participants had to infer the target position given the cue position, the cue distance and the current prior. d: Details of the generative model. The potential targets constituted a discrete representation of the trial-dependent prior distribution; the discrete representation was built by taking equally spaced samples from the inverse of the cumulative distribution function (cdf) of the prior. The true target (red dot) was chosen uniformly at random from the potential targets, and the horizontal position of the cue (yellow dot) was drawn from a Gaussian distribution centered on the true target, with SD proportional to the distance from the target line (either ‘short’ or ‘long’, depending on the trial, for low-noise and high-noise cues respectively). Here we show the location of the cue for a high-noise trial.
e: Components of Bayesian decision making. According to Bayesian Decision Theory, a Bayesian ideal observer combines the prior distribution with the likelihood function to obtain a posterior distribution. The posterior is then convolved with the loss function (in this case, whether the target will be encircled by the cursor) and the observer picks the ‘optimal’ target location (purple dot) that corresponds to the minimum of the expected loss (dashed line).
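Panels d and e together describe a small pipeline: discretize the prior via its inverse cdf, combine it with the cue likelihood, and pick the endpoint minimizing the expected loss. A minimal sketch, using an illustrative uniform prior, Gaussian likelihood, and cursor radius (none of these numbers come from the paper):

```python
import numpy as np

def discretize_prior(inv_cdf, n_targets):
    # Equally spaced quantiles of the prior; each potential target
    # then carries equal probability mass 1 / n_targets.
    quantiles = (np.arange(n_targets) + 0.5) / n_targets
    return inv_cdf(quantiles)

def bayes_optimal_target(targets, likelihood, cursor_radius):
    # The discrete prior is uniform over the potential targets, so
    # the posterior is simply the likelihood normalized over targets.
    posterior = likelihood / likelihood.sum()
    # Expected loss at a candidate endpoint = probability of missing,
    # i.e. posterior mass falling outside the cursor's radius.
    expected_loss = np.array([
        1.0 - posterior[np.abs(targets - t) <= cursor_radius].sum()
        for t in targets
    ])
    return targets[np.argmin(expected_loss)]

# Hypothetical numbers: a uniform prior on [-1, 1] (inverse cdf
# q -> 2q - 1), a Gaussian cue likelihood centered at 0.4, and a
# cursor of radius 0.1 in the same units.
targets = discretize_prior(lambda q: 2 * q - 1, 64)
likelihood = np.exp(-0.5 * ((targets - 0.4) / 0.3) ** 2)
choice = bayes_optimal_target(targets, likelihood, 0.1)
```

Because the potential targets are equal-probability samples of the prior, the discrete prior is uniform over them, which is why the posterior reduces to the normalized likelihood in this sketch.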
Figure 2. Prior distributions.
Each panel shows the (unnormalized) probability density for a ‘prior’ distribution of targets, grouped by experimental session, with eight different priors per session. Within each session, priors are numbered in order of increasing differential entropy (i.e. increasing variance for Gaussian distributions). During the experiment, priors had a random location (mean drawn uniformly) and asymmetrical priors had probability 1/2 of being ‘flipped’. Target positions are shown in standardized screen units. a: Gaussian priors. These priors were used for the training session, common to all subjects, and in the Gaussian test session. Standard deviations cover a range of screen units in equal increments. b: Unimodal priors. All unimodal priors have the same fixed SD (in screen units) but different skewness and kurtosis (see Methods for details). c: Bimodal priors. All priors in the bimodal session have the same fixed SD (in screen units) but different relative weights and separation between the peaks (see Methods).
Figure 3. Subjects' responses as a function of the position of the cue.
Each panel shows the pooled subjects' responses as a function of the position of the cue, either for low-noise cues (red dots) or high-noise cues (blue dots). Each column corresponds to a representative prior distribution, shown at the top, for each group (Gaussian, unimodal and bimodal). In the response plots, dashed lines correspond to the Bayes-optimal strategy given the generative model of the task. The continuous lines are a kernel regression estimate of the mean response (see Methods). a. Exemplar Gaussian prior (prior 4 in Figure 2a). b. Exemplar unimodal prior (platykurtic distribution: prior 4 in Figure 2b). c. Exemplar bimodal prior (prior 5 in Figure 2c). Note that in the bimodal case the mean response is not necessarily a good description of subjects' behavior, since the marginal distribution of responses for central positions of the cue is bimodal.
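The kernel regression estimate mentioned above can be sketched as a Nadaraya–Watson smoother; the Gaussian kernel and bandwidth here are illustrative assumptions, not the paper's Methods choices:

```python
import numpy as np

def kernel_regression(x_query, x, y, bandwidth):
    # Nadaraya-Watson estimate: Gaussian-weighted average of the
    # responses y around each query point in x_query.
    d = (np.asarray(x_query)[:, None] - np.asarray(x)[None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)
    return (w * np.asarray(y)[None, :]).sum(axis=1) / w.sum(axis=1)

# Illustrative smoothing of a noiseless linear response
x = np.linspace(0.0, 1.0, 50)
smooth = kernel_regression([0.5], x, 2.0 * x, bandwidth=0.1)
```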
Figure 4. Response slopes for the training session.
Response slope as a function of the SD of the Gaussian prior distribution, plotted for trials with low noise (‘short’ cues, red line) and high noise (‘long’ cues, blue line). The response slope is equivalent to the linear weight assigned to the position of the cue (Eq. 1). Dashed lines represent the Bayes-optimal strategy given the generative model of the task in the two noise conditions. Top: Slopes for a representative subject in the training session (slope ± SE). Bottom: Average slopes across all subjects in the training session (mean ± SE across subjects).
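For a continuous Gaussian prior combined with Gaussian cue noise, the Bayes-optimal slope has the standard closed form: with prior SD sigma_prior and cue noise SD sigma_cue, the posterior mean weights the cue by sigma_prior^2 / (sigma_prior^2 + sigma_cue^2). A minimal sketch (variable names are mine, not the paper's):

```python
def optimal_slope(sigma_prior, sigma_cue):
    # Bayes-optimal linear weight on the cue position; the remaining
    # weight (1 - slope) goes to the mean of the Gaussian prior.
    var_p, var_c = sigma_prior ** 2, sigma_cue ** 2
    return var_p / (var_p + var_c)
```

The slope shrinks toward 0 as the cue gets noisier (‘long’ cues) and grows toward 1 as the prior gets broader, matching the pattern of the dashed lines in the figure.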
Figure 5. Group mean optimality index.
Each bar represents the group-averaged optimality index for a specific session, for each prior (indexed from 1 to 8, see also Figure 2) and cue type, low-noise cues (red bars) or high-noise cues (blue bars). The optimality index in each trial is computed as the probability of locating the correct target based on the subject's response, divided by the probability of locating the target for an optimal responder. The maximal optimality index is 1, attained by a Bayesian observer with a correct internal model of the task and no sensorimotor noise. Error bars are SE across subjects. Priors are arranged in order of increasing differential entropy (i.e. increasing variance for Gaussian priors), except for the ‘unimodal test’ priors, which are listed in order of increasing width of the main peak (see text). The dotted and dash-dotted lines represent the optimality index of a suboptimal observer that takes into account only the cue or only the prior, respectively. The shaded area is the zone of synergistic integration, in which an observer performs better than by using information from either the prior or the cue alone.
Figure 6. Average reaction times as a function of the SD of the posterior distribution.
Each panel shows the average reaction times (mean ± SE across subjects) for a given session as a function of the SD of the posterior distribution (individual data were smoothed with a kernel regression estimate, see Methods). Dashed lines are robust linear fits to the reaction-time data. For all sessions the slope of the linear regression is significantly different from zero.
Figure 7. Decision making with stochastic posterior distributions.
a–c: Each panel shows an example of how different models of stochasticity in the representation of the posterior distribution, and therefore in the computation of the expected loss, may affect decision making in a trial. In all cases, the observer chooses the subjectively optimal target (blue arrow) that minimizes the expected loss (purple line; see Eq. 4) given his or her current representation of the posterior (black lines or bars). The original posterior distribution is shown in panels b–f for comparison (shaded line). a: Original posterior distribution. b: Noisy posterior: the original posterior is corrupted by random multiplicative or Poisson-like noise (in this example, the noise has caused the observer to aim for the wrong peak). c: Sample-based posterior: a discrete approximation of the posterior is built by drawing samples from the original posterior (grey bars; samples are binned for visualization purposes). d–f: Each panel shows how stochasticity in the posterior affects the distribution of target choices (blue line). d: Without noise, the target-choice distribution is a delta function peaked on the minimum of the expected loss, as per standard BDT. e: On each trial, the posterior is corrupted by a different instance of noise, inducing a distribution of possible target choices (blue line). In our task, this distribution of target choices is very well approximated by a power function of the posterior distribution, Eq. 7 (red dashed line); see Text S2 for details. f: Similarly, the target-choice distribution induced by sampling (blue line) is fit very well by a power function of the posterior (red dashed line). Note the extremely close resemblance of panels e and f (the exponent of the power function is the same).
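The power-function approximation of Eq. 7 (red dashed lines in panels e–f) is simple to write down; the exponent value below is an arbitrary illustration, not a fitted parameter from the paper:

```python
import numpy as np

def power_posterior_choice(posterior, kappa):
    # Target-choice distribution as a power function of the posterior
    # (Eq. 7): kappa = 1 is posterior matching, and choices
    # concentrate on the posterior mode as kappa grows, approaching
    # deterministic BDT.
    p = np.asarray(posterior, dtype=float) ** kappa
    return p / p.sum()

posterior = np.array([0.1, 0.2, 0.7])   # illustrative posterior
matched = power_posterior_choice(posterior, 1.0)
peaked = power_posterior_choice(posterior, 5.0)
```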
Figure 8. Model comparison between individual models.
a: Each column represents a subject, divided by test group (all datasets include a Gaussian training session), and each row an observer model identified by a model string (see Table 2). Cell color indicates the model's evidence, displayed as the Bayes factor against the best model for that subject (a higher value means worse performance of the model relative to the best model). Models are sorted by their posterior likelihood for a randomly selected subject (see panel b). Numbers above cells specify the ranking of the most supported models with comparable evidence (difference of less than 10 in 2 log Bayes factor [32]). b: Probability that a given model generated the data of a randomly chosen subject. Here and in panel c, brown bars represent the most supported models (or model levels within a factor). Asterisks indicate a significant exceedance probability, that is, the posterior probability that a given model (or model component) is more likely than any other model (or model component). c: Probability that a given model level within a factor generated the data of a randomly chosen subject.
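The ‘2 log Bayes factor’ criterion used to rank the cells can be computed directly from each model's log marginal likelihood; the values below are hypothetical, and the threshold of 10 follows the scale cited in the caption [32]:

```python
import numpy as np

def two_log_bayes_factors(log_evidence):
    # 2 * log Bayes factor of each model against the best model;
    # differences below 10 count as comparable evidence, per the
    # caption's criterion.
    log_evidence = np.asarray(log_evidence, dtype=float)
    return 2.0 * (log_evidence.max() - log_evidence)

# Hypothetical log marginal likelihoods for four candidate models
scores = two_log_bayes_factors([-1205.0, -1201.3, -1214.8, -1202.0])
comparable = scores < 10.0
```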
Figure 9. Comparison between alternative models of decision making.
We tested a class of alternative models of decision making which differ only in their predictions for non-Gaussian trials. a: Each column represents a subject, divided by group (either unimodal or bimodal test session), and each row an observer model identified by a model string (see Table 2). Cell color indicates the model's evidence, displayed as the Bayes factor against the best model for that subject (a higher value means worse performance of the model relative to the best model). Models are sorted by their posterior likelihood for a randomly selected subject (see panel b). Numbers above cells specify the ranking of the most supported models with comparable evidence (difference of less than 10 in 2 log Bayes factor [32]). b: Probability that a given model generated the data of a randomly chosen subject. Here and in panel c, brown bars represent the most supported models (or model levels within a factor). Asterisks indicate a significant exceedance probability, that is, the posterior probability that a given model (or model component) is more likely than any other model (or model component). c: Probability that a given model level within a factor generated the data of a randomly chosen subject. The negated ‘GA’ label stands for no Gaussian approximation (full posterior).
Figure 10. Model ‘postdiction’ of the optimality index.
Each bar represents the group-averaged optimality index for a specific session, for each prior (indexed from 1 to 8, see also Figure 2) and cue type, either low-noise cues (red bars) or high-noise cues (blue bars); see also Figure 5. Error bars are SE across subjects. The continuous line represents the ‘postdiction’ of the best suboptimal Bayesian observer model (model SPK-P-L; see ‘Analysis of best observer model’ in the text). For comparison, the dashed line is the ‘postdiction’ of the best suboptimal observer model that follows Bayesian Decision Theory, BDT-P-L.
Figure 11. Comparison of measured and simulated performance.
Comparison of the mean optimality index computed from the data with the simulated optimality index according to the ‘postdiction’ of the best observer model (SPK-P-L). Each dot represents a single session (either training or test) for one subject. The dashed line corresponds to equality between observed and simulated performance. Model-simulated performance is in good agreement with subjects' performance.
Figure 12. Reconstructed prior distributions.
Each panel shows the (unnormalized) probability density for a ‘prior’ distribution of targets, grouped by test session, as per Figure 2. Purple lines are mean reconstructed priors (mean ± 1 SD) according to observer model SPK-L. a: Gaussian session. Recovered priors in the Gaussian test session are very good approximations of the true priors (the SDs of the reconstructed priors closely match the true SDs). b: Unimodal session. Recovered priors in the unimodal test session approximate the true priors' SD, although with systematic deviations in higher-order moments (skewness and kurtosis). Reconstructed priors are systematically less kurtotic (less peaked, lighter-tailed) than the true priors. c: Bimodal session. Recovered priors in the bimodal test session approximate the true priors with only minor systematic deviations in SD, skewness and kurtosis.

References

    1. Weiss Y, Simoncelli EP, Adelson EH (2002) Motion illusions as optimal percepts. Nat Neurosci 5: 598–604.
    2. Stocker AA, Simoncelli EP (2006) Noise characteristics and prior expectations in human visual speed perception. Nat Neurosci 9: 578–585.
    3. Girshick A, Landy M, Simoncelli E (2011) Cardinal rules: visual orientation perception reflects knowledge of environmental statistics. Nat Neurosci 14: 926–932.
    4. Chalk M, Seitz A, Seriès P (2010) Rapidly learned stimulus expectations alter perception of motion. J Vis 10: 1–18.
    5. Miyazaki M, Nozaki D, Nakajima Y (2005) Testing Bayesian models of human coincidence timing. J Neurophysiol 94: 395–399.
