On the origins of suboptimality in human probabilistic inference
- PMID: 24945142
- PMCID: PMC4063671
- DOI: 10.1371/journal.pcbi.1003661
Abstract
Humans have been shown to combine noisy sensory information with previous experience (priors), in qualitative and sometimes quantitative agreement with the statistically optimal predictions of Bayesian integration. However, when the prior distribution becomes more complex than a simple Gaussian, such as skewed or bimodal, training takes much longer and performance appears suboptimal. It is unclear whether such suboptimality arises from an imprecise internal representation of the complex prior, or from additional constraints in performing probabilistic computations on complex distributions, even when accurately represented. Here we probe the sources of suboptimality in probabilistic inference using a novel estimation task in which subjects are exposed to an explicitly provided distribution, thereby removing the need to remember the prior. Subjects had to estimate the location of a target given a noisy cue and a visual representation of the prior probability density over locations, which changed on each trial. Different classes of priors were examined (Gaussian, unimodal, bimodal). Subjects' performance was in qualitative agreement with the predictions of Bayesian Decision Theory although generally suboptimal. The degree of suboptimality was modulated by statistical features of the priors but was largely independent of the class of the prior and level of noise in the cue, suggesting that suboptimality in dealing with complex statistical features, such as bimodality, may be due to a problem of acquiring the priors rather than computing with them. We performed a factorial model comparison across a large set of Bayesian observer models to identify additional sources of noise and suboptimality. Our analysis rejects several models of stochastic behavior, including probability matching and sample-averaging strategies. Instead, we show that subjects' response variability was mainly driven by a combination of a noisy estimation of the parameters of the priors, and by variability in the decision process, which we represent as a noisy or stochastic posterior.
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures
… the target position was drawn from a discrete representation of the trial-dependent prior, whose shape was chosen randomly from a session-dependent class of distributions. The vertical distance of the cue from the target line was either ‘short’ or ‘long’, with equal probability. The horizontal position of the cue depended on the target position and on the cue's distance from the target line. The participants had to infer the target position given the cue's position, its distance from the target line, and the current prior. d: Details of the generative model. The potential targets constituted a discrete representation of the trial-dependent prior distribution; the discrete representation was built by taking equally spaced samples from the inverse of the cdf of the prior. The true target (red dot) was chosen uniformly at random from the potential targets, and the horizontal position of the cue (yellow dot) was drawn from a Gaussian distribution centered on the true target, with SD proportional to the cue's distance from the target line (either ‘short’ or ‘long’, depending on the trial, for low-noise and high-noise cues respectively). Here we show the location of the cue for a high-noise trial. e: Components of Bayesian decision making. According to Bayesian Decision Theory, a Bayesian ideal observer combines the prior distribution with the likelihood function to obtain a posterior distribution. The posterior is then convolved with the loss function (in this case whether the target will be encircled by the cursor) and the observer picks the ‘optimal’ target location (purple dot) that corresponds to the minimum of the expected loss (dashed line).
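The generative model and the ideal-observer computation described in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the number of potential targets, the prior parameters, the cue noise levels, the cursor half-width, and all function names are hypothetical values chosen for the example.

```python
import numpy as np
from statistics import NormalDist

# All numeric values are illustrative assumptions, not taken from the paper.
N_TARGETS = 100                          # number of potential targets
PRIOR_MEAN, PRIOR_SD = 0.0, 0.1          # Gaussian prior, in screen units
CUE_SD = {"short": 0.06, "long": 0.14}   # cue noise for each cue distance
WINDOW = 0.02                            # half-width of the response cursor


def discrete_prior(n=N_TARGETS):
    """Equally spaced samples from the inverse cdf of the prior."""
    dist = NormalDist(PRIOR_MEAN, PRIOR_SD)
    probs = (np.arange(n) + 0.5) / n
    return np.array([dist.inv_cdf(p) for p in probs])


def simulate_trial(targets, rng):
    """Pick the true target uniformly, then draw a Gaussian cue centred on it."""
    target = rng.choice(targets)
    distance = "short" if rng.random() < 0.5 else "long"
    cue = rng.normal(target, CUE_SD[distance])
    return target, cue, distance


def ideal_observer(targets, cue, distance, grid):
    """Posterior over the potential targets, then minimum expected loss."""
    log_like = -0.5 * ((cue - targets) / CUE_SD[distance]) ** 2
    post = np.exp(log_like - log_like.max())
    post /= post.sum()  # each potential target carries equal prior mass
    # Loss is 0 if the target falls inside the cursor and 1 otherwise, so the
    # expected loss of a response is 1 minus the posterior mass it covers.
    hits = (np.abs(grid[:, None] - targets[None, :]) <= WINDOW).astype(float)
    expected_loss = 1.0 - hits @ post
    return grid[np.argmin(expected_loss)], post


rng = np.random.default_rng(0)
targets = discrete_prior()
target, cue, distance = simulate_trial(targets, rng)
grid = np.linspace(-0.5, 0.5, 1001)      # candidate response locations
response, posterior = ideal_observer(targets, cue, distance, grid)
```

Because the potential targets are equally spaced quantiles of the prior, the prior enters the posterior only through where the targets sit, which is why the likelihood alone is normalised over them.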
… a: Gaussian priors. These priors were used for the training session, common to all subjects, and in the Gaussian test session. Standard deviations cover a range of values (in screen units) in equal increments. b: Unimodal priors. All unimodal priors have a fixed SD but different skewness and kurtosis (see Methods for details). c: Bimodal priors. All priors in the bimodal session have a fixed SD but different relative weights and separation between the peaks (see Methods).
… response slope as a function of the SD of the Gaussian prior distribution, plotted separately for trials with low noise (‘short’ cues, red line) and high noise (‘long’ cues, blue line). The response slope is equivalent to the linear weight assigned to the position of the cue (Eq. 1). Dashed lines represent the Bayes optimal strategy given the generative model of the task in the two noise conditions. Top: Slopes for a representative subject in the training session (slope ± SE). Bottom: Average slopes across all subjects in the training session (mean ± SE across subjects).
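For reference, the standard result for a Gaussian prior combined with a Gaussian cue gives the optimal linear weight in closed form. The paper's Eq. 1 is not reproduced on this page, so the symbols below ($\mu_{\text{prior}}$, $\sigma_{\text{prior}}$, $\sigma_{\text{cue}}$, $x_{\text{cue}}$, $w$) are assumed notation rather than the authors' own.

```latex
% Posterior mean for a Gaussian prior N(mu_prior, sigma_prior^2) and a
% Gaussian cue with noise SD sigma_cue (assumed notation).
\hat{x} = w \, x_{\text{cue}} + (1 - w) \, \mu_{\text{prior}},
\qquad
w = \frac{\sigma_{\text{prior}}^{2}}{\sigma_{\text{prior}}^{2} + \sigma_{\text{cue}}^{2}}
```

Under this expression the weight on the cue grows as the prior becomes broader and shrinks as the cue becomes noisier, which is the pattern the dashed optimal lines would follow for the two noise conditions.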
… reaction times (mean ± SE across subjects) for a given session as a function of the SD of the posterior distribution (individual data were smoothed with a kernel regression estimate; see Methods). Dashed lines are robust linear fits to the reaction time data. For all sessions, the slope of the linear regression is significantly different from zero.
… the target choice (blue arrow) that minimizes the expected loss (purple line; see Eq. 4) given his or her current representation of the posterior (black lines or bars). The original posterior distribution is shown in panels b–f for comparison (shaded line). a: Original posterior distribution. b: Noisy posterior: the original posterior is corrupted by random multiplicative or Poisson-like noise (in this example, the noise has caused the observer to aim for the wrong peak). c: Sample-based posterior: a discrete approximation of the posterior is built by drawing samples from the original posterior (grey bars; samples are binned for visualization purposes). d–f: Each panel shows how stochasticity in the posterior affects the distribution of target choices (blue line). d: Without noise, the target choice distribution is a delta function peaked on the minimum of the expected loss, as per standard BDT. e: On each trial, the posterior is corrupted by different instances of noise, inducing a distribution of possible target choices (blue line). In our task, this distribution of target choices is very well approximated by a power function of the posterior distribution, Eq. 7 (red dashed line); see Text S2 for details. f: Similarly, the target choice distribution induced by sampling (blue line) is fit very well by a power function of the posterior (red dashed line). Note the extremely close resemblance of panels e and f (the exponent of the power function is the same).
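The power-function approximation mentioned in panels e and f can be illustrated as follows. This is a hedged sketch that assumes the choice distribution is proportional to the posterior raised to an exponent; the paper's Eq. 7 is not shown on this page, and the name `kappa`, the grid, and the bimodal example posterior are all hypothetical.

```python
import numpy as np


def power_choice_distribution(posterior, kappa):
    """Distribution proportional to posterior**kappa (kappa is an assumed name)."""
    posterior = np.asarray(posterior, dtype=float)
    # Work in log space so that large exponents do not underflow.
    log_p = kappa * np.log(np.clip(posterior, 1e-300, None))
    p = np.exp(log_p - log_p.max())
    return p / p.sum()


def gauss(x, mu, sd):
    """Unnormalised Gaussian bump, used only to build the example posterior."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2)


# Illustrative bimodal posterior on a grid of target locations.
grid = np.linspace(-0.5, 0.5, 201)
posterior = 0.6 * gauss(grid, -0.15, 0.05) + 0.4 * gauss(grid, 0.15, 0.05)
posterior /= posterior.sum()

matching = power_choice_distribution(posterior, 1.0)   # same as the posterior
sharp = power_choice_distribution(posterior, 20.0)     # mass moves to the mode
```

In this sketch an exponent of 1 reproduces the posterior itself, which corresponds to drawing responses directly from the posterior, while larger exponents concentrate choices on the higher peak.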
… c: Probability that a given model level within a factor generated the data of a randomly chosen subject.
… c: Probability that a given model level within a factor generated the data of a randomly chosen subject. The negated ‘GA’ label stands for no Gaussian approximation (full posterior).
… reconstructed priors (± 1 s.d.) according to observer model SPK-L. a: Gaussian session. Recovered priors in the Gaussian test session are very good approximations of the true priors, as shown by a comparison between the SD of the reconstructed priors and the true SD. b: Unimodal session. Recovered priors in the unimodal test session approximate the true priors in SD, although with systematic deviations in higher-order moments (skewness and kurtosis). Reconstructed priors are systematically less kurtotic (less peaked, lighter-tailed) than the true priors. c: Bimodal session. Recovered priors in the bimodal test session approximate the true priors in SD, skewness and kurtosis, with only minor systematic deviations.
