Nat Hum Behav. 2022 Aug;6(8):1153-1168. doi: 10.1038/s41562-022-01357-z. Epub 2022 May 30.

Human inference reflects a normative balance of complexity and accuracy


Gaia Tavoni et al. Nat Hum Behav. 2022 Aug.

Abstract

We must often infer latent properties of the world from noisy and changing observations. Complex, probabilistic approaches to this challenge such as Bayesian inference are accurate but cognitively demanding, relying on extensive working memory and adaptive processing. Simple heuristics are easy to implement but may be less accurate. What is the appropriate balance between complexity and accuracy? Here we model a hierarchy of strategies of variable complexity and find a power law of diminishing returns: increasing complexity gives progressively smaller gains in accuracy. The rate of diminishing returns depends systematically on the statistical uncertainty in the world, such that complex strategies do not provide substantial benefits over simple ones when uncertainty is either too high or too low. In between, there is a complexity dividend. In two psychophysical experiments, we confirm specific model predictions about how working memory and adaptivity should be modulated by uncertainty.


Conflict of interest statement


The authors declare no competing interests.

Figures

Figure 1. A hierarchy of cognitive functions maps to a hierarchy of inference strategies.
Two nested families of inference strategies of decreasing algorithmic complexity can be derived from the exact Bayesian approach by progressively reducing requirements of memory and adaptivity (see also Supplementary Fig. 1). We illustrate this approach in the context of inference from noisy observations (blue dots) of a latent variable μt (red dashed lines). See text for model descriptions and Methods for model details. The decrease in algorithmic complexity over this hierarchy of strategies mirrors a corresponding decrease in cognitive load (legend on the right-hand side).
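The simpler strategies in this hierarchy trade memory and adaptivity for tractability. As a rough illustration (function names and signatures are ours, not the paper's code): a Delta Rule nudges a running estimate toward each new observation with a fixed learning rate, while a Sliding Window averages only the most recent observations.

```python
def delta_rule(observations, alpha):
    """Running estimate moved toward each observation by a fixed learning rate alpha."""
    mu_hat = 0.0
    estimates = []
    for x in observations:
        mu_hat += alpha * (x - mu_hat)  # exponential discounting of past evidence
        estimates.append(mu_hat)
    return estimates

def sliding_window(observations, k):
    """Estimate at each step = mean of the last k observations (a fixed memory buffer)."""
    estimates = []
    for t in range(len(observations)):
        window = observations[max(0, t - k + 1): t + 1]
        estimates.append(sum(window) / len(window))
    return estimates
```

Both reduce the Bayesian posterior update to a single scalar of state; the paper's analysis quantifies exactly how much accuracy this costs as a function of volatility and noise.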
Figure 2. Gaussian change-point processes.
Observations xt (blue dots) are generated from a source positioned at μt (dashed red line) with Gaussian noise (SD = σ). The source is hidden to the observer and undergoes change-points at random times with probability h (volatility). At the change-points, μt is resampled from a Gaussian distribution centered at μ¯ (dashed black line, stable over time) and with SD = σ0 = 1. Different panels show processes with different volatility (increasing from left to right) and noise R = σ/σ0 (increasing from bottom to top): (A): h = 0.06, R = 0.45; (B): h = 0.24, R = 0.45; (C): h = 0.06, R = 0.05; (D): h = 0.24, R = 0.05.
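The generative process in this caption is fully specified, so it can be reproduced in a few lines. A minimal sketch (parameter names are ours; the paper fixes σ0 = 1 and varies h and R = σ/σ0):

```python
import random

def change_point_process(T, h, R, mu_bar=0.0, sigma0=1.0, seed=0):
    """Simulate T steps of the Gaussian change-point process of Fig. 2.

    With probability h per step the latent source mu_t is resampled from
    N(mu_bar, sigma0); each observation x_t is mu_t plus Gaussian noise
    with SD sigma = R * sigma0.
    """
    rng = random.Random(seed)
    sigma = R * sigma0
    mu = rng.gauss(mu_bar, sigma0)
    mus, xs = [], []
    for _ in range(T):
        if rng.random() < h:              # change-point: resample the source
            mu = rng.gauss(mu_bar, sigma0)
        mus.append(mu)
        xs.append(rng.gauss(mu, sigma))   # noisy observation of the source
    return mus, xs
```

For example, `change_point_process(5000, 0.06, 0.45)` corresponds to the low-volatility, high-noise condition of panel (A).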
Figure 3. Adaptive models reduce to calibrated simpler strategies when variability is low or high.
(A): Computation of the Alignment. (Left) Two-dimensional parameter space of the Mixture models with two units defined by learning rates α1 and α2, and the embedded unidimensional space of the nested single-unit models (diagonal line α1 = α2). The optimal Mixture model and optimal single-unit model (black dots) are indicated along with the parameter deformation leading from one to the other (gray line). (Right) Relevant and irrelevant parameter deformations that maximally or minimally change the prediction error moving away from the optimal adaptive Mixture model. Alignment is defined as the normalized angle θ between the irrelevant deformation and the direction to the best non-adaptive single-unit model. The prediction error used to compute Alignment is estimated over 5000 time steps of the process for each pair of h and R values. (B): Redundancy of the adaptive Mixture models (left: Mixture of two Sliding Windows; right: Mixture of two Delta Rules) for a range of volatility and noise values in a change-point detection task (Fig. 2). The same error function as in (A) is used to compute Redundancy. Slices through the red inset windows are shown to the left and right (red lines: 4th-order polynomial fits). (C): Alignment of the irrelevant parameter deformation towards the non-adaptive nested single-unit model, plotted as in (B). (D): Probability distribution of Alignment values conditioned on Redundancy, sampled over the tested volatility and noise values.
Figure 4. Diminishing returns from increasing complexity.
(A): Algorithmic complexity (Eq. 30) for the models in Fig. 1. The exact Bayesian model has infinite complexity by our measure and is not shown. (B): Inaccuracy (Eq. 32) decreases as a power law in the complexity (Eq. 30), shown here for volatility and noise levels h = 0.1 and R = 1. Inset: linear fit on a log-log scale. See also Supplementary Fig. 3A–C for goodness-of-fit statistics. The exponent of the power law varies with (C) noise and (D) volatility. Inaccuracy is computed over ten instances of the change-point process, each 5000 time steps long. (E): Scaling of inaccuracy and accuracy (Eq. 34) with complexity for fixed volatility and varying noise (Eq. 33). Color code and scaling exponents for each condition are taken from panel (C). Horizontal black lines indicate the threshold for performance within 10% of the Bayesian optimum. The intercept with the scaling curve for each task condition indicates the minimum model complexity required to reach the performance threshold. (F): Same as panel (E) for fixed noise and varying volatility. Color code and scaling exponents are taken from panel (D).
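The power-law relation in panel (B), ℐ ≈ a·C^(−β), can be recovered from a linear fit in log-log coordinates, since log ℐ = log a − β log C. A hedged sketch of that fit (the paper's actual fitting procedure and data are not reproduced here):

```python
import math

def fit_power_law(complexity, inaccuracy):
    """Ordinary least-squares line in log-log space: log I = log a - beta * log C.

    Returns (a, beta) such that I ~ a * C**(-beta).
    """
    logc = [math.log(c) for c in complexity]
    logi = [math.log(i) for i in inaccuracy]
    n = len(logc)
    mean_c = sum(logc) / n
    mean_i = sum(logi) / n
    slope = (sum((lc - mean_c) * (li - mean_i) for lc, li in zip(logc, logi))
             / sum((lc - mean_c) ** 2 for lc in logc))
    intercept = mean_i - slope * mean_c
    return math.exp(intercept), -slope  # beta is minus the log-log slope
```

A larger fitted β means faster diminishing returns: each added unit of complexity buys less accuracy.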
Figure 5. Simple inference strategies are usually sufficient.
The color map shows the simplest strategy achieving performance within 10% of the Bayesian optimum (inaccuracy < 0.1) for each combination of volatility and noise in the prediction (A) and estimation (B) tasks. See also Supplementary Fig. 4.
Figure 6. Optimal cognitive engagement.
The colormaps show log10 𝒞opt (Eq. 4) as a function of volatility and noise, for the prediction (A) and the estimation (B) tasks; σr = 0.1. High cognitive engagement is optimal only at low volatility and intermediate noise.
Figure 7. Subjects switch between simple and complex strategies as predicted by the theory in the Gaussian estimation task.
(A): Map of the volatility/noise conditions probed in this experiment compared to conditions probed in previous experiments (see legend). Background colors indicate the simplest model with inaccuracy ℐ < 0.1 (i.e., the most efficient model for tolerance = 0.1) at each point of the volatility/noise plane for the estimation task performed by the subjects. (B) and (C): Mean normalized adaptivity ± SEM (small error bars) for the theoretical most-efficient model (B) and 82 human subjects (C) performing the estimation task in each of the three noise (R)/volatility (h) conditions. Normalized adaptivity values were computed by fitting multiple linear regression models to data from 360 trials per subject and volatility/noise condition (details in Methods). SEM were obtained by propagating the errors on the integration time scales estimated from the linear regressions (Methods). For the colored bars, the most-efficient model was defined as the simplest model with ℐ < 0.1; dashed gray lines represent the range of values obtained using different tolerances (0.02–0.2; note the broad range for high noise). Thin error bars in (C) represent the standard deviation of the normalized adaptivity across subjects. Both the theory and the data showed peak adaptivity at intermediate noise (left; one-tailed t-test, p = 7 · 10−12 for the intermediate vs. low and p = 0.0038 for the intermediate vs. high noise comparisons) and low volatility (right; p = 10−6 for the low vs. intermediate and p = 10−28 for the low vs. high comparisons). (D) and (E): Mean normalized working-memory load from theory (D) and 82 human subjects (E) performing the estimation task in each of the three noise/volatility conditions (plotted as in (B) and (C)). For both the theory and the data, the working-memory load is smaller at low noise (one-tailed t-test, p = 4 · 10−12 for both the low vs. intermediate and low vs. high noise comparisons) and decreases with increasing volatility (one-tailed t-test, p = 5 · 10−13 for low vs. intermediate, p = 2 · 10−24 for low vs. high and p = 3 · 10−5 for intermediate vs. high volatility comparisons). (F) and (G): Probabilities that each of the eight models of Fig. 1 (color code as in (A)) generated the data of a randomly chosen subject, in each noise (F) and volatility (G) condition. Bars indicate results for the best-performing subjects (with inaccuracy ℐ < 75th percentile across all tested conditions); dotted lines represent the values obtained for all subjects.
Figure 8. Subjects switch between simple and complex strategies as indicated by the theory in the Bernoulli prediction task.
(A): Theoretical predictions for this task. Three models of decreasing complexity are considered: the Bayesian model, the constant Prior, and the Leaky-Accumulator model. The colormap indicates the simplest model with inaccuracy ℐ < 0.1 (i.e., the most efficient model for tolerance = 0.1) at each point of the volatility/noise plane. Three conditions of increasing noise (R = 0.04, R = 0.28, R = 0.86, red circles) and two conditions of increasing volatility (h = 0.05, h = 0.4, red diamonds) were tested in the experiment. (B) and (C): Probabilities that each of the three models (color code as in (A)) minimized the sum of squared residuals of a randomly chosen subject, in each noise (B) and volatility (C) condition. Data from 53 subjects, 300 trials per subject and condition.
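Of the three models compared here, the Leaky Accumulator is the simplest: it tracks discounted counts of past binary outcomes and predicts from their ratio. A minimal sketch (our illustrative parameterization; the paper's exact formulation may differ, e.g. in how the leak and prior are specified):

```python
def leaky_accumulator(observations, leak):
    """Predicted P(x=1) before each binary observation, from leakily discounted counts.

    Past evidence decays by a factor (1 - leak) per step; leak = 0 recovers
    a perfect running average, leak -> 1 ignores all history.
    """
    acc_ones = 0.0   # discounted count of 1s seen so far
    acc_total = 0.0  # discounted count of all observations
    preds = []
    for x in observations:
        preds.append(acc_ones / acc_total if acc_total > 0 else 0.5)
        acc_ones = (1 - leak) * acc_ones + x
        acc_total = (1 - leak) * acc_total + 1
    return preds
```

Larger leaks suit higher volatility, which is why a single tuned leak can approximate Bayesian prediction across much of the volatility/noise plane in panel (A).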
