Rational thoughts in neural codes

Zhengwei Wu et al. Proc Natl Acad Sci U S A. 2020 Nov 24;117(47):29311-29320. doi: 10.1073/pnas.1912336117.

Abstract

Complex behaviors are often driven by an internal model, which integrates sensory information over time and facilitates long-term planning to reach subjective goals. A fundamental challenge in neuroscience is how to use behavior and neural activity to understand this internal model and its dynamic latent variables. Here we interpret behavioral data by assuming an agent behaves rationally—that is, it takes actions that optimize its subjective reward according to its understanding of the task and its relevant causal variables. We apply a method, inverse rational control (IRC), to learn an agent’s internal model and reward function by maximizing the likelihood of its measured sensory observations and actions. The method thereby extracts the rational and interpretable thoughts of the agent from its behavior. We also provide a framework for interpreting encoding, recoding, and decoding of neural data in light of this rational model for behavior. When applied to behavioral and neural data from simulated agents performing suboptimally on a naturalistic foraging task, this method successfully recovers their internal model and reward function, as well as the Markovian computational dynamics within the neural manifold that represent the task. This work lays a foundation for discovering how the brain represents and computes with dynamic latent variables.
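The core idea of IRC—recover the agent's internal-model parameters by maximizing the likelihood of its observed actions—can be illustrated with a toy sketch. Everything here (the single parameter, the two-action softmax policy, the simulated data) is an illustrative assumption, not the paper's actual model, which fits a full POMDP by gradient-based likelihood maximization.

```python
import numpy as np

# Hypothetical toy version of the IRC idea: the agent's policy depends on an
# internal-model parameter theta, and we recover theta by maximizing the
# likelihood of the agent's observed actions given its beliefs.
rng = np.random.default_rng(0)

def policy(theta, beliefs):
    """P(press button | belief): a sigmoid in belief, with slope theta."""
    return 1.0 / (1.0 + np.exp(-theta * (beliefs - 0.5)))

# Simulate a "teacher" agent with a true parameter.
theta_true = 4.0
beliefs = rng.uniform(0, 1, size=500)        # belief that reward is available
actions = rng.uniform(size=500) < policy(theta_true, beliefs)

def log_likelihood(theta):
    p = policy(theta, beliefs)
    return np.sum(np.where(actions, np.log(p), np.log(1 - p)))

# Maximize the action log-likelihood over a grid of candidate parameters.
grid = np.linspace(0.5, 8.0, 76)
theta_hat = grid[np.argmax([log_likelihood(t) for t in grid])]
```

With enough observed actions, the recovered `theta_hat` concentrates near the teacher's true parameter, mirroring the parameter recovery shown in Fig. 3.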

Keywords: cognition; computation; neural coding; neuroscience; rational.


Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
(A and B) Graphical model of a POMDP (A) and the IRC problem (B). Open circles denote latent variables, and solid circles denote observable variables. For the POMDP, the agent knows its beliefs but must infer the world state. For IRC, the scientist knows the world state but must infer the beliefs. The real-world dynamics depend on parameters ϕ, while the belief dynamics and actions of the agent depend on parameters θ, which include both its assumptions about the stochastic world dynamics and observations and its own subjective rewards and costs.
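In the POMDP of panel A, the agent never observes the world state directly; it maintains a belief b(s) that it updates with each observation via a Bayes filter. A minimal discrete sketch, with an assumed two-state world and illustrative transition and observation matrices:

```python
import numpy as np

# Minimal Bayes filter for a discrete POMDP: predict with the transition
# model T, then correct with the observation model O. The two-state world
# and the specific matrices are illustrative assumptions.
T = np.array([[0.9, 0.1],     # T[s', s] = P(s_{t+1} = s' | s_t = s)
              [0.1, 0.9]])
O = np.array([[0.8, 0.3],     # O[o, s] = P(o | s)
              [0.2, 0.7]])

def belief_update(b, o):
    """One filtering step: predict with T, reweight by the likelihood O[o]."""
    predicted = T @ b
    posterior = O[o] * predicted
    return posterior / posterior.sum()

b = np.array([0.5, 0.5])      # uniform prior belief
for o in [0, 0, 1]:           # a short observation sequence
    b = belief_update(b, o)
```

This belief (rather than the hidden state) is the quantity IRC treats as the agent's latent variable and the scientist must infer from behavior.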
Fig. 2.
Illustration of foraging task with latent dynamics and partially observable sensory data. The reward availability in each of the two boxes evolves according to a telegraph process, switching between available (red) and unavailable (blue), and colors give the animal an ambiguous sensory cue about the reward availability. The agent may travel between the locations of the two boxes. When a button is pushed to open a box, the agent receives any available reward.
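A telegraph process is a binary state that flips between its two values at constant rates. A short simulation of the reward-availability dynamics described above, with illustrative switching rates:

```python
import numpy as np

# Simulate reward availability as a telegraph process: a binary state that
# switches on and off with fixed per-step probabilities. The rates are
# illustrative assumptions, not values from the paper.
rng = np.random.default_rng(1)
p_on = 0.1    # P(unavailable -> available) per time step
p_off = 0.05  # P(available -> unavailable) per time step

def simulate_telegraph(n_steps, p_on, p_off):
    state = 0
    states = np.empty(n_steps, dtype=int)
    for t in range(n_steps):
        flip = p_on if state == 0 else p_off
        if rng.uniform() < flip:
            state = 1 - state
        states[t] = state
    return states

states = simulate_telegraph(10_000, p_on, p_off)
# Long-run fraction of "available" time approaches p_on / (p_on + p_off).
```

The agent only sees noisy color cues correlated with this hidden state, which is what makes the task partially observable.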
Fig. 3.
Successful recovery of the agent's model by inverse rational control. The agent was a neural network trained to imitate a suboptimal but rational teacher and tested on a novel task. (A) The estimated parameters converge to the optimal point of the observed-data log-likelihood (white star). Since the parameter space is high-dimensional, we project it onto the first two principal components u, v of the learning trajectory for θ (blue). The estimated parameters differ slightly from the teacher’s parameters (green dot) due to data limitations. (B) Comparison of the teacher’s parameters and the estimated parameters. Error bars show 95% confidence intervals (CI) based on the Hessian of the log-likelihood (SI Appendix, Fig. S2). (C) Estimated and true (teacher’s) marginal belief dynamics over latent reward availability. These estimates are informed by the noisy color data at each box and the times and locations of the agent’s actions. The posteriors over beliefs are consistent with the dynamics of the teacher’s beliefs (blue line). (D) Teacher’s beliefs versus IRC belief posteriors averaged over all times when the teacher held the same belief, p̄ = ⟨p(b̂_t | a_{1:T}, o_{1:T})⟩_{b_t}. These mean posteriors p̄ concentrate around the true beliefs of the teacher. (E–H) Inferred distributions of (E) actions, (F) residence times, (G) intervals between consecutive button presses, and (H) intervals between movements.
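The Hessian-based confidence intervals in panel B follow from the standard asymptotic result that, near the maximum-likelihood estimate, the inverse of the negative log-likelihood curvature approximates the estimator's variance. A sketch for a one-parameter example (a Gaussian mean, an illustrative stand-in for the paper's full parameter vector θ):

```python
import numpy as np

# Sketch of Hessian-based confidence intervals: var(theta_hat) is
# approximately the inverse of -d^2L/d(theta)^2 at the MLE.
rng = np.random.default_rng(2)
data = rng.normal(loc=3.0, scale=1.0, size=400)

def loglik(mu):
    return -0.5 * np.sum((data - mu) ** 2)   # unit-variance Gaussian

mu_hat = data.mean()                          # MLE of the mean

# Numerical second derivative (Hessian) of the log-likelihood at the MLE.
eps = 1e-4
hess = (loglik(mu_hat + eps) - 2 * loglik(mu_hat) + loglik(mu_hat - eps)) / eps**2

se = np.sqrt(-1.0 / hess)                     # asymptotic standard error
ci = (mu_hat - 1.96 * se, mu_hat + 1.96 * se) # 95% confidence interval
```

For n = 400 unit-variance samples the curvature is -n, so the standard error comes out at 1/sqrt(400) = 0.05, matching the analytic value.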
Fig. 4.
Schematic for analyzing a dynamic neural code. (A) Graphical model of a POMDP problem with a solution implemented by neurons implicitly encoding beliefs. (B) We find how behaviorally relevant variables (here, beliefs) are encoded in measured neural activity through the function b̌_t = φ_enc(r_t). (C) We then test our hypothesis that the brain recodes its beliefs rationally by testing whether the dynamics of the behaviorally estimated beliefs b̂ match the dynamics of the neurally estimated beliefs b̌, as expressed through the update dynamics f̂_dyn(b̂_t, o_t) and recoding function f̌_rec(b̌_t, o_t). (D) Similarly, we test whether the brain decodes its beliefs rationally by comparing the behaviorally and neurally derived policies π̂_act and π̌_dec. Quantities estimated from behavior or from neurons are denoted by up-pointing or down-pointing hats, ^ and ˇ, respectively (SI Appendix, Table S1).
Fig. 5.
Conceptual illustration of encoding and recoding. (A) Neural responses r inhabit a manifold (blue volume, here three-dimensional) embedded in the high-dimensional space of all possible neural responses. A neural encoding model divides this manifold into task-relevant and -irrelevant coordinates (blue and purple axes). We must estimate these coordinates from training data, given some inferred task-relevant targets b. According to this encoding, many activity patterns r can correspond to the same vector of task variables b. Any particular neural trajectory (white curve) is just one of many that would trace out the same task-relevant projection b(t) (black curves). The set of all neural activities consistent with one task-relevant trajectory therefore spans a manifold (gray ribbon). (B) After fitting an estimator of the task variables using training data, we can measure how well the encoding describes the task variables in a new testing dataset. Different encodings (red and green volumes) divide the same neural manifold differently into relevant and irrelevant coordinates, and the task variables b̌ estimated from these neural encodings (red and green curves) will deviate in different ways from the variables b̂ inferred from behavior (black). (C) The testing error of these neurally derived task variables (red, green) will be larger than the training error (blue). Task-relevant variables b̌ derived from different encoding models may have the same total errors but may nonetheless have different recoding dynamics. Here the smoother green dynamics are closer to the behaviorally inferred dynamics than the rougher red dynamics, which implies that those task-relevant dimensions better capture the computations implied by inverse rational control. SI Appendix, Fig. S3 provides more detail on good and bad recodings.
Fig. 6.
Analysis of neural coding of rational thoughts. (A) Encoding: Neurally derived beliefs b̌ match behaviorally derived beliefs b̂ based on IRC. Cross-validated neural beliefs are estimated from held-out neural responses r using a linear estimator, b̌ = Wr + c, with the weight matrix fitted on separate training data. (B) Recoding: Belief updates Δb̌_t from the neural recoding function match the corresponding belief updates Δb̂_t from the task dynamics. Neural updates are estimated using nonlinear regression with radial basis functions (Materials and Methods). (C) Decoding: The policy π̌_dec predicted by decoding neural beliefs approximately matches the policy π̂_act estimated from behavior by IRC. The neural policy is estimated from actions a and neural beliefs b̌ using nonlinear multinomial regression (Materials and Methods).
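The linear encoding step in panel A amounts to cross-validated least squares: fit b̌ = Wr + c on training responses, then read out beliefs from held-out responses. A self-contained sketch with synthetic data (a random linear embedding of the true beliefs plus noise, an illustrative assumption rather than the paper's recorded activity):

```python
import numpy as np

# Fit a linear belief estimator b_check = W r + c by least squares and
# evaluate it on held-out data. The synthetic "neural" responses are an
# illustrative stand-in for measured activity.
rng = np.random.default_rng(3)
n_train, n_test, n_neurons, n_beliefs = 400, 100, 20, 2

b_train = rng.uniform(0, 1, size=(n_train, n_beliefs))  # behavioral beliefs
b_test = rng.uniform(0, 1, size=(n_test, n_beliefs))
embed = rng.normal(size=(n_beliefs, n_neurons))         # true encoding weights
r_train = b_train @ embed + 0.1 * rng.normal(size=(n_train, n_neurons))
r_test = b_test @ embed + 0.1 * rng.normal(size=(n_test, n_neurons))

# Fit W and c jointly by appending a constant column to the responses.
R = np.hstack([r_train, np.ones((n_train, 1))])
Wc, *_ = np.linalg.lstsq(R, b_train, rcond=None)

# Cross-validated neural beliefs on held-out responses.
b_check = np.hstack([r_test, np.ones((n_test, 1))]) @ Wc
test_err = np.mean((b_check - b_test) ** 2)
```

The held-out error stays small when the assumed linear encoding is correct; comparing the dynamics of `b_check` against the behaviorally inferred beliefs is then the recoding test of panel B.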

