Science. 2014 Oct 17;346(6207):340-3.
doi: 10.1126/science.1256254. Epub 2014 Sep 18.

Neural correlates of strategic reasoning during competitive games

Hyojung Seo et al. Science.

Abstract

Although human and animal behaviors are largely shaped by reinforcement and punishment, choices in social settings are also influenced by information about the knowledge and experience of other decision-makers. During competitive games, monkeys increased their payoffs by systematically deviating from a simple heuristic learning algorithm and thereby countering the predictable exploitation by their computer opponent. Neurons in the dorsomedial prefrontal cortex (dmPFC) signaled the animal's recent choice and reward history that reflected the computer's exploitative strategy. The strength of switching signals in the dmPFC also correlated with the animal's tendency to deviate from the heuristic learning algorithm. Therefore, the dmPFC might provide control signals for overriding simple heuristic learning algorithms based on the inferred strategies of the opponent.
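The "simple heuristic learning algorithm" here is a value-based reinforcement-learning rule whose predictability an opponent can exploit. A minimal sketch of that dynamic, a delta-rule learner playing matching pennies against an opponent that tracks its choice frequencies, is below; the parameters and the opponent model are illustrative assumptions, not the paper's actual algorithms.

```python
import math
import random

def play_matching_pennies(n_trials=1000, alpha=0.3, beta=3.0, seed=0):
    """A simple value-based learner (hypothetical parameters) facing an
    opponent that tracks the learner's choice frequencies and plays to
    exploit them, so predictable play is costly."""
    rng = random.Random(seed)
    q = [0.0, 0.0]        # value estimate for each target
    counts = [1, 1]       # opponent's tally of the learner's past choices
    wins = 0
    for _ in range(n_trials):
        # softmax choice between the two targets
        p1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        choice = 1 if rng.random() < p1 else 0
        # opponent matches the learner's historically favored target;
        # the learner wins (payoff 1) only when the choices differ
        opponent = 0 if counts[0] >= counts[1] else 1
        reward = 1.0 if choice != opponent else 0.0
        wins += reward
        counts[choice] += 1
        q[choice] += alpha * (reward - q[choice])  # delta-rule update
    return wins / n_trials
```

Against such an opponent, a purely value-driven learner is held near or below the equilibrium payoff; systematically deviating from the rule, as the monkeys did, is what raises the win rate.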


Figures

Fig. 1. Behavioral task and performance
Biased matching pennies (A) and its payoff matrix (B). R, risky target; S, safe target. (C) Behavioral effects of gains and losses. Average regression coefficients (ordinate) quantified the tendency for the animal to choose the same target that produced a particular outcome in each of the last 10 trials. Arrows indicate the attenuation in the immediate effect of loss. Error bars, SEM.
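The regression coefficients in (C) come from a lagged analysis of how past outcomes bias the current choice. A sketch of one such analysis, a logistic regression of the current choice on signed outcome regressors from the preceding trials, is below; the regressor coding and fitting procedure are assumptions for illustration, not the authors' exact model.

```python
import numpy as np

def lagged_choice_regression(choices, rewards, n_lags=10, lr=0.1, n_iter=2000):
    """Logistic regression of the current choice on lagged choice-outcome
    regressors. choices, rewards: 0/1 arrays of equal length.
    A positive weight at lag k means a tendency to repeat a choice that
    was rewarded k trials ago."""
    choices = np.asarray(choices, float)
    rewards = np.asarray(rewards, float)
    # regressor for lag k: +1 if the choice k trials back was target 1 and
    # rewarded, -1 if it was target 0 and rewarded, 0 if unrewarded
    signed = (2 * choices - 1) * rewards
    T = len(choices)
    X = np.column_stack(
        [signed[n_lags - k - 1 : T - k - 1] for k in range(n_lags)]
    )  # column k holds the lag-(k+1) regressor
    y = choices[n_lags:]
    w = np.zeros(n_lags)
    for _ in range(n_iter):           # plain gradient ascent on log-likelihood
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)
    return w
```

On behavior generated by a win-stay rule, the lag-1 coefficient dominates; the attenuated immediate effect of loss marked by the arrows in (C) would show up as a reduced lag-1 coefficient for loss outcomes.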
Fig. 2. Systematic deviations from reinforcement learning were beneficial
The color of each box in the decision trees (top) and the position of each circle in the scatter plots (bottom) indicate how much the probability of choosing the safe target deviated from the prediction of the best-fitting reinforcement learning model (abscissa in the bottom scatter plot), given the choices and outcomes in the last two trials, and how this increased or decreased the probability of reward relative to the Nash-equilibrium strategy (ordinate in the scatter plot). Numbers indicate different sequences of choices and outcomes in the two preceding trials. Solid boxes correspond to the sequences included in the best hybrid reinforcement learning model (14). R− and R+ denote loss and gain from the risky target, respectively, whereas S0 and S+ denote the neutral outcome and gain from the safe target.
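The quantity on the abscissa, the gap between the animal's conditional choice probability and the model's prediction for each two-trial history, can be tabulated as follows; the input encodings (0/1 choices, outcome labels, per-trial model predictions) are assumptions for illustration.

```python
from collections import defaultdict

def deviation_by_history(choices, outcomes, p_model):
    """Empirical P(choose safe) minus model-predicted P(safe), grouped by
    the outcome sequence of the two preceding trials.
    choices: 1 = safe, 0 = risky; outcomes: labels like 'R-', 'R+', 'S0', 'S+';
    p_model: per-trial reinforcement-learning predictions of P(safe)."""
    stats = defaultdict(lambda: [0.0, 0.0, 0])  # [sum choices, sum model p, n]
    for t in range(2, len(choices)):
        key = (outcomes[t - 2], outcomes[t - 1])
        s = stats[key]
        s[0] += choices[t]
        s[1] += p_model[t]
        s[2] += 1
    return {k: s[0] / s[2] - s[1] / s[2] for k, s in stats.items()}
```

Plotting these per-history deviations against the resulting change in expected payoff relative to the Nash equilibrium gives the bottom scatter plots.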
Fig. 3. Cortical activity related to the conjunctions of choices and outcomes
(A) Fraction of neurons in each brain region that significantly modulated their activity during the delay period according to high-order conjunctions of choices and outcomes (14). (B) Time course of the signals plotted in (A), using the same color code for the different brain areas. (C) Spike density functions of an example dmPFC neuron, sorted by the animal's choices (R, risky; S, safe) and outcomes (+, 0, and − for gain, neutral outcome, and loss, respectively) as well as by the positions of the chosen targets in the current (t) and previous (t−1) trials. Colored disks indicate different sequences of previous choices and outcomes, and asterisks mark the activity re-plotted in Fig. 4.
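The fraction in (A) is the share of recorded neurons whose delay-period firing depends on the choice-outcome conjunction. A generic permutation test on between-condition variance, sketched below, is one way to score each neuron; the paper's actual statistical criterion (14) is not reproduced here.

```python
import numpy as np

def significant_fraction(pop_rates, labels, alpha=0.05, n_perm=500, seed=0):
    """Fraction of neurons whose trial-by-trial rates differ across
    choice-outcome conjunction conditions, assessed by a one-way
    permutation test on the variance of condition means."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n_sig = 0
    for rates in pop_rates:          # one array of trial rates per neuron
        rates = np.asarray(rates, float)

        def stat(lbl):
            return np.var([rates[lbl == c].mean() for c in np.unique(lbl)])

        obs = stat(labels)
        perm = [stat(rng.permutation(labels)) for _ in range(n_perm)]
        p = (1 + sum(s >= obs for s in perm)) / (1 + n_perm)
        n_sig += p < alpha
    return n_sig / len(pop_rates)
```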
Fig. 4. Cortical signals for deviation from simple reinforcement learning
(A) Spike density functions from a dmPFC neuron (shown in Fig. 3C) sorted by the animal’s choices in the current and previous trials for 3 different sequences of outcomes in the last two trials (indicated by the text label and color defined in Fig. 3C). Δ denotes the difference in the accuracy of decoding the animal’s choice in switch vs. stay trials. (B) The difference in the decoding accuracy, ΔDA(switch), plotted as a function of how much more often the animal switched its choices compared to the prediction from the simple RL algorithm. (C) The same results shown in (B) for the entire population of dmPFC neurons (left) and averaged for each outcome sequence (identified by colors defined in Fig. 3C; right). Lines correspond to the best-fitting regression models. (D) The correlation coefficient between ΔDA and the deviation from reinforcement learning model for two different data sets (BMP, biased matching pennies; MP, matching pennies).
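ΔDA(switch) contrasts how well the upcoming choice can be decoded from a neuron's activity on switch versus stay trials. A minimal leave-one-out nearest-class-mean decoder, a stand-in for whatever decoder one might use, makes the quantity concrete; all inputs are illustrative.

```python
import numpy as np

def decoding_accuracy(rates, choices):
    """Leave-one-out nearest-class-mean decoding of a binary choice
    from single-trial firing rates."""
    rates = np.asarray(rates, float)
    choices = np.asarray(choices)
    correct = 0
    for i in range(len(rates)):
        held_out = np.arange(len(rates)) == i
        m0 = rates[~held_out & (choices == 0)].mean()
        m1 = rates[~held_out & (choices == 1)].mean()
        pred = 0 if abs(rates[i] - m0) < abs(rates[i] - m1) else 1
        correct += pred == choices[i]
    return correct / len(rates)

def delta_da_switch(rates, choices, switch):
    """Decoding accuracy on switch trials minus accuracy on stay trials."""
    rates, choices = np.asarray(rates, float), np.asarray(choices)
    switch = np.asarray(switch, bool)
    return (decoding_accuracy(rates[switch], choices[switch])
            - decoding_accuracy(rates[~switch], choices[~switch]))
```

A neuron whose activity predicts the choice only when the animal is about to switch yields a large positive ΔDA(switch), which is the signal correlated with behavioral deviation in (B)-(D).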

References

    1. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. Springer; New York: 2001.
    2. Gigerenzer G, Brighton H. Homo heuristicus: why biased minds make better inferences. Top Cogn Sci. 2009;1:107–143.
    3. Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT Press; Cambridge, MA: 1998.
    4. Ito M, Doya K. Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit. Curr Opin Neurobiol. 2011;21:368–373.
    5. Lee D, Seo H, Jung MW. Neural basis of reinforcement learning and decision making. Annu Rev Neurosci. 2012;35:287–308.
