J Neurosci. 2016 Jan 27;36(4):1211-22.
doi: 10.1523/JNEUROSCI.1901-15.2016.

Variability in Dopamine Genes Dissociates Model-Based and Model-Free Reinforcement Learning

Bradley B Doll et al. J Neurosci.

Abstract

Considerable evidence suggests that multiple learning systems can drive behavior. Choice can proceed reflexively from previous actions and their associated outcomes, as captured by "model-free" learning algorithms, or flexibly from prospective consideration of outcomes that might occur, as captured by "model-based" learning algorithms. However, differential contributions of dopamine to these systems are poorly understood. Dopamine is widely thought to support model-free learning by modulating plasticity in striatum. Model-based learning may also be affected by these striatal effects, or by other dopaminergic effects elsewhere, notably on prefrontal working memory function. Indeed, prominent demonstrations linking striatal dopamine to putatively model-free learning did not rule out model-based effects, whereas other studies have reported dopaminergic modulation of verifiably model-based learning, but without distinguishing a prefrontal versus striatal locus. To clarify the relationships between dopamine, neural systems, and learning strategies, we combine a genetic association approach in humans with two well-studied reinforcement learning tasks: one isolating model-based from model-free behavior and the other sensitive to key aspects of striatal plasticity. Prefrontal function was indexed by a polymorphism in the COMT gene, differences of which reflect dopamine levels in the prefrontal cortex. This polymorphism has been associated with differences in prefrontal activity and working memory. Striatal function was indexed by a gene coding for DARPP-32, which is densely expressed in the striatum where it is necessary for synaptic plasticity. We found evidence for our hypothesis that variations in prefrontal dopamine relate to model-based learning, whereas variations in striatal dopamine function relate to model-free learning.

Significance statement: Decisions can stem reflexively from their previously associated outcomes or flexibly from deliberative consideration of potential choice outcomes. Research implicates a dopamine-dependent striatal learning mechanism in the former type of choice. Although recent work has indicated that dopamine is also involved in flexible, goal-directed decision-making, it remains unclear whether it also contributes via striatum or via the dopamine-dependent working memory function of prefrontal cortex. We examined genetic indices of dopamine function in these regions and their relation to the two choice strategies. We found that striatal dopamine function related most clearly to the reflexive strategy, as previously shown, and that prefrontal dopamine related most clearly to the flexible strategy. These findings suggest that dissociable brain regions support dissociable choice strategies.

Keywords: decision-making; dopamine; genetics; reinforcement learning.

Figures

Figure 1.
A, Two-step sequential learning task structure. Each of the 300 trials starts with a selection between two first-stage options (green boxes), which produces a set of second-stage options (pink or blue boxes). First-stage options predominantly lead to one set of second-stage options (70% common transitions) but sometimes lead to the other set (30% rare transitions). B, Second-stage choices are rewarded with a randomly diffusing probability for each option. Diffusion ends at trial 150, and probabilities remain fixed for the remainder of trials (70%/30% for one state, 60%/40% for the other). C, Probabilistic selection task transfer phase follows immediately after the sequential task. All pairs of second-stage options are presented, and subjects are instructed to choose the stimulus with the highest chance of reward without the aid of feedback. Novel pairs of the highest reward probability option (choose 70%) and the lowest (avoid 30%) assess learning from positive and negative outcomes, respectively.
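The trial structure described in this caption can be sketched as a small simulation. This is an illustrative sketch only, not the authors' task code: the 70%/30% transition split, the 300 trials, the freeze at trial 150, and the fixed 70%/30% and 60%/40% reward probabilities come from the caption, while the function names, the random chooser, the starting probabilities, the random-walk step size, and the way probabilities jump to their fixed values are all assumptions.

```python
import random

COMMON_PROB = 0.7  # from the caption: 70% common, 30% rare transitions
# Fixed reward probabilities after diffusion ends (from the caption).
# Which state gets which pair is an assumption.
FIXED_REWARD_PROBS = {0: (0.7, 0.3), 1: (0.6, 0.4)}


def first_stage_transition(action, rng):
    """Return (second_stage_state, is_common) for first-stage action 0 or 1.

    Assumption: action 0 commonly leads to state 0, action 1 to state 1.
    """
    is_common = rng.random() < COMMON_PROB
    state = action if is_common else 1 - action
    return state, is_common


def diffuse(prob, rng, sd=0.025, low=0.0, high=1.0):
    """One clipped Gaussian random-walk step (sd and bounds are assumed)."""
    return min(high, max(low, prob + rng.gauss(0.0, sd)))


def run_task(n_trials=300, freeze_at=150, seed=0):
    """Simulate a randomly choosing agent; returns one tuple per trial."""
    rng = random.Random(seed)
    probs = {0: [0.5, 0.5], 1: [0.5, 0.5]}  # assumed starting values
    trials = []
    for t in range(1, n_trials + 1):
        action = rng.randrange(2)   # first-stage choice (random placeholder)
        state, is_common = first_stage_transition(action, rng)
        option = rng.randrange(2)   # second-stage choice (random placeholder)
        rewarded = rng.random() < probs[state][option]
        trials.append((action, state, is_common, option, rewarded))
        if t < freeze_at:
            # Reward probabilities drift while diffusion is active.
            for s in probs:
                probs[s] = [diffuse(p, rng) for p in probs[s]]
        elif t == freeze_at:
            # Assumed simplification: jump to the fixed values at the freeze.
            probs = {s: list(v) for s, v in FIXED_REWARD_PROBS.items()}
    return trials
```

Replacing the two `rng.randrange(2)` placeholders with a learning agent's choices would turn this into a test bed for the strategies compared in Figure 2.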
Figure 2.
Model predictions and human subject data. Tendency to stay with (or switch from) the first-stage choice made on the previous trial plotted as a function of the experienced reward (Rew = reward, No Rew = no reward) and transition type on the previous trial. A, The model-free strategy predicts increased stay behavior following rewards, regardless of transition type. B, The model-based strategy prospectively considers reward and transition probability, thus predicting that rare transitions should affect the value of the first-stage action that was not chosen in the previous trial (producing an interaction between reward and transition type). C, The hybrid model captures hallmarks of both strategies. Model predictions are derived from simulations using the mean of the best fit parameters from human subject data. However, the predictions hold over arbitrary parameter settings. D, Human data (Caucasian subset) mirror the hybrid model, showing evidence of both model-based and model-free strategies. Error bars indicate SEM.
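A toy calculation may help show why the two strategies diverge after a rewarded rare transition. The sketch below is a deliberate simplification and not the authors' fitted model: the model-free learner credits the chosen first-stage action directly with the reward, and the model-based learner computes first-stage values from the 70%/30% transition probabilities and the best second-stage value in each state. The learning rate of 0.5, the starting values of 0.5, and all function names are assumptions.

```python
# P(second-stage state | first-stage action); 70/30 split from the captions.
TRANSITIONS = {0: (0.7, 0.3), 1: (0.3, 0.7)}


def delta_update(value, reward, alpha=0.5):
    """Simple delta-rule update (alpha is an assumed learning rate)."""
    return value + alpha * (reward - value)


def model_free_values(q_stage1, action, reward, alpha=0.5):
    """Credit the chosen first-stage action, ignoring the transition type."""
    q = list(q_stage1)
    q[action] = delta_update(q[action], reward, alpha)
    return q


def model_based_values(q_stage2):
    """Expected best second-stage value under the transition model."""
    best = [max(values) for values in q_stage2]
    return [sum(p * b for p, b in zip(TRANSITIONS[a], best)) for a in (0, 1)]


# Previous trial: chose action 0, rare transition to state 1, option 0 rewarded.
q_stage2 = [[0.5, 0.5], [0.5, 0.5]]
q_stage2[1][0] = delta_update(q_stage2[1][0], reward=1.0)

mf = model_free_values([0.5, 0.5], action=0, reward=1.0)
mb = model_based_values(q_stage2)

# Model-free favors repeating action 0; model-based favors switching to
# action 1, because action 1 is the one that commonly leads to state 1.
print("model-free:", mf, "model-based:", mb)
```

Under these assumptions the model-free learner prefers to stay while the model-based learner prefers to switch, which is the reward by transition interaction described in panel B.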
Figure 3.
A, Logistic regression coefficients reflecting the degree to which subjects' choices were model-based. Bars represent the interaction of previous reward (Rew = reward, No Rew = no reward) and previous transition type estimated for each genotype in the Caucasian subset. Top, Model-based choice increases with COMT Met alleles (linear effect: p = 0.01). Bottom, Negative relationship of model-based choice with DARPP-32 T alleles was not significant (linear effect: p = 0.1). B, Sequential task mean choice proportions by genotype for the Caucasian subset (top row: COMT; bottom row: DARPP-32). Error bars indicate SEM.
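For intuition about the interaction term plotted in panel A, the sketch below computes a reward by transition interaction from four stay proportions. It assumes a saturated logistic model with both predictors coded as +1/-1, in which case the interaction coefficient equals one quarter of the difference of differences in log-odds. This is a single-subject descriptive analog, not the regression the authors fit, and the function names and example proportions are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def interaction_coefficient(stay_prop):
    """Reward x transition interaction on the log-odds scale.

    stay_prop maps (rewarded, common) -> proportion of stay choices.
    Assumes +1/-1 coding and a saturated model (four cells, four terms).
    """
    return (
        logit(stay_prop[(True, True)])
        - logit(stay_prop[(True, False)])
        - logit(stay_prop[(False, True)])
        + logit(stay_prop[(False, False)])
    ) / 4.0


# Hypothetical purely model-free pattern: reward raises staying equally
# after common and rare transitions, so the interaction is zero.
model_free_like = {(True, True): 0.8, (True, False): 0.8,
                   (False, True): 0.6, (False, False): 0.6}

# Hypothetical purely model-based pattern: the reward effect reverses
# after rare transitions, so the interaction is positive.
model_based_like = {(True, True): 0.8, (True, False): 0.6,
                    (False, True): 0.6, (False, False): 0.8}

print(interaction_coefficient(model_free_like))
print(interaction_coefficient(model_based_like))
```

A larger positive value corresponds to more model-based choice in this simplified reading of the analysis.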
Figure 4.
Model-based choice weight parameter βMB estimates from computational model plotted by genotype (Caucasian subset). A, Parameter βMB increases with COMT Met alleles (linear effect: p = 0.03), which are putatively associated with increased DA levels in PFC. B, Parameter βMB decreases with DARPP-32 T alleles (linear effect: p = 0.03), which are putatively associated with enhanced striatal DA-mediated learning from positive relative to negative outcomes. Error bars indicate SEM.
Figure 5.
Probabilistic selection transfer phase accuracy by genotype for Caucasian subset. Differential accuracy in choosing the most highly rewarding (Choose 70%) versus avoiding the least rewarding (Avoid 30%) stimulus is hypothesized to reflect differential DA-mediated plasticity in the direct and indirect pathways (via opposing effects on D1- and D2-expressing striatal cells, respectively), learned by model-free RL during the sequential task. A, DARPP-32 T alleles are associated with learning from positive relative to negative outcomes (linear effect: p = 0.001). Error bars indicate SEM. B, COMT genotype shows no relationship with accuracy in learning from positive and negative outcomes (linear effect: p = 0.7).
