Biol Psychiatry. 2019 Jun 1;85(11):936-945.
doi: 10.1016/j.biopsych.2018.12.017. Epub 2019 Jan 4.

Model-Free and Model-Based Influences in Addiction-Related Behaviors

Stephanie M Groman et al. Biol Psychiatry. 2019.

Abstract

Background: Disruptions in the decision-making processes that guide action selection are a core feature of many psychiatric disorders, including addiction. Decision making is influenced by the goal-directed and habitual systems that can be computationally characterized using model-based and model-free reinforcement learning algorithms, respectively. Recent evidence suggests an imbalance in the influence of these reinforcement learning systems on behavior in individuals with substance dependence, but it is unknown whether these disruptions are a manifestation of chronic drug use and/or are a preexisting risk factor for addiction.

Methods: We trained adult male rats on a multistage decision-making task to quantify model-free and model-based processes before and after self-administration of methamphetamine or saline.

Results: Individual differences in model-free, but not model-based, learning prior to any drug use predicted subsequent methamphetamine self-administration; rats with lower model-free behavior took more methamphetamine than rats with higher model-free behavior. This relationship was selective to model-free updating following a rewarded, but not unrewarded, choice. Both model-free and model-based learning were reduced in rats following methamphetamine self-administration, which was due to a decrement in the ability of rats to use unrewarded outcomes appropriately. Moreover, the magnitude of drug-induced disruptions in model-free learning was not correlated with disruptions in model-based behavior, indicating that drug self-administration independently altered both reinforcement learning strategies.

Conclusions: These findings provide direct evidence that model-free and model-based learning mechanisms are involved in select aspects of addiction vulnerability and pathology, and they provide a unique behavioral platform for conducting systems-level analyses of decision making in preclinical models of mental illness.

Keywords: Computational psychiatry; Dopamine; Drug addiction; Methamphetamine; Model-based reinforcement learning; Model-free reinforcement learning.

Conflict of interest statement

Declaration of interests: The authors report no biomedical financial interests or potential conflicts of interest.

Figures

Figure 1. Decision-making in the rodent multi-stage decision-making (MSDM) task.
(A) Decision-making was assessed on a probabilistic MSDM task that paralleled the structure of the human MSDM task (7). (B) The probability of staying with the same first stage choice based on the previous trial outcome (rewarded vs. unrewarded) and the state transition (common transition: open bars; rare transition: grey bars) in theoretical data for a pure model-free agent (left), a pure model-based agent (middle) or an agent using a mix of each strategy (right) in the probabilistic MSDM task. (C) The probability of staying with the same first stage choice based on the previous trial outcome (rewarded vs. unrewarded) and the state transition (common transition: blue bars; rare transition: red bars) in the probabilistic MSDM task. (D) The regression weights for the logistic regression model analyzing choice behavior in the probabilistic MSDM. The weight of the outcome predictor (orange bar) represents the strength of model-free learning, while the transition-by-outcome interaction predictor (purple bar) represents the strength of model-based learning. (E) Diagram of the experimental design and the number of days rats spent in each experimental phase presented below.
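
The stay-probability analysis described in panels (B) and (C) can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the trial format (`choice`, `rewarded`, `common`) and both function names are assumptions, and the two difference scores are simple stand-ins for the outcome and transition-by-outcome weights that the paper estimates with logistic regression.

```python
from collections import defaultdict


def stay_probabilities(trials):
    """Return {(rewarded, common): P(stay)} for consecutive trial pairs.

    Each trial is a dict with hypothetical keys:
      'choice'   - first-stage choice (0 or 1)
      'rewarded' - True if the trial ended in reward
      'common'   - True if the state transition was the common one
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [stays, total]
    for prev, cur in zip(trials, trials[1:]):
        key = (prev["rewarded"], prev["common"])
        counts[key][0] += int(cur["choice"] == prev["choice"])
        counts[key][1] += 1
    return {key: stays / total for key, (stays, total) in counts.items()}


def summary_indices(p):
    """Difference-score analogs of the two regression weights.

    p must contain all four (rewarded, common) cells.
    mf: main effect of outcome (model-free signature).
    mb: outcome-by-transition interaction (model-based signature).
    """
    mf = ((p[(True, True)] + p[(True, False)])
          - (p[(False, True)] + p[(False, False)])) / 2
    mb = ((p[(True, True)] - p[(True, False)])
          - (p[(False, True)] - p[(False, False)])) / 2
    return mf, mb
```

For a pure model-free pattern (staying depends only on reward) the interaction score is zero. For a pure model-based pattern (staying after rewarded common and unrewarded rare transitions) the main-effect score is zero.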
Figure 2. Model-free behavior predicts addiction-relevant behaviors.
(A) The number of methamphetamine infusions earned in each 6 h self-administration session across the 14 days for individual rats (red lines) and the average of all rats (black dashed line). *** p<0.001 compared to the first day of self-administration. See also Figure S3. (B) The number of saline infusions earned in each 6 h self-administration session across the 14 days for individual rats (gray lines) and the average of all rats (black dashed line). See also Figure S3. (C) Methamphetamine self-administration data were fit with a power function using maximum likelihood. Drug self-administration data for three individual rats are plotted as red lines. The number of drug infusions predicted by the fitted power function for these three rats is shown by the black lines. See also Figure S4. (D) The number of drug infusions taken across the self-administration sessions in rats with similar rates of escalation in drug use (i.e., B parameter), but with low (red; N=8) or high (pink; N=8) values for the initial strength of drug reinforcement (i.e., A parameter). See also Figure S4. (E) The number of drug infusions taken across the self-administration sessions in rats with similar values for the initial strength of drug reinforcement (i.e., A parameter), but low (dark blue; N=8) or high (light blue; N=8) rates of escalation in drug use (i.e., B parameter). See also Figure S4. (F) The relationship between model-free behavior (outcome regression coefficient) and the initial strength of drug reinforcement (A parameter). (G) The relationship between model-free behavior (outcome regression coefficient) and the rate of escalation in drug use (B parameter). (H) The relationship between model-based behavior (transition-by-outcome regression coefficient) and the initial strength of drug reinforcement (A parameter). (I) The relationship between model-based behavior (transition-by-outcome regression coefficient) and the rate of escalation in drug use (B parameter). 
(J) The regression coefficients from the simple logistic regression indexing the influence of the rewarded outcomes on current choice in rats with low (dark orange) or high (light orange) model-free behavior. ** p<0.01. (K) The regression coefficients from the simple logistic regression indexing the influence of the unrewarded outcomes on current choice in rats with low (dark gray) or high (light gray) model-free behavior.
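
Panel (C) describes a power-function fit with an initial-strength parameter A and an escalation parameter B. The sketch below assumes the form infusions = A * day^B, which the caption does not spell out. It also assumes multiplicative log-normal noise, under which maximum likelihood reduces to ordinary least squares on log-transformed data. The authors' actual likelihood may differ, and `fit_power` is a hypothetical helper name.

```python
import math


def fit_power(days, infusions):
    """Fit infusions = A * day**B by least squares in log-log space.

    Assumes every day and every infusion count is strictly positive
    and that at least two distinct days are supplied.
    Returns (A, B).
    """
    xs = [math.log(d) for d in days]
    ys = [math.log(y) for y in infusions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space = exponent B
    a = math.exp(mean_y - b * mean_x)  # intercept in log space = log(A)
    return a, b
```

Under this parameterization A is the predicted number of infusions on day 1, and B greater than 0 indicates escalating intake across sessions.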
Figure 3. Methamphetamine-induced disruptions in the deterministic MSDM task.
(A) Left panel: The probability of choosing the first stage option associated with the highest reinforced stage two option (p(correct | stage 1)) and the probability of choosing the highest reinforced stage two option (p(correct | stage 2)) in the control/saline rats before (open bars) and after (closed bars) self-administration. Right panel: The probability of choosing the same first stage choice based on the previous trial outcome (i.e., rewarded vs. unrewarded). Below: Scatter plots comparing these dependent measures before and after the self-administration sessions for individual rats are presented below each bar graph with the mean value represented by the blue symbol. (B) Left panel: The probability of choosing the first stage option associated with the highest reinforced stage two option (p(correct | stage 1)) and the probability of choosing the highest reinforced stage two option (p(correct | stage 2)) in rats before (open bars) and after (closed bars) methamphetamine self-administration. Right panel: The probability of choosing the same first stage choice based on the previous trial outcome (i.e., rewarded vs. unrewarded). Below: Scatter plots comparing these dependent measures before and after the self-administration sessions for individual rats are presented below each bar graph with the mean value represented by the red symbol. ** p<0.01. (C) The regression coefficient derived from the logistic regression models in control/saline rats before (open bars) and after (closed bars) the self-administration sessions. Left: The regression coefficients from the logistic regression model examining the influence of previous trial outcome on current choice. Right: The regression coefficients from the simple logistic regression model examining the independent influence of rewarded and unrewarded outcomes on current choice. 
Below: Scatter plots comparing these dependent measures before and after the self-administration sessions for individual rats are presented below each bar graph with the mean value represented by the blue symbol. *** p<0.001. (D) The regression coefficient derived from the logistic regression models in methamphetamine rats before (open bars) and after (closed bars) the self-administration sessions. Left: The regression coefficients from the logistic regression model examining the influence of the previous trial outcome on current choice. Right: The regression coefficients from the simple logistic regression model examining the independent influence of rewarded and unrewarded outcomes on current choice. Below: Scatter plots comparing these dependent measures before and after the self-administration sessions for individual rats are presented below each bar graph with the mean value represented by the red symbol. *** p<0.001.
Figure 4. Methamphetamine-induced disruptions in model-free and model-based behavior.
(A) Regression coefficients from the logistic regression model examining the influence of previous trial events on the likelihood of persisting with the same first stage choice in the probabilistic MSDM before (open bars) and after (closed bars) self-administration in control/saline rats. The relationship between each regression coefficient before and after self-administration for individual rats is presented below the bar graphs with the mean value represented by the blue symbol. See also Figure S6. (B) Regression coefficients from the logistic regression model examining the influence of previous trial events on the likelihood of persisting with the same first stage choice in the probabilistic MSDM before (open bars) and after (closed bars) self-administration in methamphetamine rats. The relationship between each regression coefficient before and after self-administration for individual rats is presented below the bar graphs with the mean value represented by the red symbol. *** p<0.001. See also Figure S6. (C) The βMB and βMF estimates obtained from the hybrid RL model characterizing choices in the probabilistic MSDM before (open bars) and after (closed bars) self-administration in control/saline rats. The relationship between each of these parameters before and after self-administration for individual rats is presented to the right of the bar graphs with the mean value represented by the blue symbol. See also Figure S6. (D) The βMB and βMF estimates obtained from the hybrid RL model characterizing choices in the probabilistic MSDM before (open bars) and after (closed bars) self-administration in methamphetamine rats. The relationship between each of these parameters before and after self-administration for individual rats is presented to the right of the bar graphs with the mean value represented by the red symbol. ** p<0.01; # p=0.08. See also Figure S6. 
(E) The relationship between the change (before - after self-administration) in the model-free regression coefficient (i.e., outcome) and the change in the model-based regression coefficient (i.e., transition-by-outcome) in control/saline rats. See also Figure S7. (F) The relationship between the change (before - after self-administration) in the model-free regression coefficient (i.e., outcome) and the change in the model-based regression coefficient (i.e., transition-by-outcome) in methamphetamine rats. See also Figure S7.
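
Panels (C) and (D) report βMB and βMF from a hybrid RL model. The caption does not give the model's equations, so the sketch below assumes one common formulation: first-stage choice probabilities come from a softmax over a weighted sum of model-free and model-based action values, and model-based values are expectations over the transition structure. All function names, the 0.7/0.3 transition probabilities, and the example values are illustrative assumptions rather than details from the paper.

```python
import math


def model_based_values(transition_probs, stage2_values):
    """Expected second-stage value of each first-stage action.

    transition_probs[a][s] is P(second-stage state s | first-stage action a).
    stage2_values[s] is the learned value of second-stage state s.
    """
    return [sum(p * v for p, v in zip(row, stage2_values))
            for row in transition_probs]


def first_stage_choice_probs(q_mf, q_mb, beta_mf, beta_mb):
    """Softmax over beta_mf * Q_MF + beta_mb * Q_MB (assumed hybrid rule)."""
    net = [beta_mf * mf + beta_mb * mb for mf, mb in zip(q_mf, q_mb)]
    peak = max(net)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in net]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative use: action 0 commonly leads to the more valuable state.
q_mb = model_based_values([[0.7, 0.3], [0.3, 0.7]], [1.0, 0.0])
probs = first_stage_choice_probs([0.0, 0.0], q_mb, beta_mf=0.0, beta_mb=5.0)
```

In this formulation a reduction in either weight flattens the choice probabilities toward chance, and the two weights can change independently of each other.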

References

    1. Jentsch JD, Olausson P, De La Garza R 2nd, Taylor JR (2002): Impairments of reversal learning and response perseveration after repeated, intermittent cocaine administrations to monkeys. Neuropsychopharmacology. 26: 183–190.
    2. Stalnaker TA, Takahashi Y, Roesch MR, Schoenbaum G (2009): Neural substrates of cognitive inflexibility after chronic cocaine exposure. Neuropharmacology. 56: 63–72.
    3. Ersche KD, Roiser JP, Robbins TW, Sahakian BJ (2008): Chronic cocaine but not chronic amphetamine use is associated with perseverative responding in humans. Psychopharmacol. 197: 421–431.
    4. Dayan P (2009): Dopamine, Reinforcement Learning, and Addiction. Pharmacopsychiatry. 42: S56–S65.
    5. Jentsch JD, Taylor JR (1999): Impulsivity resulting from frontostriatal dysfunction in drug abuse: implications for the control of behavior by reward-related stimuli. Psychopharmacol. 146: 373–390.
