Review
Nat Neurosci. 2011 Feb;14(2):154-62.
doi: 10.1038/nn.2723.

From reinforcement learning models to psychiatric and neurological disorders

Tiago V Maia et al. Nat Neurosci. 2011 Feb.

Abstract

Over the last decade and a half, reinforcement learning models have fostered an increasingly sophisticated understanding of the functions of dopamine and cortico-basal ganglia-thalamo-cortical (CBGTC) circuits. More recently, these models, and the insights that they afford, have started to be used to understand important aspects of several psychiatric and neurological disorders that involve disturbances of the dopaminergic system and CBGTC circuits. We review this approach and its existing and potential applications to Parkinson's disease, Tourette's syndrome, attention-deficit/hyperactivity disorder, addiction, schizophrenia and preclinical animal models used to screen new antipsychotic drugs. The approach's proven explanatory and predictive power bodes well for the continued growth of computational psychiatry and computational neurology.

Figures

Figure 1
Principles of computational psychiatry and computational neurology. (a) The starting point in computational psychiatry and computational neurology is a model of normal function that captures key aspects of behavior and/or neural activity. Models at various levels of abstraction can be useful (e.g., algorithmic models from machine learning or neural models from computational cognitive neuroscience). Several complementary approaches can then be pursued. (b) With detailed neural models, pathophysiological processes can be simulated by making principled changes to the model that mimic biological alterations in the disorder under consideration (e.g., alterations in striatal dopaminergic innervation or D2 receptor density). The systems-level and behavioral implications of these changes can then be explored using the model, leading to testable predictions. We call this approach ‘deductive,’ because the models are used to recreate the mechanistic link between causes (the biological abnormalities) and their consequences (abnormalities in behavior and/or systems-level neural activity). This approach can elucidate whether the observed biological abnormalities are sufficient to explain a behavioral phenotype. (c) A second approach involves using a model to try to infer the causes of the observed behavioral phenotype and/or of the observed alterations in neural activity. We call this approach ‘abductive,’ because it involves reasoning from consequences (the behavior or systems-level neural activity) to their possible causes (the underlying biological abnormalities). 
In this approach, alternative a priori hypotheses concerning possible biological abnormalities in a given disorder can be compared to determine which, if any, produce the same set of abnormalities in behavior and/or neural activity that is found in the disorder (Maia and Peterson, submitted manuscript). (d) A third approach, used more often with algorithmic than with neural models (largely because the former tend to have fewer parameters), involves fitting the model’s parameters to the behavior of individual subjects on a suitable task or set of tasks, and then determining if there are parameter differences between diseased and healthy subject groups, or correlations between parameters and disease severity. We call this approach ‘quantitative abductive,’ because it also involves reasoning from behavior to its mechanistic causes. A fourth, related approach (not shown graphically) also involves fitting a model’s parameters to subjects’ behavior, but the goal is to then calculate, on a trial-by-trial basis, each subject’s putative internal representation of the quantities calculated by the model (e.g., state values or prediction errors). These predicted internal representations are then used as regressors in functional imaging (e.g., fMRI, EEG), to find their neural correlates, which are then compared across the diseased and healthy groups. Each of these four approaches can also be adapted to study the effects of treatments (e.g., medication or neurosurgery). Furthermore, additional leverage can sometimes be gained by the synergistic use of different approaches and/or models at different levels of abstraction. For example, the deductive or abductive approaches are especially powerful with neural models, because these models embody mechanistic details that permit direct simulation of biological abnormalities. Such models, however, sometimes include too many parameters to make a quantitative abductive approach feasible. 
In some cases, a useful strategy is to construct an algorithmic model that includes parameters that reflect distinct mechanisms in the neural model (e.g., a Q-learning model with different learning rates for positive and negative prediction errors has been used to capture the prediction from a neural model of the basal ganglia that these two types of learning rely on distinct mechanisms). The neural model’s deductive predictions concerning how a disorder affects these parameters can then be verified using the quantitative abductive approach with the algorithmic model.
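To make the algorithmic side of this caption concrete, the following is a minimal sketch of a Q-learning model with separate learning rates for positive and negative prediction errors. It is not the authors' implementation: the function names, the softmax choice rule, the inverse temperature `beta`, and the initial values of 0.5 are all assumptions introduced here for illustration. Replaying one subject's choices and rewards yields a negative log-likelihood (the quantity one would minimize over parameters in the quantitative abductive approach) and the trial-by-trial prediction errors (the kind of quantity used as an imaging regressor in the fourth approach).

```python
import math


def softmax(q_values, beta):
    """Choice probabilities from action values; beta is an assumed inverse temperature."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def replay_subject(choices, rewards, alpha_gain, alpha_loss, beta, n_actions=2):
    """Replay one subject's trials through the model.

    choices: list of chosen action indices (0 .. n_actions - 1)
    rewards: list of outcomes, e.g. 1.0 (correct) or 0.0 (incorrect)
    Returns (negative log-likelihood, list of prediction errors).
    """
    q = [0.5] * n_actions  # assumed neutral starting values
    nll = 0.0
    prediction_errors = []
    for action, reward in zip(choices, rewards):
        # Likelihood of the observed choice under the current values.
        nll -= math.log(softmax(q, beta)[action])
        # Prediction error for the chosen action.
        delta = reward - q[action]
        # Separate learning rates for positive and negative prediction errors.
        alpha = alpha_gain if delta >= 0 else alpha_loss
        q[action] += alpha * delta
        prediction_errors.append(delta)
    return nll, prediction_errors
```

In practice one would pass `replay_subject` to a numerical optimizer to estimate `alpha_gain`, `alpha_loss`, and `beta` for each subject, and then compare the fitted parameters across groups; that fitting step is omitted here.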
Figure 2
(a) Anatomy of cortico-basal ganglia-thalamo-cortical loops. Striatal medium spiny neurons (MSNs) in the direct pathway (‘Go’ neurons) express mostly D1 receptors and project directly to the globus pallidus internal segment (GPi) and the substantia nigra pars reticulata (SNr). [The GPi and the SNr have similar functions, so we treat them as a single complex (GPi/SNr).] Go neurons inhibit the GPi/SNr, which in turn results in disinhibition of the thalamus, thereby facilitating execution of the corresponding action. Striatal MSNs in the indirect pathway (‘NoGo’ neurons) express mostly D2 receptors and project to the globus pallidus external segment (GPe), which in turn projects to the GPi/SNr. NoGo neurons produce a focused removal of the tonic inhibition of the GPe on the GPi/SNr, thereby disinhibiting the GPi/SNr, which in turn results in suppression of the corresponding action in the thalamus. Neurons in the subthalamic nucleus (STN) receive direct projections from the cortex in the hyperdirect pathway and project to both the GPe and GPi/SNr. The projections from the STN to the GPe and GPi/SNr are diffuse, so they are believed to modulate all actions rather than a specific action. (b) The basal ganglia Go/NoGo model. The synaptic connections in the model are consistent with the anatomical connections shown in (a). The model learns to map inputs, which represent the current stimuli and/or situation (i.e., the current state), to actions represented in the pre-supplementary motor area (preSMA). Corticocortical projections from the input layer to preSMA activate in preSMA candidate actions that are appropriate for the current state. The basal ganglia then act to facilitate (‘gate’) the best action - i.e., the action with the best reinforcement history for the current state - while simultaneously suppressing the other actions (at the level of the thalamus). 
Distributed populations of Go and NoGo units represent, respectively, the positive and negative evidence for the candidate actions in the current state. Lateral inhibition between the Go and NoGo pathways ensures that the probability of selecting a given action is a function of the difference between the positive and negative evidence for that action. The positive and negative evidence for each action in each state is learned on the basis of past reinforcement history, through the actions of dopamine on D1 and D2 receptors in striatal Go and NoGo units, respectively (see text). The synaptic weights in the corticocortical projections from the input layer to preSMA are themselves learned, but through Hebbian mechanisms, thereby allowing these corticocortical projections to activate candidate actions in preSMA in proportion to their prior probability of being executed in the given state. Thus, the candidate actions generated by these corticocortical projections for a given state tend to be those that have previously often been gated by the basal ganglia in that state. When two or more actions become strongly activated in preSMA because they have similar reinforcement histories, this response conflict activates the STN via the hyperdirect pathway (consistent with the evidence for direct anatomical connections between the preSMA and the STN, and their co-activation in high-conflict situations). The STN then provides a ‘Global NoGo’ signal that prevents premature facilitation of suboptimal responses.
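The selection rule described in this caption, in which the probability of choosing an action depends on the difference between its positive (Go) and negative (NoGo) evidence, can be illustrated with a highly abstracted sketch. This is not the neural network model itself: the softmax form, the parameter names, and the simple update rule (positive outcomes raise Go evidence, negative outcomes raise NoGo evidence) are assumptions made here for illustration, and the STN ‘Global NoGo’ signal is not modeled.

```python
import math


def action_probabilities(go, nogo, beta=1.0):
    """Softmax over net evidence (Go minus NoGo) for each candidate action.

    beta is an assumed gain parameter, not taken from the source.
    """
    net = [g - n for g, n in zip(go, nogo)]
    m = max(net)  # subtract the max for numerical stability
    exps = [math.exp(beta * (x - m)) for x in net]
    total = sum(exps)
    return [e / total for e in exps]


def update_evidence(go, nogo, action, outcome_positive,
                    alpha_go=0.1, alpha_nogo=0.1):
    """Return updated copies of the Go and NoGo evidence for one state.

    Assumed simplification: a positive outcome moves the chosen action's
    Go evidence toward 1, and a negative outcome moves its NoGo evidence
    toward 1. The learning rates are hypothetical.
    """
    go, nogo = list(go), list(nogo)
    if outcome_positive:
        go[action] += alpha_go * (1.0 - go[action])
    else:
        nogo[action] += alpha_nogo * (1.0 - nogo[action])
    return go, nogo
```

Keeping Go and NoGo evidence as separate quantities, rather than a single net value, is what lets a sketch like this express selective changes in learning from positive versus negative outcomes through `alpha_go` and `alpha_nogo`.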
Figure 3
The probabilistic selection task. The probabilistic selection task is used to assess whether participants learn better from positive or negative outcomes. During training, in each trial participants are presented with one of the three pairs shown on top (AB, CD, and EF), and select one of the two stimuli. Feedback then indicates if the choice was correct or incorrect. The probabilities of each stimulus leading to correct feedback are indicated in the figure. Participants may learn to perform accurately during training (i.e., learn to select A, C, and E) by learning which stimulus in each pair is associated with positive feedback (Go learning), by learning which stimulus in each pair is associated with negative feedback (NoGo learning), or both. The test phase assesses the degree to which participants learned better from positive or from negative feedback. Participants are presented with novel pairs of stimuli consisting of either an A or a B paired with each of the other stimuli (C through F, which on average had 50% probability of positive feedback during training). No feedback is provided during testing. If participants perform better on the pairs that include A than on those that include B, that indicates that they learned better to select the most positive stimulus (A) than to avoid the most negative stimulus (B), so they learn better from positive feedback (Go learning). If they perform better on the pairs that include B, they learn better from negative feedback (NoGo learning). Indeed, individual differences in neural responses to negative outcomes predict individual differences in performance on the pairs that include B (but not on those that include A).
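The test-phase comparison described above can be sketched as a small scoring routine. The data layout (a list of `(pair, choice)` tuples with single-letter stimulus labels) and the function name are assumptions made here for illustration; the sketch only computes the two accuracies the caption contrasts, namely choosing A on novel pairs that include A and avoiding B on novel pairs that include B.

```python
def score_test_phase(trials):
    """Score test-phase choices from the probabilistic selection task.

    trials: iterable of (pair, choice), e.g. ("AC", "A") or ("BD", "D").
    Returns a dict with the proportion of 'choose A' and 'avoid B'
    responses; a value is None if no trials of that type were given.
    """
    counts = {"choose_A": [0, 0], "avoid_B": [0, 0]}  # [correct, total]
    for pair, choice in trials:
        stimuli = set(pair)
        if "A" in stimuli and "B" in stimuli:
            continue  # AB is a training pair, not a novel test pair
        if "A" in stimuli:
            counts["choose_A"][0] += choice == "A"
            counts["choose_A"][1] += 1
        elif "B" in stimuli:
            counts["avoid_B"][0] += choice != "B"
            counts["avoid_B"][1] += 1
    return {key: (correct / total if total else None)
            for key, (correct, total) in counts.items()}
```

A higher `choose_A` than `avoid_B` score would be read, following the caption, as better learning from positive feedback (Go learning), and the reverse as better learning from negative feedback (NoGo learning).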
