PLoS Comput Biol. 2019 Sep 16;15(9):e1007331.
doi: 10.1371/journal.pcbi.1007331. eCollection 2019 Sep.

A flexible and generalizable model of online latent-state learning


Amy L Cochran et al. PLoS Comput Biol.

Abstract

Many models of classical conditioning fail to describe important phenomena, notably the rapid return of fear after extinction. To address this shortfall, evidence has converged on the idea that learning agents rely on latent-state inferences, i.e. an ability to index disparate associations from cues to rewards (or penalties) and infer which index (i.e. latent state) is presently active. Our goal was to develop a model of latent-state inferences that uses latent states to predict rewards from cues efficiently and that can describe behavior in a diverse set of experiments. The resulting model combines a Rescorla-Wagner rule, for which updates to associations are proportional to prediction error, with an approximate Bayesian rule, for which beliefs in latent states are proportional to the product of prior beliefs and an approximate likelihood based on current associations. In simulation, we demonstrate the model's ability to reproduce learning effects both famously explained and not explained by the Rescorla-Wagner model, including the rapid return of fear after extinction, the Hall-Pearce effect, the partial reinforcement extinction effect, backwards blocking, and memory modification. Lastly, we derive our model as an online algorithm for maximum likelihood estimation, demonstrating that it is an efficient approach to outcome prediction. Establishing such a framework is a key step towards quantifying normative and pathological ranges of latent-state inferences in various contexts.
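As a rough illustration of the two ingredients named above, one common way to write them is shown below. The notation is ours, not the paper's: $\mathbf{x}_t$ is the cue vector on trial $t$, $r_t$ the reward, $\mathbf{V}_t^{(\ell)}$ the associative strengths for latent state $\ell$, $\alpha$ a learning rate, $\pi_t(\ell)$ the prior belief, and $p_t(\ell)$ the updated belief. The Gaussian form of the likelihood with noise scale $\sigma$, and scaling each update by the belief in its state, are assumptions for illustration; the paper's approximate likelihood, uncertainty estimates, and effort matrices are not reproduced here.

```latex
\begin{aligned}
\delta_t^{(\ell)} &= r_t - \mathbf{x}_t^\top \mathbf{V}_t^{(\ell)}
  && \text{(prediction error under state } \ell\text{)} \\
p_t(\ell) &\propto \pi_t(\ell)\,\exp\!\left(-\frac{\big(\delta_t^{(\ell)}\big)^2}{2\sigma^2}\right)
  && \text{(prior} \times \text{approximate likelihood)} \\
\mathbf{V}_{t+1}^{(\ell)} &= \mathbf{V}_t^{(\ell)} + \alpha\, p_t(\ell)\, \delta_t^{(\ell)}\, \mathbf{x}_t
  && \text{(belief-weighted Rescorla-Wagner update)}
\end{aligned}
```

With a single latent state ($p_t(1) = 1$), the last line reduces to the standard Rescorla-Wagner rule, in which every present cue moves by $\alpha$ times the shared prediction error.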

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
A) A learning agent’s world view, in which rewards are generated from cues, a latent state, and a latent error. To predict rewards, the agent must infer which latent state is active, the relationship between cues and rewards for each latent state, and the expected uncertainty in rewards due to the latent error. B) The proposed model of how a learning agent inverts this world view. The agent first observes cues and generates reward predictions from L estimates of associative strength, one for each of L latent states. After observing rewards, the agent uses its prediction errors to update associative strengths, measures of uncertainty, and beliefs about which state is active. The degree to which associative strengths can be updated depends on both the agent’s belief in the corresponding latent state and the corresponding effort matrix, which keeps track of how cues covary.
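The predict, compare, and update loop described in panel B can be sketched in a few lines. This is a minimal, hypothetical illustration (the class `LatentStateLearner` and its parameters `alpha`, `sigma`, and `hazard` are our own names) under stated assumptions: a Gaussian likelihood with a fixed noise scale, a fixed number of latent states, and a "hazard" step that mixes beliefs toward uniform between trials. It omits the effort matrices and learned uncertainty of the actual model.

```python
import math


class LatentStateLearner:
    """Hypothetical toy learner: L latent states, each with its own RW weights.

    Assumptions (not taken from the paper): Gaussian likelihood with a fixed
    sigma, a fixed number of states, and a hazard-style mixing step for the
    prior. Effort matrices and learned uncertainty are omitted.
    """

    def __init__(self, n_cues, n_states=2, alpha=0.3, sigma=0.3, hazard=0.05):
        self.alpha = alpha
        self.sigma = sigma
        self.hazard = hazard
        # One vector of associative strengths per latent state.
        self.V = [[0.0] * n_cues for _ in range(n_states)]
        # Start fully confident in the first latent state.
        self.beliefs = [1.0] + [0.0] * (n_states - 1)

    def _prior(self):
        # Mix last trial's beliefs with a uniform distribution so that every
        # state keeps some prior probability of becoming active.
        n = len(self.beliefs)
        return [(1 - self.hazard) * b + self.hazard / n for b in self.beliefs]

    def trial(self, x, r):
        """Run one trial with cue vector x and reward r; return the prediction."""
        prior = self._prior()
        preds = [sum(v_i * x_i for v_i, x_i in zip(v, x)) for v in self.V]
        prediction = sum(p * m for p, m in zip(prior, preds))
        # Approximate Bayesian step: posterior is prior times Gaussian likelihood.
        unnorm = [p * math.exp(-0.5 * ((r - m) / self.sigma) ** 2)
                  for p, m in zip(prior, preds)]
        total = sum(unnorm)
        post = [u / total for u in unnorm] if total > 0 else prior
        # RW step: each state's update is scaled by the belief in that state.
        for v, m, b in zip(self.V, preds, post):
            delta = r - m
            for i, x_i in enumerate(x):
                v[i] += self.alpha * b * delta * x_i
        self.beliefs = post
        return prediction


agent = LatentStateLearner(n_cues=1)
acq = [agent.trial([1.0], 1.0) for _ in range(20)]    # acquisition: cue -> reward
ext = [agent.trial([1.0], 0.0) for _ in range(20)]    # extinction: cue -> nothing
beliefs_after_ext = list(agent.beliefs)
v_old_after_ext = agent.V[0][0]                       # first state's association
reacq = [agent.trial([1.0], 1.0) for _ in range(5)]   # reacquisition
```

In this toy run, beliefs move to the second state during extinction, leaving the first state's association largely intact, so predictions recover faster during reacquisition than during the original acquisition.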
Fig 2. Simulated associative strengths of cues during blocking, overexpectation, and conditioned inhibition experiments, which the Rescorla-Wagner (RW) model famously explains.
Because they build on the RW model, our latent-state model and the Gershman (2017) model also explain these experiments. Gray dashed lines demarcate experimental stages.
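The three designs in Fig 2 can be reproduced with the plain Rescorla-Wagner rule alone. The sketch below is a minimal stdlib-only illustration; the trial counts, the learning rate `alpha = 0.2`, and the function name `rescorla_wagner` are assumptions chosen for illustration, not the paper's simulation settings.

```python
def rescorla_wagner(trials, n_cues, alpha=0.2, V=None):
    """Classic RW rule: every present cue moves by alpha * (reward - summed prediction).

    trials: list of (present_cue_indices, reward) pairs.
    V: optional starting strengths (copied, not modified).
    Returns the final list of associative strengths.
    """
    V = [0.0] * n_cues if V is None else list(V)
    for cues, reward in trials:
        error = reward - sum(V[i] for i in cues)
        for i in cues:
            V[i] += alpha * error
    return V


A, B, X = 0, 1, 2  # hypothetical cue indices

# Blocking: A+ then AX+; the control group gets only AX+.
stage1 = rescorla_wagner([([A], 1.0)] * 30, 3)
blocked = rescorla_wagner([([A, X], 1.0)] * 30, 3, V=stage1)
control = rescorla_wagner([([A, X], 1.0)] * 30, 3)

# Overexpectation: A+ and B+ trained separately, then AB+.
separate = rescorla_wagner([([A], 1.0), ([B], 1.0)] * 30, 3)
compound = rescorla_wagner([([A, B], 1.0)] * 30, 3, V=separate)

# Conditioned inhibition: interleaved A+ and AX- trials.
inhibition = rescorla_wagner([([A], 1.0), ([A, X], 0.0)] * 100, 3)
```

Under these settings, X gains almost nothing in the blocking group because A already predicts the reward, A and B each fall toward 0.5 once their compound overpredicts the reward, and X approaches -1 as a conditioned inhibitor.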
Fig 3. Associability depends on latent-state beliefs.
A) Experimental results from Stage 3 of the Wilson et al (1992) experiment [27]. Group E showed higher magazine activity, i.e. greater responding, than Group C during the light (Cue A) in Stage 3, which was taken to mean that the light had greater associative strength in Group E than in Group C during Stage 3. Reprinted from Paul N. Wilson, Patrick Boumphrey, & John M. Pearce, Quarterly Journal of Experimental Psychology 44:1 pp. 17-36. Reprinted by Permission of SAGE Publications, Ltd. B) Simulation of the associative strength of the light (Cue A) and latent-state beliefs in the Wilson et al (1992) experiment. Our model predicts that only Group E detects the change in experimental conditions and shifts its beliefs. Because of this shift, associability is higher in Group E than in Group C during Stage 3, leading to higher associative strength of the light. Beliefs in the first latent state (dark and light blue solid lines) and the second latent state (dark and light blue dashed lines) are shown for models with latent states. Gray dashed lines demarcate experimental stages.
Fig 4. Another demonstration of associability depending on beliefs in latent states.
Experimental results from A) Experiment 1A and B) Experiment 1B by Rescorla (2000) [39]. In both experiments, Rescorla concluded that when Cues A and B were conditioned together, associative strength increased more for Cue B than for Cue A, based on compound tests that showed greater responding to the compound containing Cue B than to the compound containing Cue A. This result suggested that associability can differ between cues even when they are presented together. Reprinted from “Associative Changes in Excitors and Inhibitors Differ When They Are Conditioned in Compound” by R.A. Rescorla, 2000, Journal of Experimental Psychology, 26, p. 430-431. Reprinted with permission from the American Psychological Association. C) Total change in associative strength simulated during Stage 2 of Experiments 1A-B from Rescorla (2000) [39]. The RW model does not capture these effects because its associability is constant, whereas our model captures them because latent-state beliefs alter associability. Beliefs in the first latent state (black solid lines) and the second latent state (black dashed lines) are shown for models with latent states. Gray dashed lines demarcate experimental stages.
Fig 5. Partial reinforcement extinction effect.
A) Experimental results from Jenkins [41] demonstrating that the associative strength of a cue is harder to extinguish after partial reinforcement (Group 20P) than after continuous reinforcement (Group 20R). Reprinted from “Resistance to extinction when partial reinforcement is followed by regular reinforcement” by H.M. Jenkins, 1962, Journal of Experimental Psychology, 64, p. 443. Reprinted with permission from the American Psychological Association. B) Simulation of the partial reinforcement extinction effect (Experiment 1). The effect is observed even when partial reinforcement is followed by continuous reinforcement prior to extinction (Experiment 2). Our model captures these effects because an agent is better able to discriminate between reinforcement and extinction with a continuous reinforcement schedule. The agent can thus shift their beliefs to a new latent state in order to build new associations for extinction. Beliefs in the first latent state are shown for models with latent states. Gray dashed lines demarcate experimental stages.
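The discrimination argument in panel B can be illustrated with a single Bayesian update on the first unrewarded trial. This is a toy sketch, not the paper's equations: the function `belief_in_new_state`, the 5% prior on a fresh state, the noise floor, and the 50% partial schedule are all assumptions (the caption does not state the reinforcement rate used). The old state is assumed to predict the training reward rate with a noise scale equal to that schedule's Bernoulli standard deviation, while a fresh state predicts no reward.

```python
import math


def belief_in_new_state(p_reward, prior_new=0.05, sigma_floor=0.1):
    """Posterior belief in a fresh latent state after one unrewarded trial.

    Toy assumptions (hypothetical, not from the paper):
      * the old state predicts the training reward rate p_reward,
      * its noise scale is the Bernoulli std of that schedule (floored),
      * the new state predicts no reward with the same noise scale.
    """
    sigma = max(math.sqrt(p_reward * (1.0 - p_reward)), sigma_floor)
    reward = 0.0  # extinction trial: cue presented, no reward
    lik_old = math.exp(-0.5 * ((reward - p_reward) / sigma) ** 2)
    lik_new = math.exp(-0.5 * ((reward - 0.0) / sigma) ** 2)
    num_new = prior_new * lik_new
    return num_new / (num_new + (1.0 - prior_new) * lik_old)


continuous = belief_in_new_state(p_reward=1.0)  # every trial reinforced
partial = belief_in_new_state(p_reward=0.5)     # half of trials reinforced (assumed)
```

After continuous reinforcement, a single nonreward is very surprising and belief jumps almost entirely to the new state; after partial reinforcement, nonreward is expected under the old state, so belief barely moves (about 0.08 here) and extinction must proceed by slowly unlearning the old association.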
Fig 6. Associability depends on history of cue presentation.
A) Experimental results of a backwards blocking experiment from Miller and Matute [43]. After two stages of the experiment, a backwards blocking group (BB) had a significantly slower response time to Cue X than a control group (CON), even though Cue X was presented identically in both groups. This result suggested that the associative strength of Cue X can change on trials in which it is not presented. Reprinted from “Biological Significance in Forward and Backward Blocking Discrepancy Between Animal Conditioning and Human Causal Judgement” by R.R. Miller and H. Matute, 1996, Journal of Experimental Psychology, 125, p. 374. Reprinted with permission from the American Psychological Association. B) Simulation of a backwards blocking experiment. In our model, associability depends on the history of cue presentation through effort matrices. After the combined associative strength of Cue A and Cue X is learned, these effort matrices rotate the direction of learning toward the difference between Cue A and Cue X. As a result, the associative strength of Cue X decreases even though Cue A is presented alone, which allows our model to capture backwards blocking. Gray dashed lines demarcate experimental stages.
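Panel B attributes backwards blocking to effort matrices that track how cues covary. A well-known mechanism with the same qualitative behavior is recursive least squares (equivalently, a Kalman filter with no drift), in which a matrix `P` summarizing past cue co-occurrence shapes each update. The sketch below uses RLS as a stand-in; it is not the paper's effort-matrix update, and the function name `rls_trial`, the trial counts, and the identity prior for `P` are assumptions.

```python
def rls_trial(w, P, x, r):
    """One recursive-least-squares update (a Kalman filter without drift).

    P acts like an inverse cue-covariance matrix: it records how cues have
    co-occurred and steers the gain vector k accordingly.
    """
    n = len(w)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(n))
    k = [Px[i] / denom for i in range(n)]
    error = r - sum(w[i] * x[i] for i in range(n))
    w = [w[i] + k[i] * error for i in range(n)]
    # P <- P - k (P x)^T, using the symmetry of P.
    P = [[P[i][j] - k[i] * Px[j] for j in range(n)] for i in range(n)]
    return w, P


w = [0.0, 0.0]                  # strengths of [Cue A, Cue X]
P = [[1.0, 0.0], [0.0, 1.0]]    # prior covariance (identity, assumed)
for _ in range(10):             # Stage 1: AX+ compound trials
    w, P = rls_trial(w, P, [1.0, 1.0], 1.0)
x_after_stage1, cov_ax = w[1], P[0][1]
for _ in range(10):             # Stage 2: A+ trials alone
    w, P = rls_trial(w, P, [1.0, 0.0], 1.0)
x_after_stage2 = w[1]
```

Under the plain RW rule, Cue X would keep its Stage 1 strength during Stage 2 because its cue input is zero. Here, compound training leaves a negative off-diagonal entry in `P`, so each A-alone trial pulls X down (from 10/21 to 10/131 with these settings).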
Fig 7. Changes in context influence beliefs.
A) Experimental results from Ricker and Bouton [46] demonstrating a faster response during reacquisition (phase 3) than during acquisition (phase 1) after extinction (phase 2). Reprinted by permission from Springer Nature Customer Service Centre GmbH: Springer Nature, Animal learning and behavior, Reacquisition following extinction in appetitive conditioning, Sean T. Ricker & Mark E. Bouton, (1996). B) Experimental results from Bouton and King [47] demonstrating a more robust return of fear when extinction occurs in a different context (EXT-B) rather than in the same context (EXT-A) as acquisition. Reprinted from “Contextual Control of the Extinction of Conditioned Fear: Tests for the Associative Value of the Context” by M.E. Bouton and D.A. King, 1983, Journal of Experimental Psychology: Animal Behavior Processes, 9, p. 252. Reprinted with permission from the American Psychological Association. C) Simulation results of expected rewards when a response is reinstated after extinction with and without a change in a visual/spatial context. The second experiment examines the associative strength of a cue when a response is reinstated after extinction with and without a change in a temporal context (i.e. a time delay between trials). Our model shows a rapid reinstatement of expectations as the agent switches their beliefs back to the first latent state. Our model also shows that rapid reinstatement is more robust with changes in context, particularly in the first few trials of reinstatement. Expected rewards are depicted rather than associative strengths to account for the influence of context on expectations in addition to the cue, since models other than ours treat context as an additional cue. Beliefs are shown for the first latent state. Gray dashed lines demarcate experimental stages.
Fig 8. Changes in temporal context also influence beliefs.
A) Experimental results from Brooks and Bouton [48] demonstrating a more robust return of fear when extinction occurs after a 6-day delay as opposed to immediately after acquisition. Reprinted from “A Retrieval Cue for Extinction Attenuates Spontaneous Recovery” by D.C. Brooks and M.E. Bouton, 1993, Journal of Experimental Psychology: Animal Behavior Processes, 19, p. 80. Reprinted with permission from the American Psychological Association. B) Simulation results of associative strength and beliefs when a response is reinstated after extinction with and without a change in temporal context (i.e. a time delay between trials). Our model shows that rapid reinstatement is more robust with changes in temporal context, particularly in the first few trials of reinstatement. Beliefs are shown for the first latent state. Gray dashed lines demarcate experimental stages.
Fig 9. Memory modification.
A) Experimental results from Schiller et al [45] demonstrating that the delay between a retrieval trial and extinction (10 min, 6 hr, or no reminder) can significantly modify the fear response after extinction. Reprinted by permission from Springer Nature Customer Service Centre GmbH: Springer Nature, Nature, Preventing the return of fear in humans using reconsolidation update mechanisms, Schiller et al., (2010). B-C) Simulation results of associative strength and beliefs for three different time delays (1, 5, and 100) between a single retrieval trial and extinction. Both models correctly predict that certain time delays can weaken the associative strength upon testing. The bottom row depicts a ‘reconsolidation window’ of time delays after retrieval in which the associative strength on testing is decreased.

References

    1. Huys QJ, Guitart-Masip M, Dolan RJ, Dayan P. Decision-theoretic psychiatry. Clinical Psychological Science. 2015;3(3):400–421. doi: 10.1177/2167702614562040
    2. Sutton RS, Barto AG. Reinforcement learning: An introduction. MIT Press; 1998.
    3. Daw ND, O’Doherty JP, Dayan P, Seymour B, Dolan RJ. Cortical substrates for exploratory decisions in humans. Nature. 2006;441(7095):876. doi: 10.1038/nature04766
    4. Doll BB, Simon DA, Daw ND. The ubiquity of model-based reinforcement learning. Current Opinion in Neurobiology. 2012;22(6):1075–1081. doi: 10.1016/j.conb.2012.08.003
    5. Wilson R, Collins A. Ten simple rules for the computational modeling of behavioral data. 2019.
