Extended Data Fig. 8. Hidden Markov learning in a Clone-Structured Causal Graph recapitulates the animal's learning process.
(a) The transition graph of the CSCG at different learning stages recapitulates the low-dimensional neural manifolds observed in animals during learning. (b) Matrix depicting the correlation of probabilities over clones, averaged within different regions: off-diagonal gray regions (gray), pre-R2 region (light blue), pre-R1 region (dark blue), Initial region (red), Indicator region (orange), and End region (cyan), shown for all individual simulations that fully learned, for an example simulation, and averaged across all simulations (curves represent mean values, with shading indicating ± s.e.m.). Comparing over time, a significant difference was observed between the pre-R1 and pre-R2 regions (two-sided Wilcoxon signed-rank test, P < 0.0001****, n = 900 datapoints compared from 18 simulations). Comparisons between the beginning and end of training revealed a significant decrease in correlation for the off-diagonal gray regions, pre-R2, and pre-R1 (two-sided Wilcoxon signed-rank test, P < 0.0001****, n = 18 simulations). (c) Schematic representation of different possible sensory symbol sequences mimicking the animal's experience, including different orders of visual and reward experiences, and either a separate reward code or a combined code for reward and visual. (d) Time taken for the correlation between the probability-over-clones vectors of the near and far trial types to drop below 0.3, computed for pre-R1 (dark blue) and pre-R2 (light blue). Boxplots show the median and quartiles of the dataset, with whiskers extending to 1.5 times the interquartile range. For a visual symbol followed by the same reward, the time taken to decorrelate pre-R1 significantly exceeds the time taken to decorrelate pre-R2 (n = 15 simulations, two-sided paired Student's t-test, P < 0.01**). In contrast, for the other sequences, the time taken to decorrelate pre-R1 is either not significantly different from (visual then different reward, n = 20 simulations) or significantly less than the time taken to decorrelate pre-R2 (same reward then visual, n = 20, P < 0.01**; different reward then visual, n = 19, P < 0.0001****). Simulations that did not fully decorrelate both pre-R1 and pre-R2 were excluded. (e-f) Conceptual illustration of the task and the CSCG. (e) The world state, determined by position and trial type, is not directly accessible to the model. Instead, the system receives sensory experiences generated from the world state, which it uses to learn a world model that accurately predicts the next sensory experience. (f) Schematic of the CSCG and the learned transition sequence. Each sensory stimulus is associated with a set of clones, or hidden states. The system learns transition probabilities between these clones to form a world model. Gray sensory stimuli are observed at distinct locations on near and far trials, so different gray clones learn to represent these distinct locations. For less ambiguous stimuli, such as the indicator, most clones remain unused. (g-i) Toy examples illustrating orthogonalization in the CSCG. (g) An example "world" comprising two sequences of observations: 'A, G, B' and 'C, G, D', where observation G is common to both. The CSCG architecture considered includes one clone for each observation (A1, B1, etc.), except for G, which has two clones (G1 and G2). Transitions that cannot produce valid sensory sequences have been removed, leaving only the feasible transitions (gray arrows). Two model CSCGs with different transition probabilities (indicated by arrow width and numerical values) are shown.
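As a rough illustration of the decorrelation analysis described in (b) and (d) — not the authors' analysis code — the Python sketch below finds, for a given region such as pre-R1, the first training stage at which the Pearson correlation between the near- and far-trial probability-over-clones vectors drops below 0.3. The function name, array shapes, and toy data are assumptions made for illustration only.

```python
import numpy as np

def decorrelation_time(clone_probs_near, clone_probs_far, threshold=0.3):
    """Return the first training stage at which the Pearson correlation
    between the near- and far-trial probability-over-clones vectors for one
    region (e.g. pre-R1) drops below `threshold` (cf. panels b and d).

    clone_probs_near, clone_probs_far : arrays of shape (n_stages, n_clones),
        posterior probability over clones for the region, one row per stage.
    """
    for stage, (p_near, p_far) in enumerate(zip(clone_probs_near, clone_probs_far)):
        r = np.corrcoef(p_near, p_far)[0, 1]
        if r < threshold:
            return stage
    return None  # never decorrelated; such runs were excluded in panel d

# Toy usage: vectors that start correlated and gradually decorrelate.
rng = np.random.default_rng(0)
base = rng.random((10, 50))
near = base + 0.01 * rng.random((10, 50))
far = (base * np.linspace(1, 0, 10)[:, None]
       + rng.random((10, 50)) * np.linspace(0, 1, 10)[:, None])
print(decorrelation_time(near, far))
```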
In model 1, both trials utilize both the G1 and G2 clones, resulting in correlated state probabilities for G across the two trials. When the first observation is A, the sequence 'A, G, B' can be generated through two latent state sequences, A1 → G1 → B1 and A1 → G2 → B1 (black arrows), each with a probability of 0.25, leading to an overall probability of 0.5. This lower probability arises because the model could also produce unobserved sequences such as 'A, G, D'. In model 2, when the first observation is A, the sequence 'A, G, B' is generated by a single latent sequence, A1 → G1 → B1, with a probability of 1, and the alternative sequence 'A, G, D' has a probability of 0. This transition matrix maximizes the likelihood of the observed sequences in the toy world by utilizing the G1 and G2 clones separately for each trial, thereby orthogonalizing the representation of G across the two trials. (h) Illustration of an HMM with a different architecture, with three latent clones for observation 'G'. The transition matrix depicted uses multiple clones ('G1' and 'G2') for the first trial, yet it still maximizes the likelihood of the observation sequences by utilizing distinct clones across the two trials ('G1, G2' vs. 'G3'). This suggests that representations must be orthogonal, but not necessarily highly sparse. (i) A different example "world" consisting of two sequences of observations: 'A, G, B' and 'C, G, B', where observation G appears after distinct cues ('A' vs. 'C') but is followed by the same cue ('B'). Illustration of a particular transition matrix in which both trials utilize the G1 and G2 clones. If the first observation is A, the sequence 'A, G, B' can be generated through two latent state sequences, A1 → G1 → B1 and A1 → G2 → B1 (black arrows), each with a probability of 0.5, resulting in a combined probability of 1 despite correlated representations of G across the two trials. Since G is followed by the same observation ('B'), it is possible to maximize the probability of the observation sequences without decorrelating the representation of G. This helps explain why the end of the track remains correlated across near and far trials in many animals.
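To make the probabilities quoted in (g) concrete, the minimal Python sketch below runs a forward pass over two hypothetical transition matrices corresponding to models 1 and 2, using the state names A1, B1, C1, D1, G1, G2 from the panel and deterministic clone-to-observation emissions; the matrices and helper function are illustrative assumptions, not the published implementation. It reproduces the probabilities of 0.5 and 1 for 'A, G, B' and shows that model 1 also assigns probability to the unobserved sequence 'A, G, D'.

```python
import numpy as np

# Clones (hidden states) and the observation each one emits.
states = ["A1", "B1", "C1", "D1", "G1", "G2"]
emits = {"A1": "A", "B1": "B", "C1": "C", "D1": "D", "G1": "G", "G2": "G"}

def seq_prob(T, obs_seq, start_state):
    """Forward pass with deterministic clone-structured emissions: probability
    of emitting obs_seq given that we start in start_state (assumed to emit
    the first observation)."""
    alpha = np.zeros(len(states))
    alpha[states.index(start_state)] = 1.0
    for obs in obs_seq[1:]:
        alpha = alpha @ T                                    # propagate through transitions
        mask = np.array([emits[s] == obs for s in states])
        alpha = alpha * mask                                 # keep only clones of this observation
    return alpha.sum()

# Model 1: both sequences use both G clones (correlated representation of G).
T1 = np.zeros((6, 6))
T1[0, 4] = T1[0, 5] = 0.5   # A1 -> G1 / G2
T1[2, 4] = T1[2, 5] = 0.5   # C1 -> G1 / G2
T1[4, 1] = T1[4, 3] = 0.5   # G1 -> B1 / D1
T1[5, 1] = T1[5, 3] = 0.5   # G2 -> B1 / D1

# Model 2: each sequence uses its own G clone (orthogonalized representation).
T2 = np.zeros((6, 6))
T2[0, 4] = 1.0              # A1 -> G1
T2[4, 1] = 1.0              # G1 -> B1
T2[2, 5] = 1.0              # C1 -> G2
T2[5, 3] = 1.0              # G2 -> D1

print(seq_prob(T1, "AGB", "A1"))  # 0.5  (0.25 via G1 + 0.25 via G2)
print(seq_prob(T2, "AGB", "A1"))  # 1.0
print(seq_prob(T1, "AGD", "A1"))  # 0.5  -> probability leaked to an unobserved sequence
print(seq_prob(T2, "AGD", "A1"))  # 0.0
```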