PLoS Comput Biol. 2015 Nov 19;11(11):e1004592.
doi: 10.1371/journal.pcbi.1004592. eCollection 2015 Nov.

Learning of Chunking Sequences in Cognition and Behavior

Jordi Fonollosa et al. PLoS Comput Biol.

Abstract

We often learn and recall long sequences in smaller segments, such as a phone number 858 534 22 30 memorized as four segments. Behavioral experiments suggest that humans and some animals employ this strategy of breaking down cognitive or behavioral sequences into chunks in a wide variety of tasks, but the dynamical principles of how this is achieved remain unknown. Here, we study the temporal dynamics of chunking for learning cognitive sequences in a chunking representation, using a dynamical model of competing modes arranged to evoke hierarchical Winnerless Competition (WLC) dynamics. Sequential memory is represented as trajectories along a chain of metastable fixed points at each level of the hierarchy, and bistable Hebbian dynamics enables the learning of such trajectories in an unsupervised fashion. Using computer simulations, we demonstrate the learning of a chunking representation of sequences and their robust recall. During learning, the dynamics associates a set of modes with each information-carrying item in the sequence and encodes their relative order. During recall, hierarchical WLC guarantees the robustness of the sequence order when the sequence is not too long. The resulting patterns of activity share several features observed in behavioral experiments, such as the pauses between boundaries of chunks, their size and their duration. Failures in learning chunking sequences provide new insights into the dynamical causes of neurological disorders such as Parkinson's disease and schizophrenia.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Two-layer network for learning chunking dynamics.
In this example, the input sequence (a, b, c, d, e) is presented repeatedly. Initially, all the synaptic connections within a matrix are similar, with small random variations. Through learning, distinct elementary modes (EM) become associated with each of the five patterns through the weights of the projection matrix P_ki. In the elementary layer, the weights V_ii in the directions a to b, b to c, and d to e are weakened (arrow thickness denotes coupling strength), while the weights in the opposite direction are strengthened. The weights W_jj follow a similar learning rule applied to the three chunks: ab, c and de. Chunking, i.e. the information specifying the association between chunking modes (CM) and EM, is learned in the coupling matrices Q_ij and R_ji. The input in the perceptual layer is represented as non-overlapping binary patterns. For example, element a is the binary pattern s_a = [11000100], element b is the binary pattern s_b = [00100010], etc. Black circles represent inhibitory couplings, while arrowheads represent excitatory couplings. The number of elementary modes should be greater than or equal to the number of patterns in a sequence. Note that there must be at least three units in each layer for a stable heteroclinic cycle to exist. It is not necessary that N_Y < N_X, and any value such that N_Y > 3, N_X > 3 can be used. i = 1, …, N_X; j = 1, …, N_Y; k = 1, …, M; N XM > 3.
Fig 2. Projection of the phase portrait of the two-layer chunking hierarchical dynamics in the space of three auxiliary variables.
This example illustrates the dynamics of a system with N_X = 24, N_Y = 3 before (left) and after (right) learning a sequence consisting of 24 patterns of M = 144 pixels. For visualization purposes, the variable space was projected according to J_i = 0.5 y_i + 0.5 (x_1^i + x_2^i + x_3^i), where the superscript refers to the associated chunk. The plot is colored red when any of the chunks is active (y_i > 0.9, ∀i). The traces were obtained from 12 runs starting from random initial conditions in the vicinity of the origin of the transformed space. Before learning, the network reaches stable fixed points. After learning, the network produces a closed chunking sequence (black) that consists of several heteroclinic cycles that represent the chunks (red). Each of the three chunks consists of EM, as the system visits the eight states in each chunk. Note, however, that the projection used here effectively reduces these to 9 (three states per chunk) for visualization purposes.
Fig 3. Input and network activities during learning and recall.
s_k, x_i, y_j, z_k (a) during learning (after 5 presentations) and (b) during sequence recall (after 120 presentations). Within each layer, different colors represent different modes (variables). The sensory input (presented only during learning) consisted of 24 different patterns presented sequentially. The patterns were composed of 144 binary pixels (represented in black and white). During learning, the input drives the system dynamics. During recall, the elementary modes and the chunking modes activate in the same order as in learning. Each CM represents about 8 consecutively active elementary modes. The onset of each chunk is delayed by the inhibition from the chunking layer. This delay is consistent with the pauses before loading chunks observed in behavioral studies (highlighted with a dashed line). (c) Duration that each EM remains active, with the same color coding as in (b). Three modes associated with the transitions between chunks remain active for longer than the others. Such pauses can be identified with pauses observed in behavioral experiments involving chunking [17].
Fig 4. Synaptic weights before and after learning.
(a, b) Initially (t_ini), the recurrent weight matrices implement all-to-all symmetric inhibition, leading to winner-take-all (WTA) dynamics. After learning (t_fin), the matrices acquire an asymmetric component, leading to WLC. Superimposed white arrows in (b) indicate the resulting order of the recalled states. (c, d) The weights in the matrices Q_ij and R_ji learn which EM belongs to which chunk. The last three columns correspond to the elements that activate during chunk transitions.
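The contrast in the Fig 4 caption between symmetric inhibition (WTA) and asymmetric inhibition (WLC) can be illustrated with a small simulation. This is a minimal sketch, not the authors' model: it assumes a three-mode generalized Lotka-Volterra system, which is a common choice for WLC dynamics but is not given in this record. The function names, the matrices `rho_wta` and `rho_wlc`, the constant drive `eps`, and all numerical values are illustrative assumptions.

```python
import numpy as np

def simulate(rho, x0, t_end=300.0, dt=0.01, eps=1e-6):
    """Euler-integrate dx_i/dt = x_i * (1 - sum_j rho[i, j] * x_j) + eps.

    rho[i, j] is the inhibition exerted by mode j on mode i (assumed form).
    eps is a small constant drive (an assumption) that keeps silent modes
    from decaying to zero, so switching stays regular.
    """
    x = np.array(x0, dtype=float)
    steps = int(t_end / dt)
    traj = np.empty((steps, x.size))
    for t in range(steps):
        x = x + dt * (x * (1.0 - rho @ x) + eps)
        traj[t] = x
    return traj

def winner_sequence(traj):
    """Index of the most active mode over time, consecutive repeats removed."""
    winners = traj.argmax(axis=1)
    keep = np.concatenate(([True], winners[1:] != winners[:-1]))
    return winners[keep].tolist()

# Symmetric all-to-all inhibition: one mode wins and stays active (WTA).
rho_wta = np.array([[1.0, 2.0, 2.0],
                    [2.0, 1.0, 2.0],
                    [2.0, 2.0, 1.0]])

# Asymmetric inhibition: each active mode inhibits one neighbor weakly,
# so activity keeps passing from mode to mode (WLC).
rho_wlc = np.array([[1.0, 0.5, 2.0],
                    [2.0, 1.0, 0.5],
                    [0.5, 2.0, 1.0]])

x0 = [0.5, 0.01, 0.02]
wta = simulate(rho_wta, x0)
wlc = simulate(rho_wlc, x0)

print("WTA winners:", winner_sequence(wta))
print("WLC winners:", winner_sequence(wlc))
```

With these illustrative values, the symmetric matrix settles on a single winner, while the asymmetric matrix keeps switching between modes in a fixed cyclic order. The order is set by which off-diagonal entries are weak, which is the role the caption attributes to the learned asymmetric component.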
Fig 5. Input weights P_ki at the elementary modes.
Weights (left) before and (right) after training. At the beginning (t_ini), the weights are random. Learning associates each of the 24 patterns with one EM.
Fig 6. The dynamics of chunking.
The model was run 60 times, for 120 trials (N_Y = 30), at different levels of noise. Each trial consisted of the presentation of one sequence, followed by a recall phase. (Top-Left) Sequence recall accuracy D averaged over all the runs. The sequence was determined by the identity of the most active mode in the elementary layer. D was computed using the Levenshtein distance (equal to the number of additions and subtractions between two sequences). In the noiseless and low-noise cases, the distance between the presented sequence and the reproduced sequence reached about 0.05 (horizontal line), roughly corresponding to 1 addition/subtraction per sequence recall. The network was robust to noise, and sequence recall accuracy degraded gracefully as the amplitude of the noise was increased. (Bottom-Left) Estimates of the chunking rate measure CR for monitoring chunking in the noiseless case (blue curves). CR is defined as the number of transitions taking place in the chunking layer during the presentation of a pattern in the sequence. During an initial transient, CR decreases as learning proceeds, indicating the formation of the chunks. (Right) Activity in the chunking layer for two representative runs, one with no noise, the other with no chunks, where learning of Q_ij and R_ji was turned off. The identity of the chunks is color-coded. Interestingly, the boundaries of the chunks can change during training, and the chunks can undergo substantial reconfigurations at the beginning of the training phase. In the absence of learning in Q_ij and R_ji, the chunking rate did not diminish over the course of learning, indicating the absence of chunks. S4 Fig displays the evolution of the individual weights for the run shown in the top-right panel (No Noise).
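The recall accuracy D in the Fig 6 caption rests on the Levenshtein distance between the presented and the recalled sequence. The sketch below shows one way to compute it. Two points are assumptions rather than statements from the caption: the standard Levenshtein distance also counts substitutions (the caption mentions only additions and subtractions), and dividing by the length of the presented sequence is a guess at the normalization, made because one edit in a 24-item sequence gives about 0.04, close to the reported 0.05. The names `levenshtein` and `normalized_distance` are illustrative.

```python
def levenshtein(a, b):
    """Standard Levenshtein distance between two sequences.

    Counts the minimum number of insertions, deletions and substitutions
    needed to turn sequence a into sequence b.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # delete item_a
                            curr[j - 1] + 1,      # insert item_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def normalized_distance(presented, recalled):
    """Edit distance divided by the presented length (assumed normalization)."""
    return levenshtein(presented, recalled) / len(presented)

# A 24-item sequence recalled with one item missing.
presented = list(range(24))
recalled = presented[:10] + presented[11:]
d = normalized_distance(presented, recalled)
print(round(d, 3))
```

Here the sequences are lists of mode indices, matching the caption's rule that the recalled sequence is read off from the most active mode in the elementary layer.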
Fig 7. Chunk size (number of EM in each chunk) (left) as a function of the potentiation scaling factor in Q, γpQ, and (right) as a function of the time constant in the synaptic dynamics, τ_z.
The number of information-carrying items contained in the chunks depends on the system dynamics, suggesting that the dynamics affect the total capacity of the memory. The random initial conditions lead the system to different structures after learning (number and size of chunks). The case τ_z = 0 corresponds to completely removing the synaptic dynamics. Although chunking is present in the absence of z_j, the characteristic time scale of z_j, τ_z, has a powerful effect on chunk size. Each point was evaluated 100 times and the mean and standard deviation are presented, suggesting a monotonically increasing relationship between chunk size and γpQ or τ_z. In total, 98.6% of the runs exhibited sequential activity in the chunking layer. Total number of available chunking modes, N_Y = 30; total number of elementary modes, N_X = 30.
Fig 8.
(A) Stable heteroclinic chain with two connected metastable states. (B) Stable heteroclinic channel (SHC): a robust sequence of metastable states. Adapted from [82]. (C) Transformation of the phase volume along trajectories in the neighborhood of the unstable separatrix in the case when both coupled saddles are characterized by saddle values larger than one.

References

    1. Ericsson K, Chase WG, Faloon S. Acquisition of a memory skill. Science. 1980;208(4448):1181–1182. doi: 10.1126/science.7375930
    2. Bousfield WA. The occurrence of clustering in the recall of randomly arranged associates. The Journal of General Psychology. 1953;49(2):229–240. doi: 10.1080/00221309.1953.9710088
    3. Miller GA. The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Review. 1956;63(2):81. doi: 10.1037/h0043158
    4. Gobet F, Lane PC, Croker S, Cheng PC, Jones G, Oliver I, et al. Chunking mechanisms in human learning. Trends in Cognitive Sciences. 2001;5(6):236–243. doi: 10.1016/S1364-6613(00)01662-4
    5. Verwey WB. Concatenating familiar movement sequences: the versatile cognitive processor. Acta Psychologica. 2001;106(1):69–95. doi: 10.1016/S0001-6918(00)00027-5
