Behav Res Methods. 2022 Oct;54(5):2221-2251.
doi: 10.3758/s13428-021-01711-5. Epub 2022 Jan 14.

An exploration of error-driven learning in simple two-layer networks from a discriminative learning perspective


Dorothée B Hoppe et al. Behav Res Methods. 2022 Oct.

Abstract

Error-driven learning algorithms, which iteratively adjust expectations based on prediction error, are the basis for a vast array of computational models in the brain and cognitive sciences that often differ widely in their precise form and application: they range from simple models in psychology and cybernetics to the complex deep learning models currently dominating discussions in machine learning and artificial intelligence. However, despite the ubiquity of this mechanism, detailed analyses of its basic workings uninfluenced by existing theories or specific research goals are rare in the literature. To address this, we present an exposition of error-driven learning - focusing on its simplest form for clarity - and relate this to the historical development of error-driven learning models in the cognitive sciences. Although error-driven models have historically been regarded as associative, with learning taken to combine preexisting elemental representations, our analysis highlights the discriminative nature of learning in these models and its implications for how learning is conceptualized. We complement this theoretical introduction with a practical guide to the application of simple error-driven learning models, discussing a number of example simulations that are also presented in detail in an accompanying tutorial.
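
As a concrete companion to the abstract's description, here is a minimal sketch of error-driven learning in its simplest form, a Rescorla-Wagner-style delta rule in a two-layer network. Cue and outcome names, the learning rate, and the trial structure are our own illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Minimal two-layer error-driven learner (Rescorla-Wagner / delta rule).
CUES = ["tail_wagging", "brown_fur"]
OUTCOMES = ["dog", "rabbit"]

def delta_update(W, cue_vec, target_vec, eta=0.05):
    """One trial: adjust weights from the present cues in proportion
    to the prediction error (target activation minus current activation)."""
    error = target_vec - cue_vec @ W      # one error term per outcome
    W += eta * np.outer(cue_vec, error)   # absent cues (zeros) are not updated
    return W

W = np.zeros((len(CUES), len(OUTCOMES)))
cue_vec = np.array([1.0, 1.0])            # tail-wagging + brown fur present
target = np.array([1.0, 0.0])             # ... followed by the outcome "dog"
for _ in range(200):
    W = delta_update(W, cue_vec, target)

print(W.round(2))  # the two present cues come to share the prediction of "dog"
```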

Keywords: Cognitive modeling; Computational simulations; Discriminative learning; Error-driven learning; Neural network models.


Figures

Fig. 1
A fully connected error-driven learning network, with incoming connections to one outcome highlighted in blue (a). Consider an example of learning to discriminate animals by first seeing an animal, for example a dog, and then hearing its species name. (b) shows how the activations of the outcomes dog and other animals develop given the cue set {tail-wagging, a specific fur color}, maximizing the certainty of expecting one specific outcome. (c) shows a hypothetical weight update after seeing a dog and hearing "dog". Black dashed lines show positive weight adjustments and red dashed lines negative adjustments. The dashed box marks the current cue set, within which weights compete with each other
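
The update in panel (c) is a single delta-rule step. A minimal numeric sketch, in which the pre-trial weights, cue names, and learning rate are all invented for illustration (the figure's actual values are not given here):

```python
import numpy as np

# Hypothetical pre-trial weights (rows: cues, columns: outcomes).
cues = ["tail_wagging", "fur_color", "long_ears"]
outcomes = ["dog", "rabbit", "cat"]
W = np.array([[0.3, 0.2, 0.1],
              [0.2, 0.2, 0.2],
              [0.0, 0.4, 0.0]])

present = np.array([1.0, 1.0, 0.0])  # cue set: {tail-wagging, fur color}
target = np.array([1.0, 0.0, 0.0])   # the animal turns out to be a dog

delta = 0.1 * np.outer(present, target - present @ W)
print(delta.round(3))
# Rows for the two present cues get a positive adjustment toward "dog"
# (black dashed lines) and negative adjustments toward "rabbit" and "cat"
# (red dashed lines); the absent cue long_ears is left untouched.
W += delta
```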
Fig. 2
Illustration of the cues and outcome in Kamin's (1969) blocking paradigm (a). During randomized training, the weight from the more frequent light cue to the outcome food increases until the light completely predicts food by itself (b). This effect is amplified when the light is first trained alone to predict food (c). While in (b) the tone can temporarily increase its weight, in (c) it can hardly increase its weight at all. When the compound cue consisting of light and tone is trained first (d), the weight of the tone cue stays constant (until a new training regimen, e.g., as in (b), is applied)
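
The three regimens in panels (b)-(d) can be reproduced with a few lines of delta-rule code. Trial counts and the learning rate below are assumptions chosen only to make the qualitative pattern visible:

```python
import random

def train(trials, eta=0.05):
    """Delta-rule learning of a single outcome (food) from sets of cues."""
    w = {"light": 0.0, "tone": 0.0}
    for cues in trials:
        error = 1.0 - sum(w[c] for c in cues)   # food follows every trial here
        for c in cues:
            w[c] += eta * error
    return {k: round(v, 2) for k, v in w.items()}

compound = [{"light", "tone"}] * 100    # light + tone -> food
light_only = [{"light"}] * 100          # light alone -> food (light more frequent)

mixed = compound + light_only
random.Random(0).shuffle(mixed)
print(train(mixed))                     # (b) light largely wins the weight
print(train(light_only + compound))     # (c) the tone is blocked almost entirely
print(train(compound + light_only))     # (d) the tone keeps its compound-phase weight
```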
Fig. 3
Different examples of cue competition. (a) shows that frequency only determines weight differences within sets of cues: the more frequent loud tone develops the same weight to the outcome food as the less frequent soft tone. (b) illustrates how frequency effects in cue competition can be canceled out by the structure of cue interactions. Here, every cue interacts with every other cue, which results in all cues having the same weight despite their different frequencies
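
A sketch of both panels; the 3:1 frequency ratio for the tones and the pairing counts in (b) are our own assumptions:

```python
import random

def train(trials, cue_names, eta=0.05):
    w = dict.fromkeys(cue_names, 0.0)
    for cues in trials:
        error = 1.0 - sum(w[c] for c in cues)   # the outcome (food) is always present
        for c in cues:
            w[c] += eta * error
    return {k: round(v, 2) for k, v in w.items()}

# (a) Each tone alone predicts food; the loud tone is three times as frequent,
# yet both weights approach the same asymptote of 1 -- frequency only changes
# how fast each weight gets there.
print(train([{"loud"}] * 300 + [{"soft"}] * 100, ["loud", "soft"]))

# (b) When every cue co-occurs with every other cue, the only weights that
# cancel the error on all trial types are equal ones, regardless of how
# frequent each pairing is; here all three weights approach 0.5.
pairs = [{"A", "B"}] * 300 + [{"A", "C"}] * 100 + [{"B", "C"}] * 100
random.Random(0).shuffle(pairs)
print(train(pairs * 5, ["A", "B", "C"]))
```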
Fig. 4
Illustration of outcome competition. In situations with fewer cues than outcomes (as in (a)), not all outcomes can be fully predicted. In that case, the updating of absent outcomes, as in case 3 of Eq. 4, leads to the learning of the conditional probabilities of the outcomes given a cue. Here, food is twice as likely as water to occur after the light (b). Without this mechanism (shown for illustration purposes), both weights increase to the activation limit of 1 (c), a result that violates the aim of maximizing the certainty of outcome predictions
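
The 2:1 ratio of food to water comes from the caption; the learning rate and trial counts below are assumed. Toggling the updating of absent outcomes reproduces panels (b) and (c):

```python
import numpy as np

# One cue (light), two outcomes [food, water]; food follows the light
# twice as often as water.
trials = [np.array([1.0, 0.0])] * 2 + [np.array([0.0, 1.0])]

def train(update_absent=True, eta=0.01, epochs=1500):
    w = np.zeros(2)                       # weights from light to [food, water]
    for _ in range(epochs):
        for target in trials:
            error = target - w            # the light is the only (present) cue
            if not update_absent:
                error = error * target    # skip case 3: absent outcomes untouched
            w += eta * error
    return w.round(2)

print(train())                    # (b) ~[2/3, 1/3] = P(outcome | light)
print(train(update_absent=False)) # (c) both weights rise to the limit of 1
```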
Fig. 5
Illustration of the interaction of cue and outcome competition in dog-rabbit example 1. In this example, the weights learned with full error-driven learning (b) show that species-specific features (e.g., tail-wagging) are more relevant for species discrimination than shared features (i.e., size). When outcome competition is turned off during learning (c), the model does not discover that size is a feature dimension shared between the two species, and cue competition leads to the same weights for all features (as in Fig. 3b). When cue competition is turned off during learning (d), weights correspond to the conditional probabilities of the label, here "dog", given a feature (small has a lower weight because in some cases it also precedes the label "rabbit")
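
A sketch of the three learning variants. The paper's exact exemplar frequencies for example 1 live in its tutorial; the toy dataset below is a hypothetical stand-in that only preserves the described structure (size shared across species, the other features species-specific):

```python
import numpy as np

FEATURES = ["tail_wagging", "barking", "long_ears", "small", "big"]
LABELS = ["dog", "rabbit"]

# Assumed exemplar counts, not the paper's:
DATA = ([({"tail_wagging", "small"}, "dog")] * 3 +
        [({"barking", "big"}, "dog")] +
        [({"long_ears", "small"}, "rabbit")] * 3 +
        [({"long_ears", "big"}, "rabbit")])

def train(cue_comp=True, out_comp=True, eta=0.01, epochs=2000):
    W = np.zeros((len(FEATURES), len(LABELS)))
    for _ in range(epochs):
        for feats, label in DATA:
            c = np.array([f in feats for f in FEATURES], float)
            t = np.array([l == label for l in LABELS], float)
            a = c @ W                                # summed label activations
            for i in np.flatnonzero(c):
                err = t - (a if cue_comp else W[i])  # no cue comp: per-cue error
                if not out_comp:
                    err = err * t                    # no outcome comp: absent labels skipped
                W[i] = W[i] + eta * err
    return W.round(2)

dog = LABELS.index("dog")
print(dict(zip(FEATURES, train()[:, dog])))               # (b) specific ~.6 > shared ~.4
print(dict(zip(FEATURES, train(out_comp=False)[:, dog]))) # (c) all dog features tie at ~.5
print(dict(zip(FEATURES, train(cue_comp=False)[:, dog]))) # (d) = P("dog" | feature)
```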
Fig. 6
Learned weights after label-first training mirror conditional probabilities of features given a label (in this case, “dog”). Here, features that are less frequent in dogs (barking and big) receive a lower weight than features that are more frequent in dogs (small and tail-wagging). This differs from weight development in object-first training (Fig. 5), where weights correspond to the relevance of features for discrimination (in that case, size features are less relevant than the other features)
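
Label-first training reduces to a single cue (the label) predicting many outcomes (the features), so each weight settles at a conditional probability. A sketch with assumed 3:1 exemplar counts (chosen only to make a clean 0.75/0.25 split visible):

```python
import numpy as np

FEATURES = ["tail_wagging", "barking", "small", "big"]
# Hypothetical frequencies: 3 {small, tail-wagging} dogs per {big, barking} dog.
dog_trials = [{"small", "tail_wagging"}] * 3 + [{"big", "barking"}]

w = np.zeros(len(FEATURES))               # weights from the single cue "dog"
for _ in range(1000):
    for feats in dog_trials:
        t = np.array([f in feats for f in FEATURES], float)
        w += 0.01 * (t - w)               # delta rule with one present cue
print(dict(zip(FEATURES, w.round(2))))
# {'tail_wagging': 0.75, 'barking': 0.25, 'small': 0.75, 'big': 0.25}
# i.e., each weight ends at P(feature | "dog")
```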
Fig. 7
Outcome activations after (a) object-first and (b) label-first training on dog-rabbit example 1. When objects precede labels in training (a), dogs (here shown: small, barking dogs) can be discriminated optimally: the activation of the label "dog" given a dog exemplar approaches 1 and the activation of the label "rabbit" approaches 0. However, when labels precede objects (b), optimally discriminative activations cannot be reached: given the label "dog", dogs with the most frequent features (small and tail-wagging) are expected more than dogs with less frequent features (barking and big); crucially, rabbits are also expected to a certain extent after hearing the label "dog"
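
A sketch of both training directions. The exemplar frequencies below are assumed stand-ins chosen so that the caption's pattern emerges; the helpers train and vec are ours:

```python
import numpy as np

FEATURES = ["small", "tail_wagging", "big", "barking", "long_ears"]
LABELS = ["dog", "rabbit"]

# Assumed data: mostly small tail-wagging dogs, a few big barking dogs,
# and uniformly big long-eared rabbits.
EXEMPLARS = ([({"small", "tail_wagging"}, "dog")] * 3 +
             [({"big", "barking"}, "dog")] +
             [({"big", "long_ears"}, "rabbit")] * 4)

def vec(items, names):
    return np.array([n in items for n in names], float)

def train(cue_names, out_names, pairs, eta=0.01, epochs=2000):
    W = np.zeros((len(cue_names), len(out_names)))
    for _ in range(epochs):
        for cues, outs in pairs:
            c, t = vec(cues, cue_names), vec(outs, out_names)
            W += eta * np.outer(c, t - c @ W)
    return W

# (a) Object-first: features cue the labels.
W_obj = train(FEATURES, LABELS, [(f, {l}) for f, l in EXEMPLARS])
for feats in ({"small", "tail_wagging"}, {"big", "barking"}, {"big", "long_ears"}):
    print(sorted(feats), (vec(feats, FEATURES) @ W_obj).round(2))
# every dog exemplar activates "dog" ~1 and "rabbit" ~0: optimal discrimination

# (b) Label-first: the label cues the features; expectations become graded.
W_lab = train(LABELS, FEATURES, [({l}, f) for f, l in EXEMPLARS])
w_dog = W_lab[LABELS.index("dog")]
for feats in ({"small", "tail_wagging"}, {"big", "barking"}, {"big", "long_ears"}):
    print(sorted(feats), round(vec(feats, FEATURES) @ w_dog, 2))
# frequent dogs ~1.5 > infrequent dogs ~0.5 > rabbits ~0.25, given "dog"
```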
Fig. 8
Outcome activations after (a) object-first and (b) label-first training on dog-rabbit example 2 (see Section "Asymmetry effects"). As opposed to example 1 (Fig. 7), misclassifications occur here after label-first training (b): after hearing a label, e.g., "dog", low-frequency exemplars of the wrong species (here, big rabbits) are expected more than low-frequency exemplars of the correct species (here, small dogs). This is due to the particular feature structure, in which a feature of the low-frequency exemplars of one species (here, big in rabbits) also occurs in the high-frequency exemplars of the other species (i.e., dogs)
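
The misclassification appears once the feature structure makes "big" frequent in dogs but infrequent in rabbits; the exemplar counts below are hypothetical, chosen only to reproduce the described pattern:

```python
import numpy as np

FEATURES = ["small", "tail_wagging", "big", "barking", "long_ears"]

# Assumed stand-in for example 2: dogs are mostly big, rabbits rarely so.
EXEMPLARS = ([({"big", "barking"}, "dog")] * 3 +
             [({"small", "tail_wagging"}, "dog")] +
             [({"small", "long_ears"}, "rabbit")] * 3 +
             [({"big", "long_ears"}, "rabbit")])

# Label-first training, as in the previous sketch: weights from the cue
# "dog" converge to P(feature | "dog").
w_dog = np.zeros(len(FEATURES))
for _ in range(1000):
    for feats, label in EXEMPLARS:
        if label != "dog":
            continue                       # only "dog" trials train these weights
        t = np.array([f in feats for f in FEATURES], float)
        w_dog += 0.01 * (t - w_dog)

def expect(feats):
    return round(np.array([f in feats for f in FEATURES], float) @ w_dog, 2)

print(expect({"big", "long_ears"}))        # big rabbit:  ~0.75
print(expect({"small", "tail_wagging"}))   # small dog:   ~0.50
# the low-frequency exemplar of the *wrong* species outranks the
# low-frequency exemplar of the correct one, via the shared feature "big"
```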
Fig. 9
Different cue structures to model negative patterning, in which single stimuli predict a different outcome than their combination (a). When the stimulus compound is coded compositionally as a combination of its elements ({Tone, Light}), the two outcomes cannot be discriminated from each other (b). When the stimulus compound is coded by a single configural cue ({LightTone}), discrimination is optimal but not realistic (c). The combination of a configural cue and its elements ({Tone, Light, LightTone}) captures discrimination and generalization (d). See also the interactive interface in the tutorial
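
Negative patterning under these cue codings can be simulated directly. Panel (c), where the compound is a lone configural cue, discriminates trivially, so the sketch covers (b) and (d); the learning rate and epoch count are arbitrary choices:

```python
import numpy as np

SINGLE, COMPOUND = np.array([1.0, 0.0]), np.array([0.0, 1.0])

def train(trials, cue_names, eta=0.01, epochs=3000):
    W = np.zeros((len(cue_names), 2))      # outcome columns: [single, compound]
    for _ in range(epochs):
        for cues, target in trials:
            c = np.array([n in cues for n in cue_names], float)
            W += eta * np.outer(c, target - c @ W)
    return W

def act(W, cues, cue_names):
    return (np.array([n in cues for n in cue_names], float) @ W).round(2)

# (b) purely elemental coding: the compound is just {Tone, Light}
elem = [({"T"}, SINGLE), ({"L"}, SINGLE), ({"T", "L"}, COMPOUND)]
W = train(elem, ["T", "L"])
print(act(W, {"T"}, ["T", "L"]), act(W, {"T", "L"}, ["T", "L"]))
# both outcomes are activated (nearly) equally by the tone alone and by the
# compound: with elemental cues only, the outcomes cannot be discriminated

# (d) elemental cues plus a configural cue for the compound
conf = [({"T"}, SINGLE), ({"L"}, SINGLE), ({"T", "L", "TL"}, COMPOUND)]
W = train(conf, ["T", "L", "TL"])
print(act(W, {"T"}, ["T", "L", "TL"]),
      act(W, {"T", "L", "TL"}, ["T", "L", "TL"]))
# now single stimuli activate "single" ~1 and the compound "compound" ~1,
# while the shared elements still support generalization
```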
