Behav Res Methods. 2022 Oct;54(5):2221-2251.
doi: 10.3758/s13428-021-01711-5. Epub 2022 Jan 14.

An exploration of error-driven learning in simple two-layer networks from a discriminative learning perspective


Dorothée B Hoppe et al. Behav Res Methods. 2022 Oct.

Abstract

Error-driven learning algorithms, which iteratively adjust expectations based on prediction error, are the basis for a vast array of computational models in the brain and cognitive sciences that often differ widely in their precise form and application: they range from simple models in psychology and cybernetics to the complex deep learning models currently dominating discussions in machine learning and artificial intelligence. However, despite the ubiquity of this mechanism, detailed analyses of its basic workings uninfluenced by existing theories or specific research goals are rare in the literature. To address this, we present an exposition of error-driven learning - focusing on its simplest form for clarity - and relate this to the historical development of error-driven learning models in the cognitive sciences. Although error-driven models have historically been regarded as associative, with learning taken to combine preexisting elemental representations, our analysis highlights the discriminative nature of learning in these models and its implications for how learning is conceptualized. We complement this theoretical introduction with a practical guide to the application of simple error-driven learning models, discussing a number of example simulations that are also presented in detail in an accompanying tutorial.
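
As a concrete companion to the abstract's description, here is a minimal sketch of error-driven learning in its simplest form, a Rescorla-Wagner-style delta rule in a two-layer network. Cue and outcome names, the learning rate, and the trial structure are our own illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Minimal two-layer error-driven learner (Rescorla-Wagner / delta rule).
CUES = ["tail_wagging", "brown_fur"]
OUTCOMES = ["dog", "rabbit"]

def delta_update(W, cue_vec, target_vec, eta=0.05):
    """One trial: adjust weights from the present cues in proportion
    to the prediction error (target activation minus current activation)."""
    error = target_vec - cue_vec @ W      # one error term per outcome
    W += eta * np.outer(cue_vec, error)   # absent cues (zeros) are not updated
    return W

W = np.zeros((len(CUES), len(OUTCOMES)))
cue_vec = np.array([1.0, 1.0])            # tail-wagging + brown fur present
target = np.array([1.0, 0.0])             # ... followed by the outcome "dog"
for _ in range(200):
    W = delta_update(W, cue_vec, target)

print(W.round(2))  # the two present cues come to share the prediction of "dog"
```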

Keywords: Cognitive modeling; Computational simulations; Discriminative learning; Error-driven learning; Neural network models.


Figures

Fig. 1
A fully connected error-driven learning network, with incoming connections to one outcome highlighted in blue (a). Consider an example of learning to discriminate animals by first seeing an animal, for example a dog, and then hearing its species name. (b) shows how the activations of the outcomes dog and other animals develop given the cue set {tail-wagging, a specific fur color}, maximizing the certainty of expecting one specific outcome. (c) shows a hypothetical weight update after seeing a dog and hearing "dog". Black dashed lines show positive weight adjustments and red dashed lines negative adjustments. The dashed box marks the current cue set, within which weights compete with each other
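
The update in panel (c) is a single delta-rule step. A minimal numeric sketch, in which the pre-trial weights, cue names, and learning rate are all invented for illustration (the figure's actual values are not given here):

```python
import numpy as np

# Hypothetical pre-trial weights (rows: cues, columns: outcomes).
cues = ["tail_wagging", "fur_color", "long_ears"]
outcomes = ["dog", "rabbit", "cat"]
W = np.array([[0.3, 0.2, 0.1],
              [0.2, 0.2, 0.2],
              [0.0, 0.4, 0.0]])

present = np.array([1.0, 1.0, 0.0])  # cue set: {tail-wagging, fur color}
target = np.array([1.0, 0.0, 0.0])   # the animal turns out to be a dog

delta = 0.1 * np.outer(present, target - present @ W)
print(delta.round(3))
# Rows for the two present cues get a positive adjustment toward "dog"
# (black dashed lines) and negative adjustments toward "rabbit" and "cat"
# (red dashed lines); the absent cue long_ears is left untouched.
W += delta
```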
Fig. 2
Illustration of the cues and outcome in Kamin's (1969) blocking paradigm (a). During randomized training, the weight from the more frequent light cue to the outcome food increases until the light completely predicts food by itself (b). This effect is amplified when the light is first trained alone to predict food (c). While in (b) the tone can temporarily increase its weight, in (c) it can hardly increase its weight at all. When the compound cue consisting of light and tone is trained first (d), the weight of the tone cue stays constant (until a new training regimen, e.g., as in (b), is applied)
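
The three regimens in panels (b)-(d) can be reproduced with a few lines of delta-rule code. Trial counts and the learning rate below are assumptions chosen only to make the qualitative pattern visible:

```python
import random

def train(trials, eta=0.05):
    """Delta-rule learning of a single outcome (food) from sets of cues."""
    w = {"light": 0.0, "tone": 0.0}
    for cues in trials:
        error = 1.0 - sum(w[c] for c in cues)   # food follows every trial here
        for c in cues:
            w[c] += eta * error
    return {k: round(v, 2) for k, v in w.items()}

compound = [{"light", "tone"}] * 100    # light + tone -> food
light_only = [{"light"}] * 100          # light alone -> food (light more frequent)

mixed = compound + light_only
random.Random(0).shuffle(mixed)
print(train(mixed))                     # (b) light largely wins the weight
print(train(light_only + compound))     # (c) the tone is blocked almost entirely
print(train(compound + light_only))     # (d) the tone keeps its compound-phase weight
```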
Fig. 3
Different examples of cue competition. (a) shows that frequency only determines weight differences within sets of cues: the more frequent loud tone develops the same weight to the outcome food as the less frequent soft tone. (b) illustrates how frequency effects in cue competition can be canceled out by the structure of cue interactions. Here, every cue interacts with every other cue, which results in all cues having the same weight despite their different frequencies
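
A sketch of both panels; the 3:1 frequency ratio for the tones and the pairing counts in (b) are our own assumptions:

```python
import random

def train(trials, cue_names, eta=0.05):
    w = dict.fromkeys(cue_names, 0.0)
    for cues in trials:
        error = 1.0 - sum(w[c] for c in cues)   # the outcome (food) is always present
        for c in cues:
            w[c] += eta * error
    return {k: round(v, 2) for k, v in w.items()}

# (a) Each tone alone predicts food; the loud tone is three times as frequent,
# yet both weights approach the same asymptote of 1 -- frequency only changes
# how fast each weight gets there.
print(train([{"loud"}] * 300 + [{"soft"}] * 100, ["loud", "soft"]))

# (b) When every cue co-occurs with every other cue, the only weights that
# cancel the error on all trial types are equal ones, regardless of how
# frequent each pairing is; here all three weights approach 0.5.
pairs = [{"A", "B"}] * 300 + [{"A", "C"}] * 100 + [{"B", "C"}] * 100
random.Random(0).shuffle(pairs)
print(train(pairs * 5, ["A", "B", "C"]))
```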
Fig. 4
Illustration of outcome competition. In situations with fewer cues than outcomes (as in (a)), not all outcomes can be fully predicted. In that case, the updating of absent outcomes, as in case 3 of Eq. 4, leads to the learning of the conditional probabilities of the outcomes given a cue. Here, food is twice as likely as water to occur after the light (b). Without this mechanism (shown for illustration purposes), both weights increase to the activation limit of 1 (c), a result that violates the aim of maximizing the certainty of outcome predictions
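
The 2:1 ratio of food to water comes from the caption; the learning rate and trial counts below are assumed. Toggling the updating of absent outcomes reproduces panels (b) and (c):

```python
import numpy as np

# One cue (light), two outcomes [food, water]; food follows the light
# twice as often as water.
trials = [np.array([1.0, 0.0])] * 2 + [np.array([0.0, 1.0])]

def train(update_absent=True, eta=0.01, epochs=1500):
    w = np.zeros(2)                       # weights from light to [food, water]
    for _ in range(epochs):
        for target in trials:
            error = target - w            # the light is the only (present) cue
            if not update_absent:
                error = error * target    # skip case 3: absent outcomes untouched
            w += eta * error
    return w.round(2)

print(train())                    # (b) ~[2/3, 1/3] = P(outcome | light)
print(train(update_absent=False)) # (c) both weights rise to the limit of 1
```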
Fig. 5
Illustration of the interaction of cue and outcome competition in dog-rabbit example 1. In this example, the weights learned with full error-driven learning (b) show that species-specific features (e.g., tail-wagging) are more relevant for species discrimination than shared features (i.e., size). When outcome competition is turned off during learning (c), the model does not discover that size is a feature dimension shared between the two species, and cue competition leads to the same weights for all features (as in Fig. 3b). When cue competition is turned off during learning (d), weights correspond to the conditional probabilities of the label, here "dog", given a feature (small has a lower weight because in some cases it also precedes the label "rabbit")
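
A sketch of the three learning variants. The paper's exact exemplar frequencies for example 1 live in its tutorial; the toy dataset below is a hypothetical stand-in that only preserves the described structure (size shared across species, the other features species-specific):

```python
import numpy as np

FEATURES = ["tail_wagging", "barking", "long_ears", "small", "big"]
LABELS = ["dog", "rabbit"]

# Assumed exemplar counts, not the paper's:
DATA = ([({"tail_wagging", "small"}, "dog")] * 3 +
        [({"barking", "big"}, "dog")] +
        [({"long_ears", "small"}, "rabbit")] * 3 +
        [({"long_ears", "big"}, "rabbit")])

def train(cue_comp=True, out_comp=True, eta=0.01, epochs=2000):
    W = np.zeros((len(FEATURES), len(LABELS)))
    for _ in range(epochs):
        for feats, label in DATA:
            c = np.array([f in feats for f in FEATURES], float)
            t = np.array([l == label for l in LABELS], float)
            a = c @ W                                # summed label activations
            for i in np.flatnonzero(c):
                err = t - (a if cue_comp else W[i])  # no cue comp: per-cue error
                if not out_comp:
                    err = err * t                    # no outcome comp: absent labels skipped
                W[i] = W[i] + eta * err
    return W.round(2)

dog = LABELS.index("dog")
print(dict(zip(FEATURES, train()[:, dog])))               # (b) specific ~.6 > shared ~.4
print(dict(zip(FEATURES, train(out_comp=False)[:, dog]))) # (c) all dog features tie at ~.5
print(dict(zip(FEATURES, train(cue_comp=False)[:, dog]))) # (d) = P("dog" | feature)
```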
Fig. 6
Learned weights after label-first training mirror conditional probabilities of features given a label (in this case, “dog”). Here, features that are less frequent in dogs (barking and big) receive a lower weight than features that are more frequent in dogs (small and tail-wagging). This differs from weight development in object-first training (Fig. 5), where weights correspond to the relevance of features for discrimination (in that case, size features are less relevant than the other features)
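
Label-first training reduces to a single cue (the label) predicting many outcomes (the features), so each weight settles at a conditional probability. A sketch with assumed 3:1 exemplar counts (chosen only to make a clean 0.75/0.25 split visible):

```python
import numpy as np

FEATURES = ["tail_wagging", "barking", "small", "big"]
# Hypothetical frequencies: 3 {small, tail-wagging} dogs per {big, barking} dog.
dog_trials = [{"small", "tail_wagging"}] * 3 + [{"big", "barking"}]

w = np.zeros(len(FEATURES))               # weights from the single cue "dog"
for _ in range(1000):
    for feats in dog_trials:
        t = np.array([f in feats for f in FEATURES], float)
        w += 0.01 * (t - w)               # delta rule with one present cue
print(dict(zip(FEATURES, w.round(2))))
# {'tail_wagging': 0.75, 'barking': 0.25, 'small': 0.75, 'big': 0.25}
# i.e., each weight ends at P(feature | "dog")
```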
Fig. 7
Outcome activations after (a) object-first and (b) label-first training on dog-rabbit example 1. When objects precede labels in training (a), dogs (here shown: small, barking dogs) can be discriminated optimally: the activation of the label "dog" given a dog exemplar approaches 1 and the activation of the label "rabbit" approaches 0. However, when labels precede objects (b), optimally discriminative activations cannot be reached: given the label "dog", dogs with the most frequent features (small and tail-wagging) are expected more than dogs with less frequent features (barking and big); crucially, rabbits are also expected to a certain extent after hearing the label "dog"
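
A sketch of both training directions. The exemplar frequencies below are assumed stand-ins chosen so that the caption's pattern emerges; the helpers train and vec are ours:

```python
import numpy as np

FEATURES = ["small", "tail_wagging", "big", "barking", "long_ears"]
LABELS = ["dog", "rabbit"]

# Assumed data: mostly small tail-wagging dogs, a few big barking dogs,
# and uniformly big long-eared rabbits.
EXEMPLARS = ([({"small", "tail_wagging"}, "dog")] * 3 +
             [({"big", "barking"}, "dog")] +
             [({"big", "long_ears"}, "rabbit")] * 4)

def vec(items, names):
    return np.array([n in items for n in names], float)

def train(cue_names, out_names, pairs, eta=0.01, epochs=2000):
    W = np.zeros((len(cue_names), len(out_names)))
    for _ in range(epochs):
        for cues, outs in pairs:
            c, t = vec(cues, cue_names), vec(outs, out_names)
            W += eta * np.outer(c, t - c @ W)
    return W

# (a) Object-first: features cue the labels.
W_obj = train(FEATURES, LABELS, [(f, {l}) for f, l in EXEMPLARS])
for feats in ({"small", "tail_wagging"}, {"big", "barking"}, {"big", "long_ears"}):
    print(sorted(feats), (vec(feats, FEATURES) @ W_obj).round(2))
# every dog exemplar activates "dog" ~1 and "rabbit" ~0: optimal discrimination

# (b) Label-first: the label cues the features; expectations become graded.
W_lab = train(LABELS, FEATURES, [({l}, f) for f, l in EXEMPLARS])
w_dog = W_lab[LABELS.index("dog")]
for feats in ({"small", "tail_wagging"}, {"big", "barking"}, {"big", "long_ears"}):
    print(sorted(feats), round(vec(feats, FEATURES) @ w_dog, 2))
# frequent dogs ~1.5 > infrequent dogs ~0.5 > rabbits ~0.25, given "dog"
```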
Fig. 8
Outcome activations after (a) object-first and (b) label-first training on dog-rabbit example 2 (see Section "Asymmetry effects"). As opposed to example 1 (Fig. 7), misclassifications occur here after label-first training (b): after hearing a label, e.g., "dog", low-frequency exemplars of the wrong species (here, big rabbits) are expected more than low-frequency exemplars of the correct species (here, small dogs). This is due to the particular feature structure, in which a feature of the low-frequency exemplars of one species (here, big in rabbits) also occurs in the high-frequency exemplars of the other species (i.e., dogs)
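
The misclassification appears once the feature structure makes "big" frequent in dogs but infrequent in rabbits; the exemplar counts below are hypothetical, chosen only to reproduce the described pattern:

```python
import numpy as np

FEATURES = ["small", "tail_wagging", "big", "barking", "long_ears"]

# Assumed stand-in for example 2: dogs are mostly big, rabbits rarely so.
EXEMPLARS = ([({"big", "barking"}, "dog")] * 3 +
             [({"small", "tail_wagging"}, "dog")] +
             [({"small", "long_ears"}, "rabbit")] * 3 +
             [({"big", "long_ears"}, "rabbit")])

# Label-first training, as in the previous sketch: weights from the cue
# "dog" converge to P(feature | "dog").
w_dog = np.zeros(len(FEATURES))
for _ in range(1000):
    for feats, label in EXEMPLARS:
        if label != "dog":
            continue                       # only "dog" trials train these weights
        t = np.array([f in feats for f in FEATURES], float)
        w_dog += 0.01 * (t - w_dog)

def expect(feats):
    return round(np.array([f in feats for f in FEATURES], float) @ w_dog, 2)

print(expect({"big", "long_ears"}))        # big rabbit:  ~0.75
print(expect({"small", "tail_wagging"}))   # small dog:   ~0.50
# the low-frequency exemplar of the *wrong* species outranks the
# low-frequency exemplar of the correct one, via the shared feature "big"
```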
Fig. 9
Different cue structures to model negative patterning, in which single stimuli predict a different outcome than their combination (a). When the stimulus compound is coded compositionally as a combination of its elements ({Tone, Light}), the two outcomes cannot be discriminated from each other (b). When the stimulus compound is coded by a single configural cue ({LightTone}), discrimination is optimal but not realistic (c). The combination of a configural cue and its elements ({Tone, Light, LightTone}) captures discrimination and generalization (d). See also the interactive interface in the tutorial
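
Negative patterning under these cue codings can be simulated directly. Panel (c), where the compound is a lone configural cue, discriminates trivially, so the sketch covers (b) and (d); the learning rate and epoch count are arbitrary choices:

```python
import numpy as np

SINGLE, COMPOUND = np.array([1.0, 0.0]), np.array([0.0, 1.0])

def train(trials, cue_names, eta=0.01, epochs=3000):
    W = np.zeros((len(cue_names), 2))      # outcome columns: [single, compound]
    for _ in range(epochs):
        for cues, target in trials:
            c = np.array([n in cues for n in cue_names], float)
            W += eta * np.outer(c, target - c @ W)
    return W

def act(W, cues, cue_names):
    return (np.array([n in cues for n in cue_names], float) @ W).round(2)

# (b) purely elemental coding: the compound is just {Tone, Light}
elem = [({"T"}, SINGLE), ({"L"}, SINGLE), ({"T", "L"}, COMPOUND)]
W = train(elem, ["T", "L"])
print(act(W, {"T"}, ["T", "L"]), act(W, {"T", "L"}, ["T", "L"]))
# both outcomes are activated (nearly) equally by the tone alone and by the
# compound: with elemental cues only, the outcomes cannot be discriminated

# (d) elemental cues plus a configural cue for the compound
conf = [({"T"}, SINGLE), ({"L"}, SINGLE), ({"T", "L", "TL"}, COMPOUND)]
W = train(conf, ["T", "L", "TL"])
print(act(W, {"T"}, ["T", "L", "TL"]),
      act(W, {"T", "L", "TL"}, ["T", "L", "TL"]))
# now single stimuli activate "single" ~1 and the compound "compound" ~1,
# while the shared elements still support generalization
```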
