Review

Front Neurorobot. 2020 Jun 25;14:36. doi: 10.3389/fnbot.2020.00036. eCollection 2020.

A Path Toward Explainable AI and Autonomous Adaptive Intelligence: Deep Learning, Adaptive Resonance, and Models of Perception, Emotion, and Action

Stephen Grossberg

Abstract

Biological neural network models whereby brains make minds help to understand autonomous adaptive intelligence. This article summarizes why the dynamics and emergent properties of such models for perception, cognition, emotion, and action are explainable, and thus amenable to being confidently implemented in large-scale applications. Key to their explainability is how these models combine fast activations, or short-term memory (STM) traces, and learned weights, or long-term memory (LTM) traces. Visual and auditory perceptual models have explainable conscious STM representations of visual surfaces and auditory streams in surface-shroud resonances and stream-shroud resonances, respectively. Deep Learning is often used to classify data. However, Deep Learning can experience catastrophic forgetting: At any stage of learning, an unpredictable part of its memory can collapse. Even if it makes some accurate classifications, they are not explainable and thus cannot be used with confidence. Deep Learning shares these problems with the back propagation algorithm, whose computational problems due to non-local weight transport during mismatch learning were described in the 1980s. Deep Learning became popular after very fast computers and huge online databases became available that enabled new applications despite these problems. Adaptive Resonance Theory, or ART, algorithms overcome the computational problems of back propagation and Deep Learning. ART is a self-organizing production system that incrementally learns, using arbitrary combinations of unsupervised and supervised learning and only locally computable quantities, to rapidly classify large non-stationary databases without experiencing catastrophic forgetting. ART classifications and predictions are explainable using the attended critical feature patterns in STM on which they build. 
The LTM adaptive weights of the fuzzy ARTMAP algorithm induce fuzzy IF-THEN rules that explain what feature combinations predict successful outcomes. ART has been successfully used in multiple large-scale real world applications, including remote sensing, medical database prediction, and social media data clustering. Also explainable are the MOTIVATOR model of reinforcement learning and cognitive-emotional interactions, and the VITE, DIRECT, DIVA, and SOVEREIGN models for reaching, speech production, spatial navigation, and autonomous adaptive intelligence. These biological models exemplify complementary computing, and use local laws for match learning and mismatch learning that avoid the problems of Deep Learning.
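The abstract's claim that fuzzy ARTMAP weights induce fuzzy IF-THEN rules can be made concrete in a few lines. This is an illustrative sketch, not the published algorithm: the weight vector, feature names, and class label below are hypothetical, and it assumes standard fuzzy ART complement coding, in which a category's weight vector w = (u, 1 − v) defines a box [u, v] in feature space.

```python
import numpy as np

def weights_to_rule(w, label, feature_names):
    """Read a complement-coded fuzzy ART weight vector as a fuzzy IF-THEN rule.

    With complement coding, a learned category's weights w = (u, 1 - v)
    define a hyper-box [u, v]; inputs falling inside the box (up to the
    vigilance tolerance) activate the category and predict its label.
    """
    m = len(w) // 2
    u = np.asarray(w[:m])           # lower corner of the category box
    v = 1.0 - np.asarray(w[m:])     # upper corner of the category box
    conds = [f"{u[i]:.2f} <= {feature_names[i]} <= {v[i]:.2f}" for i in range(m)]
    return "IF " + " AND ".join(conds) + f" THEN {label}"

# Hypothetical learned weights for a two-feature category
w = [0.2, 0.5, 0.6, 0.3]   # encodes the box u = (0.2, 0.5), v = (0.4, 0.7)
rule = weights_to_rule(w, "class A", ["size", "brightness"])
print(rule)
```

Reading weights off as interval rules like this is what makes each classification auditable: the rule states exactly which attended feature ranges predict the outcome.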

Keywords: Adaptive Resonance Theory; arm and speech movement; category learning; consciousness; deep learning; emotion; explainable AI; visual boundaries and surfaces.


Figures

Figure 1
Circuit diagram of the back propagation model. Each input activity ai in level F1 sends a sigmoid signal Si = f(ai) that is multiplied by learned weights wij on the way to level F2. These LTM-weighted signals are added together at F2 with a bias term θj to define xj. A sigmoid signal Sj = f(xj) then generates outputs from F2 that activate two pathways. One pathway inputs to a Differentiator. The other pathway gets multiplied by adaptive weights wjk on the way to level F3. At level F3, the weighted signals are added together with a bias term θk to define xk. A sigmoid signal Sk = f(xk) from F3 defines the Actual Output of the system. This Actual Output Sk is subtracted from a Target Output bk via a back-coupled error correction step. The difference bk – Sk is then multiplied by the term f′(xk) that is computed at the Differentiator from level F3. One function of the Differentiator step is to ensure that the activities and weights remain in a bounded range, because if xk grows too large, then f′(xk) approaches zero. The net effect of these operations is to compute the Error δk = f′(xk)(bk – Sk) that sends a top-down output signal to the level just below it. On the way, each δk is multiplied by the bottom-up learned weights wjk at F3. These weights reach the pathways that carry δk via the process of weight transport. Weight transport is clearly a non-local operation relative to the network connections that carry locally computed signals. All the δk are multiplied by the transported weights wjk and added. This sum is multiplied by another Differentiator term f′(xj) from level F2 to keep the resultant product δj bounded. δj is then back-coupled to adjust all the weights wij in pathways from level F1 to F2 [figure reprinted and text adapted with permission from Carpenter (1989)].
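The caption's forward and backward computations, including the non-local weight-transport step, can be written out directly. This is a minimal sketch in the caption's notation, assuming a tiny three-level network; the layer sizes, random weights, learning rate, and target vector are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: 1.0 / (1.0 + np.exp(-x))   # sigmoid signal function
df = lambda x: f(x) * (1.0 - f(x))       # its derivative (the Differentiator)

# Hypothetical sizes for levels F1 -> F2 -> F3
n_i, n_j, n_k = 4, 3, 2
w_ij = rng.normal(size=(n_i, n_j)); theta_j = np.zeros(n_j)
w_jk = rng.normal(size=(n_j, n_k)); theta_k = np.zeros(n_k)

a = rng.random(n_i)           # input activities a_i at F1
b = np.array([1.0, 0.0])      # target output b_k
lr = 0.2                      # learning rate

# Forward pass, in the caption's notation
S_i = f(a)
x_j = S_i @ w_ij + theta_j; S_j = f(x_j)
x_k = S_j @ w_jk + theta_k; S_k = f(x_k)
err_before = float(np.sum((b - S_k) ** 2))

# Backward pass: Error delta_k = f'(x_k) (b_k - S_k)
delta_k = df(x_k) * (b - S_k)
# Weight transport: delta_j reuses the same bottom-up weights w_jk,
# a non-local operation relative to the network's connections
delta_j = df(x_j) * (delta_k @ w_jk.T)

# Back-coupled weight adjustments once the deltas are available
w_jk += lr * np.outer(S_j, delta_k)
w_ij += lr * np.outer(S_i, delta_j)

# One update step reduces the squared output error
S_k_new = f(f(f(a) @ w_ij + theta_j) @ w_jk + theta_k)
err_after = float(np.sum((b - S_k_new) ** 2))
```

The two lines computing `delta_j` and the `w_ij` update are where the locality problem lives: `w_jk` appears both in the bottom-up forward pass and in the top-down error pathway.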
Figure 2
The ART Matching Rule circuit enables bottom-up inputs to fire their target cells, top-down expectations to provide excitatory modulation of cells in their on-center while inhibiting cells in their off-surround, and a convergence of bottom-up and top-down signals to generate an attentional focus at matched cells while continuing to inhibit unmatched cells in the off-surround [adapted with permission from Grossberg (2017b)].
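For binary feature patterns, the matching behavior in the figure reduces to a simple combination rule. The sketch below is illustrative, with hypothetical patterns; real ART circuits implement this with shunting on-center off-surround dynamics, not Boolean operations.

```python
import numpy as np

def art_matching_rule(bottom_up, top_down=None):
    """Sketch of the ART Matching Rule for binary feature patterns.

    Bottom-up input alone can fire its target cells.  A top-down
    expectation alone is only modulatory: it primes cells but cannot
    fire them.  When both are active, only cells receiving bottom-up
    input AND top-down support stay active (the attentional focus);
    mismatched cells are inhibited by the off-surround.
    """
    bottom_up = np.asarray(bottom_up, dtype=bool)
    if top_down is None:
        return bottom_up                 # no expectation: input fires its cells
    top_down = np.asarray(top_down, dtype=bool)
    return bottom_up & top_down          # match = intersection of the two patterns

I = [1, 1, 0, 1]   # hypothetical bottom-up input
E = [1, 0, 0, 1]   # hypothetical top-down expectation
focus = art_matching_rule(I, E)
```

Note that `art_matching_rule(E * 0, E)` style priming yields no activity at all: a top-down expectation by itself cannot fire cells, which is the modulatory on-center of the figure.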
Figure 3
The ART hypothesis testing and learning cycle whereby bottom-up input patterns that are sufficiently mismatched by their top-down expectations can drive hypothesis testing and memory search leading to discovery of recognition categories that can match the bottom-up input pattern well enough to trigger resonance and learning. See the text for details [adapted with permission from Carpenter and Grossberg (1988)].
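The search cycle in the figure can be sketched algorithmically for binary patterns. This is a simplified reading of ART 1, not the published equations: the choice function, fast learning rule, and test patterns below are illustrative, and the vigilance parameter rho gates reset versus resonance.

```python
import numpy as np

def art_search_cycle(I, weights, rho):
    """One ART hypothesis-testing cycle (simplified, binary ART 1).

    Categories are tested in order of choice strength.  Each candidate
    reads out its top-down expectation; if the match ratio
    |I AND w| / |I| falls below the vigilance rho, the category is
    reset and the search moves on.  A good-enough match triggers
    resonance and prototype learning.  Returns the chosen category
    index and the list of categories reset along the way.
    """
    I = np.asarray(I, dtype=float)
    resets = []
    order = np.argsort([-np.minimum(I, w).sum() / (0.5 + w.sum()) for w in weights])
    for j in order:
        if np.minimum(I, weights[j]).sum() / I.sum() >= rho:
            weights[j] = np.minimum(I, weights[j])   # resonance and learning
            return j, resets
        resets.append(int(j))                        # mismatch: reset, keep searching
    weights.append(I.copy())                         # all reset: recruit a new category
    return len(weights) - 1, resets

# Hypothetical committed categories, then a high-vigilance search
weights = [np.array([1.0, 1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0, 1.0])]
j, resets = art_search_cycle([1, 1, 1, 0], weights, rho=0.9)
```

At rho = 0.9 both committed categories mismatch and are reset in turn, so an uncommitted node is recruited; at a lower rho the first category would resonate instead.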
Figure 4
When a good enough match occurs between a bottom-up input pattern and top-down expectation, a feature-category resonance is triggered that synchronizes, amplifies, and prolongs the STM activities of the cells that participate in the resonance, while also selecting an attentional focus and triggering learning in the LTM traces in the active bottom-up adaptive filter and top-down expectation pathways to encode the resonating attended data [adapted with permission from Grossberg (2017b)].
Figure 5
When the ART Matching Rule is eliminated by deleting an ART circuit's top-down expectations from the ART 1 model, the resulting competitive learning network experiences catastrophic forgetting even if it tries to learn any of arbitrarily many lists consisting of just four input vectors A, B, C, and D when they are presented repeatedly in the order ABCAD, assuming that the input vectors satisfy the constraints shown in the figure [adapted with permission from Carpenter and Grossberg (1987)].
Figure 6
These computer simulations illustrate how (A) unstable learning and (B) stable learning occur in response to a particular sequence of input vectors A, B, C, D when they are presented repeatedly in the order ABCAD to an ART 1 model. Unstable learning with catastrophic forgetting of the category that codes vector A occurs when no top-down expectations exist, as illustrated by its periodic recoding by categories 1 and 2 on each learning trial. See the text for details [adapted with permission from Carpenter and Grossberg (1987)].
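The same kind of instability is easy to reproduce outside ART 1. The sketch below is an analogous demonstration, not the paper's simulation: a single linear unit trained by the delta rule first learns a hypothetical input A, after which continued training on an overlapping input D erodes the learned response to A, with no top-down matching to protect the old memory.

```python
import numpy as np

# Hypothetical overlapping inputs: D shares A's first feature
A, D = np.array([1.0, 0.0]), np.array([1.0, 0.9])
w, lr = np.zeros(2), 0.1

def train(w, x, target, steps=200):
    """Delta-rule (gradient) training of one linear unit on one pattern."""
    for _ in range(steps):
        w = w + lr * (target - w @ x) * x    # local error-correcting update
    return w

w = train(w, A, 1.0)
resp_A_before = w @ A    # ~1.0: A has been learned
w = train(w, D, 0.0)     # later learning on the overlapping pattern D ...
resp_A_after = w @ A     # ... overwrites the shared weight and erodes A
```

Because A and D share a feature, minimizing D's error drags down the very weight that coded A: the stability-plasticity tradeoff that the ART Matching Rule in Figures 2-4 is designed to resolve.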
Figure 7
These computer simulations show how the alphabet A, B, C, … is learned by the ART 1 model when vigilance is chosen to equal (A) 0.5, or (B) 0.8. Note that more categories are learned in (B) and that their learned prototypes more closely represent the letters that they categorize. Thus, higher vigilance leads to the learning of more concrete categories. See the text for details [reprinted with permission from Carpenter and Grossberg (1987)].
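The effect of vigilance on category granularity can be illustrated with a simplified ART 1 run. The patterns and choice function below are hypothetical stand-ins, not the paper's alphabet simulation; the point is only that raising rho splits the data into more, tighter categories.

```python
import numpy as np

def run_art1(patterns, rho):
    """Cluster binary patterns with a simplified ART 1 network and
    return the number of categories learned at vigilance rho."""
    weights = []
    for I in map(np.asarray, patterns):
        order = np.argsort([-np.minimum(I, w).sum() / (0.5 + w.sum())
                            for w in weights])
        for j in order:
            if np.minimum(I, weights[j]).sum() / I.sum() >= rho:
                weights[j] = np.minimum(I, weights[j])  # resonance: refine prototype
                break
        else:
            weights.append(I.astype(float))             # every category reset: add one
    return len(weights)

# Hypothetical binary "letter" patterns
patterns = [[1, 1, 1, 0, 0],
            [1, 1, 0, 0, 0],
            [0, 0, 1, 1, 1],
            [0, 0, 0, 1, 1],
            [1, 1, 1, 1, 0]]
n_low, n_high = run_art1(patterns, 0.5), run_art1(patterns, 0.8)
```

At rho = 0.5 the last pattern is absorbed into an existing coarse category; at rho = 0.8 the same pattern fails the match test and recruits a category of its own.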
Figure 8
The fuzzy ARTMAP architecture can learn recognition categories in both ARTa and ARTb by unsupervised learning, as well as an associative map via the map field from ARTa to ARTb by supervised learning. See the text for details [adapted with permission from Carpenter et al. (1992)].
Figure 9
(A) A prediction from ARTa to ARTb can be made if the analog match between bottom-up and top-down patterns exceeds the current vigilance value. (B) If a mismatch occurs between the prediction at ARTb and the correct output pattern, then a match tracking signal can increase vigilance just enough to drive hypothesis testing and memory search for a better-matching category at ARTa. Match tracking hereby sacrifices the minimum amount of generalization necessary to correct the predictive error [adapted with permission from Carpenter and Grossberg (1992)].
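Match tracking can be sketched as a loop that raises vigilance just past the match ratio of each incorrectly predicting category. This is a loose simplification of fuzzy ARTMAP, not the published architecture: the patterns and labels are hypothetical, and a plain dictionary stands in for the map field.

```python
import numpy as np

def match_tracking(I, weights, category_to_label, correct_label,
                   rho_baseline, eps=1e-3):
    """Simplified ARTMAP match tracking on a binary input pattern.

    The chosen ARTa category predicts a label through the map field.
    On a predictive mismatch, vigilance is raised just above the
    offending category's match ratio, forcing the next search pass to
    find (or create) a finer category.  Raising rho only as far as
    needed sacrifices the minimum amount of generalization.
    """
    I = np.asarray(I, dtype=float)
    rho = rho_baseline
    while True:
        # categories whose match meets the current vigilance, with choice strength
        candidates = [(np.minimum(I, w).sum() / (0.5 + w.sum()), j)
                      for j, w in enumerate(weights)
                      if np.minimum(I, w).sum() / I.sum() >= rho]
        if not candidates:
            weights.append(I.copy())                    # recruit a new category
            category_to_label[len(weights) - 1] = correct_label
            return len(weights) - 1, rho
        j = max(candidates)[1]
        match = np.minimum(I, weights[j]).sum() / I.sum()
        if category_to_label.get(j) == correct_label:   # correct prediction: resonate
            weights[j] = np.minimum(I, weights[j])
            return j, rho
        rho = match + eps                               # match tracking: raise vigilance

# One hypothetical committed category that predicts the wrong label
weights = [np.array([1.0, 1.0, 0.0, 0.0])]
labels = {0: "X"}
j, rho = match_tracking([1, 1, 1, 0], weights, labels, "Y", rho_baseline=0.5)
```

Here the committed category matches at ratio 2/3 and wrongly predicts "X", so vigilance is tracked up past 2/3 and a new category is recruited for label "Y".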
Figure 10
Spatially abutting and collinear boundary contour (BC) and feature contour (FC) signals in a Filling-In-DOmain, or FIDO, can trigger depth-selective filling-in of the color carried by the FC signal in that FIDO. See the text for details [adapted with permission from Grossberg and Zajac (2017)].
Figure 11
(A) Object categories are activated by visual or gustatory inputs in anterior inferotemporal (ITA) and rhinal (RHIN) cortices, respectively. Value categories represent the value of anticipated outcomes on the basis of current hunger and satiety inputs in amygdala (AMYG) and lateral hypothalamus (LH). Object-value categories occur in the lateral orbitofrontal (ORB) cortex, for visual stimuli, and the medial orbitofrontal (MORB) cortex, for gustatory stimuli. They use the learned value of perceptual stimuli to choose the most valued stimulus in the current context. The reward expectation filter in the basal ganglia detects the omission or delivery of rewards using a circuit that spans ventral striatum (VS), ventral pallidum (VP), striosomal delay (SD) cells in the ventral striatum, the pedunculopontine nucleus (PPTN), and midbrain dopaminergic neurons of the substantia nigra pars compacta/ventral tegmental area (SNc/VTA). (B) Reciprocal excitatory signals from hypothalamic drive-taste cells to amygdala value category cells can drive the learning of a value category that selectively fires in response to a particular hypothalamic homeostatic activity pattern. See the text for details [adapted with permission from Dranias et al. (2008)].
Figure 12
The Vector Integration to Endpoint, or VITE, model of Bullock and Grossberg (1988) realizes the Three S's of arm movement control: Synergy, Synchrony, and Speed. See the text for details [adapted with permission from Bullock and Grossberg (1988)].
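Under the usual reading of the model, VITE's kinematic core computes a difference vector D = T − P between a target position T and present position P, gates it by a GO signal G(t), and integrates dP/dt = G·D. The sketch below is illustrative, with hypothetical gains and a two-component synergy: because the scalar G multiplies every component equally, the components stay synchronized while G's amplitude sets movement speed.

```python
import numpy as np

def vite_trajectory(present, target, go_amplitude=1.0, dt=0.01, steps=400):
    """Integrate the VITE kinematic law dP/dt = G(t) * (T - P).

    D = T - P is the difference vector, G(t) a ramping GO signal.
    Scaling go_amplitude changes movement speed but not direction,
    so all components of a synergy reach the target together.
    """
    P = np.asarray(present, dtype=float)
    T = np.asarray(target, dtype=float)
    path = [P.copy()]
    for step in range(steps):
        G = go_amplitude * (step * dt)   # slowly ramping GO signal
        D = T - P                        # difference vector in motor coordinates
        P = P + dt * G * D               # gated integration toward the target
        path.append(P.copy())
    return np.array(path)

# Hypothetical two-joint synergy moving from the origin to (1, 2)
traj = vite_trajectory([0.0, 0.0], [1.0, 2.0])
```

The ramping GO signal also reproduces the bell-shaped velocity profiles of the simulations in Figure 13: velocity is small early (G small), peaks mid-movement, and falls as D shrinks.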
Figure 13
(Top half) Neurophysiological data of vector cell responses in motor cortex. (Bottom half) VITE model simulations of a simple arm movement in which the model's difference vector D simulates the data as an emergent property of network interactions [data of Georgopoulos et al. (1982) and Bullock and Grossberg (1988) are reproduced with permission. Figure as a whole is reprinted with permission from Grossberg (2020)].
Figure 14
The DIRECT and DIVA models have homologous circuits to learn and control motor-equivalent reaching and speaking. Tool use and coarticulation are among the resulting useful motor-equivalent properties [reprinted with permission from Grossberg (2020)].

References

    1. Amari S. I. (1972). Characteristics of random nets of analog neuron-like elements. IEEE Trans. Syst. Man Cybern. 2, 643–657. 10.1109/TSMC.1972.4309193 - DOI
    2. Bellmann A., Meuli R., Clarke S. (2001). Two types of auditory neglect. Brain 124, 676–687. 10.1093/brain/124.4.676 - DOI - PubMed
    3. Bregman A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press.
    4. Brown J. W., Bullock D., Grossberg S. (1999). How the basal ganglia use parallel excitatory and inhibitory learning pathways to selectively respond to unexpected rewarding cues. J. Neurosci. 19, 10502–10511. 10.1523/JNEUROSCI.19-23-10502.1999 - DOI - PMC - PubMed
    5. Brown J. W., Bullock D., Grossberg S. (2004). How laminar frontal cortex and basal ganglia circuits interact to control planned and reactive saccades. Neural Netw. 17, 471–510. 10.1016/j.neunet.2003.08.006 - DOI - PubMed