Active listening
- PMID: 32732017
- PMCID: PMC7812378
- DOI: 10.1016/j.heares.2020.107998
Abstract
This paper introduces active listening as a unified framework for synthesising and recognising speech. The notion of active listening inherits from active inference, which considers perception and action under one universal imperative: to maximise the evidence for our (generative) models of the world. First, we describe a generative model of spoken words that simulates (i) how discrete lexical, prosodic, and speaker attributes give rise to continuous acoustic signals; and conversely (ii) how continuous acoustic signals are recognised as words. The 'active' aspect involves (covertly) segmenting spoken sentences and borrows ideas from active vision. It casts speech segmentation as the selection of internal actions, corresponding to the placement of word boundaries. Practically, word boundaries are selected that maximise the evidence for an internal model of how individual words are generated. We establish face validity by simulating speech recognition and showing how the inferred content of a sentence depends on prior beliefs and background noise. Finally, we consider predictive validity by associating neuronal or physiological responses, such as the mismatch negativity and P300, with belief updating under active listening, which is greatest in the absence of accurate prior beliefs about what will be heard next.
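The idea of placing word boundaries where they maximise model evidence can be illustrated with a toy sketch. This is not the authors' generative model, which covers lexical, prosodic, and speaker attributes and is inverted with variational Bayes. It is a minimal greedy illustration under strong assumptions: a hypothetical three-word lexicon (`LEXICON`) of fixed feature templates, Gaussian noise with a hypothetical scale `sigma`, and a one-dimensional "acoustic" signal. All names and values are invented for the example.

```python
import math

# Hypothetical toy lexicon: each word is a fixed template of acoustic features.
LEXICON = {
    "the": [1.0, 0.0],
    "cat": [0.0, 1.0, 1.0],
    "sat": [1.0, 1.0, 0.0],
}


def log_likelihood(segment, template, sigma=0.5):
    """Gaussian log-likelihood of a segment given a template (equal lengths only)."""
    if len(segment) != len(template):
        return -math.inf
    sq_err = sum((s - t) ** 2 for s, t in zip(segment, template))
    norm = len(segment) * math.log(sigma * math.sqrt(2.0 * math.pi))
    return -0.5 * sq_err / sigma ** 2 - norm


def log_joint(segment, prior):
    """log p(word) + log p(segment | word) for every word with non-zero prior."""
    return {
        w: math.log(prior[w]) + log_likelihood(segment, t)
        for w, t in LEXICON.items()
        if prior.get(w, 0.0) > 0.0
    }


def log_evidence(segment, prior):
    """log p(segment) = log sum_w p(w) p(segment | w), computed stably."""
    finite = [v for v in log_joint(segment, prior).values() if v > -math.inf]
    if not finite:
        return -math.inf
    m = max(finite)
    return m + math.log(sum(math.exp(v - m) for v in finite))


def select_boundary(signal, start, prior, max_len=4):
    """Choose the next boundary that maximises the evidence for a single word."""
    ends = range(start + 1, min(start + max_len, len(signal)) + 1)
    return max(ends, key=lambda end: log_evidence(signal[start:end], prior))


def segment(signal, prior):
    """Greedily segment a signal; unexplained segments are labelled None."""
    words, start = [], 0
    while start < len(signal):
        end = select_boundary(signal, start, prior)
        joint = log_joint(signal[start:end], prior)
        best = max(joint, key=joint.get) if joint else None
        if best is not None and joint[best] == -math.inf:
            best = None  # no word in the lexicon explains this segment
        words.append(best)
        start = end
    return words


# A noisy concatenation of the three templates, with a uniform prior over words.
signal = [1.0, 0.1, 0.0, 0.9, 1.1, 1.0, 1.0, 0.1]
uniform = {w: 1.0 / len(LEXICON) for w in LEXICON}
print(segment(signal, uniform))
```

Changing `prior` or adding noise to `signal` alters the evidence for each candidate boundary, which is a crude analogue of the dependence on prior beliefs and background noise described in the abstract. Note that this sketch compares segments of different lengths directly and commits to each boundary greedily; both are simplifications chosen for brevity.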
Keywords: Audition; Segmentation; Variational Bayes; Voice; active inference; active listening; speech recognition.
Copyright © 2020 The Authors. Published by Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of competing interest: The authors have no disclosures or conflicts of interest.