Recurrent neural networks as neuro-computational models of human speech recognition
- PMID: 40720506
- PMCID: PMC12331064
- DOI: 10.1371/journal.pcbi.1013244
Abstract
Human speech recognition transforms a continuous acoustic signal into categorical linguistic units, by aggregating information that is distributed in time. It has been suggested that this kind of information processing may be understood through the computations of a Recurrent Neural Network (RNN) that receives input frame by frame, linearly in time, but builds an incremental representation of this input through a continually evolving internal state. While RNNs can simulate several key behavioral observations about human speech and language processing, it is unknown whether RNNs also develop computational dynamics that resemble human neural speech processing. Here we show that the internal dynamics of long short-term memory (LSTM) RNNs, trained to recognize speech from auditory spectrograms, predict human neural population responses to the same stimuli, beyond predictions from auditory features. Variations in the RNN architecture motivated by cognitive principles further improved this predictive power. Specifically, modifications that allow more human-like phonetic competition also led to more human-like temporal dynamics. Overall, our results suggest that RNNs provide plausible computational models of the cortical processes supporting human speech recognition.
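The frame-by-frame processing described in the abstract can be illustrated with a minimal sketch of a single-layer LSTM forward pass in NumPy. This is not the authors' model: the weights are random and untrained, the "spectrogram" is random data, and the function name `lstm_run` and all sizes (8 frequency bands, 16 hidden units, 50 frames) are assumptions chosen for illustration. It shows only how an internal state evolves incrementally as each frame arrives.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_run(frames, W, U, b):
    """Run a single-layer LSTM over frames of shape (T, n_in).

    W: input weights, shape (4 * n_hidden, n_in)
    U: recurrent weights, shape (4 * n_hidden, n_hidden)
    b: biases, shape (4 * n_hidden,)
    Returns the hidden state after each frame, shape (T, n_hidden).
    """
    n_hidden = U.shape[1]
    h = np.zeros(n_hidden)  # hidden state (the evolving representation)
    c = np.zeros(n_hidden)  # cell state (longer-term memory)
    states = []
    for x in frames:  # one spectrogram frame at a time, in temporal order
        z = W @ x + U @ h + b
        i, f, o, g = np.split(z, 4)  # input, forget, output gates; candidate
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g       # keep part of the old memory, add new input
        h = o * np.tanh(c)      # expose part of the memory as the new state
        states.append(h)
    return np.stack(states)


# Hypothetical sizes and random, untrained weights (illustration only).
rng = np.random.default_rng(0)
n_bands, n_hidden, n_frames = 8, 16, 50
spectrogram = rng.random((n_frames, n_bands))
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_bands))
U = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)

hidden = lstm_run(spectrogram, W, U, b)
```

Because the state at frame t is computed only from frames up to t, running the same function on a truncated input reproduces the first rows of `hidden`. In a study like the one summarized here, time series such as `hidden` would be the kind of model-internal signal compared against neural responses, but the comparison method itself is not specified in the abstract and is not sketched here.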
Copyright: © 2025 Brodbeck et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Conflict of interest statement
The authors have declared that no competing interests exist.
