The Presence of Background Noise Extends the Competitor Space in Native and Non-Native Spoken-Word Recognition: Insights from Computational Modeling

Themis Karaminis et al. Cognitive Science. 2022 Feb;46(2):e13110. doi: 10.1111/cogs.13110.

Abstract

Oral communication often takes place in noisy environments, which challenge spoken-word recognition. Previous research has suggested that the presence of background noise extends the number of candidate words competing with the target word for recognition and that this extension affects the time course and accuracy of spoken-word recognition. In this study, we used computational modeling to further investigate the temporal dynamics of competition processes in the presence of background noise and how these vary in listeners with different language proficiency (i.e., native and non-native). We developed ListenIN (Listen-In-Noise), a neural-network model based on an autoencoder architecture, which learns to map phonological forms onto meanings in two languages and simulates native and non-native spoken-word comprehension. We also examined the model's activation states during online spoken-word recognition. These analyses demonstrated that the presence of background noise increases the number of competitor words engaged in phonological competition, and that this happens in similar ways intra- and interlinguistically and in native and non-native listening. Taken together, our results support accounts positing a "many-additional-competitors scenario" for the effects of noise on spoken-word recognition.
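
The abstract describes ListenIN as an autoencoder-style network that maps phonological forms onto meanings. The sketch below illustrates that kind of mapping in outline only; the layer sizes, layer names, class name ToyListener, and the loss and optimizer choices are assumptions for illustration, not the published ListenIN specification.

```python
# Minimal sketch (not the published ListenIN implementation) of an autoencoder-style
# network that takes a phonological form as input and is trained to reproduce that
# form while also activating the word's semantics. All dimensions and names are
# illustrative assumptions.
import torch
import torch.nn as nn

PHON_DIM, SEM_DIM, HID = 50, 200, 100          # assumed dimensionalities

class ToyListener(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(PHON_DIM, HID), nn.Sigmoid(),
                                     nn.Linear(HID, HID), nn.Sigmoid())   # composite hidden layer
        self.phon_out = nn.Sequential(nn.Linear(HID, PHON_DIM), nn.Sigmoid())
        self.sem_out = nn.Sequential(nn.Linear(HID, SEM_DIM), nn.Sigmoid())

    def forward(self, phon):
        h = self.encoder(phon)
        return self.phon_out(h), self.sem_out(h)

model = ToyListener()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()

def train_step(phon_batch, sem_batch):
    """One step: reconstruct the phonological form and map it onto its meaning."""
    optimizer.zero_grad()
    phon_hat, sem_hat = model(phon_batch)
    loss = mse(phon_hat, phon_batch) + mse(sem_hat, sem_batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```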

Keywords: Competitor space; Computational modeling; Deep neural networks; Neurocomputational model; Noise; Non-native listening; Phonological competition; Spoken-word recognition.


Figures

Fig 1
ListenIN's architecture and modeling of spoken‐word recognition. (a) During training, the autoencoder architecture is trained on representations of phonological forms and/or semantics. (b) ListenIN is tested on spoken‐word recognition. Testing focuses on the spread of activation in the parts of the network shown in black (gray parts are ignored).
Fig 2
Visualization of words in the training set of ListenIN based on the similarity of their phonological representations. Visualization was made using the t‐SNE technique (t‐Distributed Stochastic Neighbour Embedding; van der Maaten & Hinton, 2012). Black ink shows English words; gray ink shows Dutch words. Rectangles indicate examples of clusters of phonologically related words.
Fig 3
Visualization of the semantic representations for the words in the training set of ListenIN, using the t‐SNE method (van der Maaten & Hinton, 2012). Rectangles indicate examples of clusters of semantically related words.
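
Figs. 2 and 3 visualize the phonological and semantic training representations with t‐SNE. A minimal sketch of how such a two-dimensional projection can be produced is given below; the arrays phon_vectors and labels are hypothetical stand-ins for the model's actual training representations, and the perplexity setting is an assumption.

```python
# Sketch of a t-SNE projection of word representations (illustrative only; the data
# below are random stand-ins, not the ListenIN training set).
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
phon_vectors = rng.random((400, 50))          # stand-in for word phonological representations
labels = ["en"] * 200 + ["nl"] * 200          # stand-in for English / Dutch word tags

coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(phon_vectors)

for lang, colour in (("en", "black"), ("nl", "gray")):
    idx = [i for i, lab in enumerate(labels) if lab == lang]
    plt.scatter(coords[idx, 0], coords[idx, 1], c=colour, s=5, label=lang)
plt.legend()
plt.show()
```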
Fig 4
Target empirical data (panels a and d) and modeling results from ListenIN and the Input‐based model (panels b, c, e, and f) in offline spoken‐word identification. The top panels (a–c) present overall accuracy for different noise intensities; the bottom panels (d–f) present the number of unique misperception errors for different noise intensities. Noise intensity refers to SNR values for speech‐shaped noise in the human data and to the standard deviation of the added noise in ListenIN and the Input‐based model. Black ink shows the performance of native listeners; gray ink shows the performance of non‐native listeners (black and gray lines overlap in c and e). Thick lines correspond to the word‐initial noise condition; thin lines correspond to the word‐final noise condition. Error bars show 1 SEM.
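
In the modeling results of Fig. 4, noise intensity is the standard deviation of random noise added to the model's input, applied either word-initially or word-finally. The sketch below shows one way such a manipulation could be implemented; the slot layout, the use of Gaussian noise, and the function name are assumptions rather than the paper's exact procedure.

```python
# Sketch: perturb either the word-initial or the word-final phoneme slots of an input
# vector with noise of a given standard deviation (illustrative assumption).
import numpy as np

def add_noise(phon_input, sd, segment="initial", n_slots=4):
    """Add Gaussian noise (std = sd) to the word-initial or word-final phoneme slots."""
    noisy = np.asarray(phon_input, dtype=float).reshape(n_slots, -1)  # one row per phoneme slot
    half = n_slots // 2
    rows = slice(0, half) if segment == "initial" else slice(half, n_slots)
    noisy[rows] += np.random.normal(0.0, sd, size=noisy[rows].shape)
    return noisy.reshape(-1)
```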
Fig 5
Target empirical data (panels a and d) and modeling results from ListenIN (panels b and e) and the Input‐based model (panels c and f) in online spoken‐word identification. The top panels (a–c) present looking preferences for target words; the bottom panels (d–f) present looking preferences for onset phonological competitors. Time is measured in ms in the human data and in the number of phonemes that have been incrementally presented in the modeling results. Black ink shows the performance of native listeners; gray ink shows the performance of non‐native listeners (black and gray lines overlap in panels c and f). Continuous lines correspond to the clean listening condition; dashed and dotted lines correspond to the noisy listening condition (SNR of +3 and −3 dB in humans, added random noise with a standard deviation SD = 0.25 and SD = 0.75 in ListenIN, and SD = 0.60 and 1.20 for the Input‐based model––settings from simulation A).
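
In Fig. 5, model time is measured in the number of phonemes presented so far, i.e., the input phonology is revealed incrementally. The sketch below illustrates one way incremental presentation could be realized under an assumed slot-based phonological code; the neutral padding value and the function name are illustrative, not taken from the paper.

```python
# Sketch: yield one input vector per timestep, unmasking one phoneme slot at a time
# (illustrative assumption about how incremental presentation is implemented).
import numpy as np

def incremental_inputs(full_phon, n_slots=4, neutral=0.5):
    """At timestep t, only the first t phoneme slots carry information."""
    slots = np.asarray(full_phon, dtype=float).reshape(n_slots, -1)
    for t in range(1, n_slots + 1):
        partial = np.full(slots.shape, neutral)   # uninformative padding for unheard slots
        partial[:t] = slots[:t]
        yield partial.reshape(-1)
```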
Fig 6
Analysis of ListenIN's performance during online spoken‐word recognition. Panel a shows online accuracy in spoken‐word recognition; panel b shows the cumulative number of unique erroneous responses per spoken word; and panel c shows the rank order of similarity to the target word's output semantics activation. Panels d–f refer to activation states in the input layer, the composite hidden layer (PS2), and the semantics hidden layer (S4), respectively, and show the rank order of similarity of Dutch words' internal activation patterns to those of the target word. Panels g–i show variants of the rank‐order measures in panels d–f (respectively) referring to interlinguistic competition. Black ink corresponds to the native Dutch ListenIN and gray ink to the non‐native ListenIN. Continuous lines correspond to the clean listening condition and dotted lines to the presence of random noise (SD = 0.75).
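
The rank-order measures in Fig. 6 track where the target word (or words from the other language) falls when lexicon items are ordered by their similarity to the network's current activation state. A sketch of such a measure follows; the use of cosine similarity and the function names are assumptions, as the paper may use a different similarity metric.

```python
# Sketch of a rank-order similarity measure (illustrative): order stored word patterns
# by similarity to the current activation state and report the target word's rank.
import numpy as np

def similarity_ranks(current_activation, lexicon_patterns):
    """Return lexicon indices ordered from most to least similar (cosine similarity)."""
    lex = np.asarray(lexicon_patterns, dtype=float)
    cur = np.asarray(current_activation, dtype=float)
    sims = lex @ cur / (np.linalg.norm(lex, axis=1) * np.linalg.norm(cur) + 1e-12)
    return np.argsort(-sims)                     # position 0 = closest competitor

def target_rank(current_activation, lexicon_patterns, target_index):
    order = similarity_ranks(current_activation, lexicon_patterns)
    return int(np.where(order == target_index)[0][0]) + 1   # 1 = target is the best match
```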
Fig 7
Relationship between representational similarity in ListenIN's layers and the similarity structure of the phonological and semantic representations. The colored squares show the strength of the correlation coefficients between the representational dissimilarity matrices (RDMs) for different layers in ListenIN and the RDMs for the training set's phonological and semantic representations. Each panel shows correlation coefficients for timesteps 1–5 of the incremental presentation of input phonology and for clean and noisy (SD = 0.75) listening conditions (top and bottom row in each panel). Warmer colors indicate higher similarity and hence that activation states in ListenIN are consistent to a greater extent with regularities embedded in the model's representations (phonological or semantic). (a) Native ListenIN; (b) non‐native ListenIN. The two columns of panels in subplots a and b show comparisons with the phonological and semantic RDMs, respectively. Rows correspond to different layers of the neural network architecture (starting from the bottom: input, hidden layer P0, hidden layer PS2 [composite], hidden layer S4, and output layer).
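
Fig. 7 summarizes a representational-similarity analysis: an RDM computed over a layer's activation states is compared with the RDM of the phonological or semantic training representations. The following is an illustrative recipe only, not the authors' exact pipeline; correlation distance for the RDMs and Spearman correlation for comparing them are assumptions.

```python
# Sketch of a representational-similarity analysis (illustrative recipe): build an RDM
# per layer and correlate it with the RDM of the phonological or semantic representations.
import numpy as np
from scipy.stats import spearmanr

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson correlation between items."""
    return 1.0 - np.corrcoef(np.asarray(patterns, dtype=float))

def rdm_correlation(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    rho, _ = spearmanr(rdm_a[iu], rdm_b[iu])
    return rho

# e.g. rdm_correlation(rdm(layer_activation_states), rdm(semantic_vectors))
```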
