Curr Biol. 2018 Dec 17;28(24):3976-3983.e5. doi: 10.1016/j.cub.2018.10.042. Epub 2018 Nov 29.

Rapid Transformation from Auditory to Linguistic Representations of Continuous Speech


Christian Brodbeck et al. Curr Biol.

Abstract

During speech perception, a central task of the auditory cortex is to analyze complex acoustic patterns to allow detection of the words that encode a linguistic message [1]. It is generally thought that this process includes at least one intermediate, phonetic, level of representations [2-6], localized bilaterally in the superior temporal lobe [7-9]. Phonetic representations reflect a transition from acoustic to linguistic information, classifying acoustic patterns into linguistically meaningful units, which can serve as input to mechanisms that access abstract word representations [10, 11]. While recent research has identified neural signals arising from successful recognition of individual words in continuous speech [12-15], no explicit neurophysiological signal has been found demonstrating the transition from acoustic and/or phonetic to symbolic, lexical representations. Here, we report a response reflecting the incremental integration of phonetic information for word identification, dominantly localized to the left temporal lobe. The short response latency, approximately 114 ms relative to phoneme onset, suggests that phonetic information is used for lexical processing as soon as it becomes available. Responses also tracked word boundaries, confirming previous reports of immediate lexical segmentation [16, 17]. These new results were further investigated using a cocktail-party paradigm [18, 19] in which participants listened to a mix of two talkers, attending to one and ignoring the other. Analysis indicates neural lexical processing of only the attended, but not the unattended, speech stream. Thus, while responses to acoustic features reflect attention through selective amplification of attended speech, responses consistent with a lexical processing model reveal categorically selective processing.

Keywords: cohort entropy; cohort model; magnetoencephalography; phoneme surprisal; temporal response function.


Declaration of Interests

The authors declare no competing interests.

Figures

Figure 1. Analysis framework, illustrated with an excerpt from one of the stimuli.
The acoustic waveform (top row) is shown for reference only. Subsequent rows show the predictor variables used to model responses to a single speaker. Acoustic predictors were based on an auditory spectrogram aggregated into 8 frequency bands. For the phoneme-based predictor variables, the initial phoneme of each word is drawn in black, whereas all subsequent phonemes are drawn in blue. The last row contains estimated brain responses from three virtual current dipoles, representative of the modeled signal. The anatomical plot of the cortex is shaded to indicate the temporal lobe, the anatomical region of interest (only the left hemisphere is shown, but both hemispheres were analyzed). See Table S1 for correlations between different predictor variables, and Figure S1 for corresponding scatter-plots (of the phoneme-based predictor variables).
Figure 2. Brain responses to a single speaker.
Left column: significant predictive power (p ≤ .05, corrected). Colors reflect the difference in z-transformed correlation between the full and the appropriately shuffled model. Color maps are normalized for each predictor to maximize visibility of internal structure, as appropriate for evaluating source localization results: due to spatial dispersion of minimum norm source estimates, effect peaks are relatively accurate estimates, but strong effects can cause spurious spread whose amplitude decreases with distance from the peak. See also Table S2. Right column: temporal response functions (TRFs) estimated for the reduced model. Each line reflects the TRF at one virtual current dipole, with color coding its location by hemisphere and saturation coding significance (p ≤ .05, corrected). Anatomical plots display TRFs at certain time points of interest (only significant values are shown), with color coding current direction relative to the cortical surface. Acoustic TRFs were averaged across frequency bands for display, as visual inspection revealed no major differences between bands apart from amplitude. See also Figure S2 and Table S3.
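The TRF analysis described in this caption treats the neural response as a linear function of time-lagged copies of each predictor. As a minimal illustrative sketch on synthetic data: the example below recovers a known kernel with ridge regression over lagged predictors. This is not the study's actual pipeline (which estimated TRFs with a boosting algorithm on MEG source estimates), and all variable names here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D predictor sampled at 100 Hz: sparse impulses,
# loosely analogous to phoneme-onset events.
fs = 100
n = 2000
predictor = (rng.random(n) < 0.05).astype(float)

# Ground-truth TRF: a smooth kernel peaking ~140 ms after feature onset.
lags = np.arange(0, 40)                       # 0..390 ms, in samples
true_trf = np.exp(-0.5 * ((lags - 14) / 4.0) ** 2)

# Simulated response = predictor convolved with the TRF, plus noise.
response = np.convolve(predictor, true_trf)[:n] + 0.1 * rng.standard_normal(n)

# Design matrix of time-lagged copies of the predictor.
X = np.column_stack([np.roll(predictor, k) for k in lags])
for i, k in enumerate(lags):                  # zero out wrapped-around samples
    X[:k, i] = 0.0

# Ridge (L2-regularized) least-squares estimate of the TRF.
lam = 1.0
trf_hat = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ response)

peak_ms = lags[np.argmax(trf_hat)] / fs * 1000
print(f"estimated TRF peak latency: {peak_ms:.0f} ms")
```

In a real analysis this regression is fit per virtual current dipole, with all predictors entered jointly so that each TRF reflects that predictor's contribution beyond the others.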
Figure 3. Brain responses to two concurrent speakers.
Details analogous to Figure 2. The three columns display results for the model components for: the attended speech stream (left), the actual acoustic stimulus mixture (middle), and the unattended speech stream (right). The upper part of the figure displays results for acoustic features, the lower part for lexical processing.
Figure 4. Summary of results.
A) Illustration of the aspects of the cohort model on which the significant variables were based: lexical segmentation (word onset), predictive coding based on the preceding phoneme sequence (phoneme surprisal), and lexical competition (cohort entropy). B) Time course of TRF amplitude for each variable, with major peaks marked with symbols corresponding to those used in C. C) Center of mass of the average peaks shown in B (see also Table S3 and Figure S2). D) Schematic illustration of the results of the two-speaker analysis: early acoustic TRF peaks track the processing of the acoustic signal from both speakers, whereas lexical TRFs track processing of only the attended speech.

References

    1. McQueen JM (2007). Eight questions about spoken word recognition. In The Oxford Handbook of Psycholinguistics, Gaskell MG, ed., pp. 37–53.
    2. Kazanina N, Bowers JS, and Idsardi W (2018). Phonemes: Lexical access and beyond. Psychon. Bull. Rev. 25, 560–585.
    3. Phillips C, Pellathy T, Marantz A, Yellin E, Wexler K, Poeppel D, McGinnis M, and Roberts TPL (2000). Auditory cortex accesses phonological categories: An MEG mismatch study. J. Cogn. Neurosci. 12, 1038–1055.
    4. Stevens KN (2002). Toward a model for lexical access based on acoustic landmarks and distinctive features. J. Acoust. Soc. Am. 111, 1872–1891.
    5. Marslen-Wilson W, and Warren P (1994). Levels of perceptual representation and process in lexical access: Words, phonemes, and features. Psychol. Rev. 101, 653–675.
