Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2014 Feb 28;343(6174):1006-10.
doi: 10.1126/science.1245994. Epub 2014 Jan 30.

Phonetic feature encoding in human superior temporal gyrus

Affiliations

Phonetic feature encoding in human superior temporal gyrus

Nima Mesgarani et al. Science. .

Abstract

During speech perception, linguistic elements such as consonants and vowels are extracted from a complex acoustic speech signal. The superior temporal gyrus (STG) participates in high-order auditory processing of speech, but how it encodes phonetic information is poorly understood. We used high-density direct cortical surface recordings in humans while they listened to natural, continuous speech to reveal the STG representation of the entire English phonetic inventory. At single electrodes, we found response selectivity to distinct phonetic features. Encoding of acoustic properties was mediated by a distributed population response. Phonetic features could be directly related to tuning for spectrotemporal acoustic cues, some of which were encoded in a nonlinear fashion or by integration of multiple cues. These findings demonstrate the acoustic-phonetic representation of speech in human STG.

PubMed Disclaimer

Figures

Fig. 1
Fig. 1. Human STG cortical selectivity to speech sounds
(A) Magnetic resonance image surface reconstruction of one participant's cerebrum. Electrodes (red) are plotted with opacity signifying the t test value when comparing responses to silence and speech (P < 0.01, t test). (B) Example sentence and its acoustic waveform, spectrogram, and phonetic transcription. (C) Neural responses evoked by the sentence at selected electrodes. z score indicates normalized response. (D) Average responses at five example electrodes to all English phonemes and their PSI vectors.
Fig. 2
Fig. 2. Hierarchical clustering of single-electrode and population responses
(A) PSI vectors of selective electrodes across all participants. Rows correspond to phonemes, and columns correspond to electrodes. (B) Clustering across population PSIs (rows). (C) Clustering across single electrodes (columns). (D) Alternative PSI vectors using rows now corresponding to phonetic features, not phonemes. (E) Weighted average STRFs of main electrode clusters. (F) Average acoustic spectrograms for phonemes in each population cluster. Correlation between average STRFs and average spectrograms: r = 0.67, P < 0.01, t test. (r = 0.50, 0.78, 0.55, 0.86, 0.86, and 0.47 for plosives, fricatives, vowels, and nasals, respectively; P < 0.01, t test).
Fig. 3
Fig. 3. Neural encoding of vowels
(A) Formant frequencies, F1 and F2, for English vowels (F2-F1, dashed line, first principal component). (B) F1 and F2 partial correlations for each electrode's response (**P < 0.01, t test). Dots (electrodes) are color-coded by their cluster membership. (C) Neural population decoding of fundamental and formant frequencies. Error bars indicate SEM. (D) Multidimensional scaling (MDS) of acoustic and neural space (***P < 0.001, t test).
Fig. 4
Fig. 4. Neural encoding of plosive and fricative phonemes
(A) Prediction accuracy of plosive and fricative acoustic parameters from neural population responses. Error bars indicate SEM. (B) Response of three example electrodes to all plosive phonemes sorted by VOT. (C) Nonlinearity of VOT-response transformation and (D) distributions of nonlinearity for all plosive-selective electrodes identified in Fig. 2D. Voiced plosive-selective electrodes are shown in pink, and the rest in gray. (E) Partial correlation values between response of electrodes and acoustic parameters shared between plosives and fricatives (**P < 0.01, t test). Dots (electrodes) are color-coded by their cluster grouping from Fig. 2C.

Comment in

Similar articles

Cited by

References

    1. Chomsky N, Halle M. The Sound Pattern of English. Harper and Row; New York: 1968.
    1. Binder JR, et al. Cereb Cortex. 2000;10:512–528. - PubMed
    1. Boatman D, Hall C, Goldstein MH, Lesser R, Gordon B. Cortex. 1997;33:83–98. - PubMed
    1. Chang EF, et al. Nat Neurosci. 2010;13:1428–1432. - PMC - PubMed
    1. Formisano E, De Martino F, Bonte M, Goebel R. Science. 2008;322:970–973. - PubMed

Publication types