J Neurosci. 2015 Jan 14;35(2):634-42. doi: 10.1523/JNEUROSCI.2454-14.2015.

Distributed neural representations of phonological features during speech perception

Jessica S Arsenault et al. J Neurosci.

Abstract

A fundamental goal of the human auditory system is to map complex acoustic signals onto stable internal representations of the basic sound patterns of speech. Phonemes and the distinctive features that they comprise constitute the basic building blocks from which higher-level linguistic representations, such as words and sentences, are formed. Although the neural structures underlying phonemic representations have been well studied, there is considerable debate regarding frontal-motor cortical contributions to speech as well as the extent of lateralization of phonological representations within auditory cortex. Here we used functional magnetic resonance imaging (fMRI) and multivoxel pattern analysis to investigate the distributed patterns of activation that are associated with the categorical and perceptual similarity structure of 16 consonant exemplars in the English language used in Miller and Nicely's (1955) classic study of acoustic confusability. Participants performed an incidental task while listening to phonemes in the MRI scanner. Neural activity in bilateral anterior superior temporal gyrus and supratemporal plane was correlated with the first two components derived from a multidimensional scaling analysis of a behaviorally derived confusability matrix. We further showed that neural representations corresponding to the categorical features of voicing, manner of articulation, and place of articulation were widely distributed throughout bilateral primary, secondary, and association areas of the superior temporal cortex, but not motor cortex. Although classification of phonological features was generally bilateral, we found that multivariate pattern information was moderately stronger in the left compared with the right hemisphere for place but not for voicing or manner of articulation.

Keywords: auditory cortex; fMRI; motor cortex; multivoxel pattern analysis; phonological features; speech perception.
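The decoding logic summarized in the abstract — classifying phonological categories from distributed voxel patterns — can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the study's actual pipeline: the voxel patterns are synthetic, and a simple nearest-centroid classifier with leave-one-run-out cross-validation stands in for whatever multivoxel method the authors used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for ROI voxel patterns: 8 runs x 2 classes
# (e.g., voiced vs. unvoiced), 50 voxels, plus a weak class-specific signal.
n_runs, n_vox = 8, 50
signal = rng.normal(0, 1, n_vox)
X, y, runs = [], [], []
for run in range(n_runs):
    for label in (0, 1):
        pattern = (1 if label else -1) * 0.5 * signal
        X.append(pattern + rng.normal(0, 1, n_vox))
        y.append(label)
        runs.append(run)
X, y, runs = np.array(X), np.array(y), np.array(runs)

# Leave-one-run-out cross-validation with a nearest-centroid classifier:
# train on all runs but one, test on the held-out run.
correct = 0
for held_out in range(n_runs):
    train, test = runs != held_out, runs == held_out
    centroids = np.array([X[train & (y == c)].mean(axis=0) for c in (0, 1)])
    for xi, yi in zip(X[test], y[test]):
        pred = np.argmin(np.linalg.norm(centroids - xi, axis=1))
        correct += pred == yi
accuracy = correct / len(y)
```

Above-chance cross-validated accuracy (here, well over 0.5) is the kind of evidence used to claim that a region carries information about a phonological feature.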


Figures

Figure 1.
Temporal elements of phonological features. The role of the three main temporal cues for speech—envelope, periodicity, and fine structure—in the perception of distinctive features. Each element's dominant fluctuation rate is listed in hertz and increases in frequency from left to right. The size of the diamonds indicates the extent to which a particular element contributes to a particular phonological feature, with a blank space indicating very weak or nonexistent cues. While all three temporal elements are relevant to voicing and manner of articulation, the envelope (slowest dominant fluctuation rate) is particularly informative of manner, and periodicity (intermediate dominant fluctuation rate) is particularly informative of voicing. Spectral shape, which is characterized by fine structure (fastest dominant fluctuation rate), is the primary acoustic cue for place. Adapted from Rosen (1992).
Figure 2.
Task design. Two sound trials (1300 ms each) and one silent trial (4000 ms) are depicted in slides with crosshairs. Participants hear a single consonant-vowel (CV) syllable during every sound trial and must indicate the gender of the speaker. Silent trials contain only a visual crosshair, thus requiring no response and allowing the hemodynamic response to return to baseline.
Figure 3.
Multidimensional scaling of 16 consonants. Two-dimensional spatial representation of 16 phonemes based on pooled data from Miller and Nicely (1955). Voicing is represented on the x-axis, with unvoiced phonemes occupying the left side and voiced phonemes occupying the right side. Manner of articulation is approximately represented on the y-axis, with nasals occupying the top half of the figure and fricatives and stops occupying the bottom. A clear division is observed between the unvoiced stops and fricatives, while overlap exists between voiced stops and fricatives. /θ/, “th” in thumb; /∫/, “sh” in shoe; /ð/, “th” in that; /ʒ/, “s” in measure.
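The caption above describes a two-dimensional MDS solution recovered from behavioral confusion data. A minimal sketch of that kind of computation, assuming classical (Torgerson) MDS on a symmetrized confusion matrix converted to dissimilarities (the toy three-phoneme matrix and the 1 − P/P.max conversion are illustrative choices, not Miller and Nicely's data or the authors' exact procedure):

```python
import numpy as np

def classical_mds(confusions, n_dims=2):
    """Classical (Torgerson) MDS: embed items in n_dims dimensions so that
    inter-point distances approximate the dissimilarities."""
    # Symmetrize and convert confusion probabilities to dissimilarities:
    # highly confusable pairs should land close together.
    P = (confusions + confusions.T) / 2.0
    D = 1.0 - P / P.max()
    np.fill_diagonal(D, 0.0)
    # Double-center the squared distance matrix.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # Coordinates are the top eigenvectors scaled by sqrt(eigenvalue).
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_dims]
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

# Toy 3-phoneme confusion matrix (rows: presented, cols: reported);
# phonemes 0 and 1 are confused with each other more than with phoneme 2.
C = np.array([[0.80, 0.15, 0.05],
              [0.15, 0.80, 0.05],
              [0.05, 0.05, 0.90]])
coords = classical_mds(C)
# The confusable pair (0, 1) ends up closer together than (0, 2).
```

In the study, the first two components of such a solution were then used as regressors against the neural data (Figure 5).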
Figure 4.
ROIs in speech-relevant areas of the temporal and frontal lobes. An ROI mask was created by intersecting a Neurosynth automated meta-analysis (search term: “speech”; Yarkoni et al., 2011) and the 148 Freesurfer ROIs (aparc 2009 atlas; Destrieux et al., 2010). The final set of 38 ROIs (19 left, 19 right) spanned cortical areas in the temporal and frontal lobes important for the perception and production of speech.
Figure 5.
Brain activity related to MDS dimensions. Activation maps of MDS results rendered onto MNI template showing clusters of >13 voxels at p < 0.005. The figures on the top display clusters correlating with positive values on Dimension 1 (approximately corresponding to the continuum of voicing information), with activation in bilateral anterior STG; the bottom images display clusters correlating with positive values on Dimension 2 (approximately corresponding to manner of articulation), with activation in bilateral HG and right PT. Both sagittal images represent right hemisphere templates.
Figure 6.
ROIs with significant classification accuracy in ≥3 categories. A, The number of categories for which each ROI obtained significant (equivalent to a p value < 0.0083) classification accuracy is displayed on the y-axis, with ROIs listed across the x-axis. Gray represents voicing information, warm colors indicate place, and cool colors indicate manner of articulation. B, ROIs associated with significant classification in ≥3 categories projected onto an inflated surface. Colors represent number of phonological categories significantly classified per region (note that the scale is not continuous). Vcd, Voiced; Vel, velar; Pal, palatoalveolar; Lab, labial; Dent, dental; Alv, alveolar; Nas, nasal; Fric, fricative; 2, inferior insula; 3, HG; 4, transverse temporal sulcus; 5, STG; 7, middle temporal gyrus; 8, PT; 9, posterior lateral fissure; 11, subcentral gyrus/sulcus; 16, inferior opercular gyrus.
Figure 7.
Brain activity related to phonological features. Overlapping information of voicing (green), place (red), and manner (blue) in ROIs within the speech mask projected onto inflated surface. Notably, all three phonological features are represented in right STG as well as left HG (yellow). 2, Inferior insula; 3, HG; 4, transverse temporal sulcus; 5, STG; 8, PT; 9, posterior lateral fissure; 11, subcentral gyrus/sulcus.
Figure 8.
Hemispheric differences (left–right) in classification accuracy for place of articulation across nine auditory cortex ROIs. Although the general tendency reflects a leftward bias, only lateral fissure achieved significance (p < 0.05 one-tailed) on individual t tests. Error bars represent SEM. LF, Lateral fissure; TTS, transverse temporal sulcus; MTG, middle temporal gyrus; PP, planum polare; STS, superior temporal sulcus.
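The laterality test described in this caption — a one-tailed t test on left-minus-right classification accuracy per ROI — can be sketched as a paired t statistic computed across subjects. The per-subject accuracies below are made up for illustration; the actual analysis details are the authors'.

```python
import numpy as np

def paired_t_one_tailed(left, right):
    """Paired t statistic for the one-tailed question:
    is mean(left - right) greater than zero?"""
    d = np.asarray(left, dtype=float) - np.asarray(right, dtype=float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return t, n - 1  # t statistic and degrees of freedom

# Hypothetical per-subject classification accuracies for one ROI.
left_acc = [0.60, 0.62, 0.58, 0.61]
right_acc = [0.55, 0.57, 0.56, 0.58]
t_stat, df = paired_t_one_tailed(left_acc, right_acc)
# Compare t_stat against the upper tail of a t distribution with df
# degrees of freedom (e.g., scipy.stats.t.sf) for a one-tailed p value.
```

A positive t statistic here corresponds to the leftward bias the figure reports; significance follows from the upper tail of the t distribution.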
