Neurobiol Lang (Camb). 2023 Jul 20;4(3):420-434. doi: 10.1162/nol_a_00108. eCollection 2023.

Evidence for a Spoken Word Lexicon in the Auditory Ventral Stream

Srikanth R Damera et al.

Abstract

The existence of a neural representation for whole words (i.e., a lexicon) is a common feature of many models of speech processing. Prior studies have provided evidence for a visual lexicon containing representations of whole written words in an area of the ventral visual stream known as the visual word form area. Similar experimental support for an auditory lexicon containing representations of spoken words has yet to be shown. Using functional magnetic resonance imaging rapid adaptation techniques, we provide evidence for an auditory lexicon in the auditory word form area in the human left anterior superior temporal gyrus that contains representations highly selective for individual spoken words. Furthermore, we show that familiarization with novel auditory words sharpens the selectivity of their representations in the auditory word form area. These findings reveal strong parallels in how the brain represents written and spoken words, showing convergent processing strategies across modalities in the visual and auditory ventral streams.

Keywords: auditory lexicon; auditory ventral stream; speech recognition; superior temporal gyrus.


Figures

Figure 1.
Rapid adaptation and auditory localizer experimental paradigms. (A) The slow clustered acquisition paradigm used in the auditory localizer scan. Each trial was 9 s long with 1.5 s of volume acquisition and 7.5 s of silence. During the silent period, the subject heard four sounds from one of five stimulus classes and performed a 1-back task. (B) The rapid clustered acquisition paradigm used for the RA scans (Chevillet et al., 2013; Jiang et al., 2018). Each trial was 3.36 s long with 1.68 s of volume acquisition. During the silent period, two spoken words were played to the subject with a 50 ms interstimulus interval. The first word acted as a prime and the second word the target. The experimental paradigms for (C) real words and (D) pseudowords. The prime was followed by a target word that was either the same word (SAME), a word that differed from the target by one phoneme (1PH), or a word that shared no phonemes with the target (DIFF). Furthermore, subjects were presented with silence trials that served as an explicit baseline. During the task, subjects were asked to attend to all the words and respond when they heard the oddball stimulus (RW or PW containing the rhyme “-ox,” e.g., “socks”) in either the prime or target position.
Figure 2.
Identifying the auditory word form area (AWFA). (A) Proposed location of the AWFA (MNI: −61, −15, −5). Adapted from DeWitt and Rauschecker (2012). Color bar (arbitrary units) reflects the activation likelihood estimation (Laird et al., 2005) statistic. (B) The RW vs. Silence contrast (p < 0.001) masked by the RW vs. Scrambled Real Words contrast (p < 0.05) in the auditory localizer scan. Only clusters significant at the FDR p < 0.05 level are shown. Colors reflect t statistics. (C) The peak in the left STG (MNI: −62, −14, 2). (D) The AWFA defined in individual subjects. The inset zooms in on the perisylvian region to highlight the location of the AWFA.
Figure 3.
Evidence for auditory lexical representations in the auditory word form area. Within-subject (n = 24) adaptation profile for auditory real words (RWs) and untrained pseudowords (UTPWs). Patterns of release from adaptation are compatible with tight tuning to individual RWs consistent with an auditory lexicon. In contrast, UTPWs show a graded release from adaptation as a function of phonological similarity. Horizontal black line in violin plots indicates the median. ***, **, *, and N.S. mark p < 0.001, <0.01, <0.05, and not significant (>0.1), all Bonferroni-corrected for multiple comparisons.
Figure 4.
Auditory lexical representations emerge in the auditory word form area (AWFA) for pseudowords after familiarization training. Within-subject (n = 16) adaptation profile for auditory real words (RW), untrained pseudowords (UTPW), and trained pseudowords (TPW). RW adaptation profile shows tuning to individual RWs consistent with an auditory lexicon. UTPWs show a graded release from adaptation as a function of phonological similarity. Importantly, following familiarization training, adaptation patterns in the AWFA to the same pseudowords (now TPW) reveal tight lexical tuning, similar to RW. Horizontal black line in violin plots indicates the median. ***, **, *, and N.S. mark p < 0.001, <0.01, <0.05, and not significant (>0.1), all Bonferroni-corrected.
Figure 5.
Functional connectivity of the auditory word form area (AWFA). Whole-brain functional connectivity of the AWFA during the auditory localizer task (n = 26). Results are thresholded at a voxel-wise p < 0.001 and cluster-level p < 0.05, family-wise error corrected. Cluster corresponding to the literature coordinates of the visual word form area is circled in red. Color bar represents t statistic.
