J Neurophysiol. 2014 Jun 15;111(12):2433-44.
doi: 10.1152/jn.00497.2013. Epub 2014 Mar 19.

Temporal scaling of neural responses to compressed and dilated natural speech

Y Lerner et al. J Neurophysiol. 2014.

Abstract

Different brain areas integrate information over different timescales, and this capacity to accumulate information increases from early sensory areas to higher order perceptual and cognitive areas. It is currently unknown whether the timescale capacity of each brain area is fixed or whether it adaptively rescales depending on the rate at which information arrives from the world. Here, using functional MRI, we measured brain responses to an auditory narrative presented at different rates. We asked whether neural responses to slowed (speeded) versions of the narrative could be compressed (stretched) to match neural responses to the original narrative. Temporal rescaling was observed in early auditory regions (which accumulate information over short timescales) as well as linguistic and extra-linguistic brain areas (which can accumulate information over long timescales). The temporal rescaling phenomenon started to break down for stimuli presented at double speed, and intelligibility was also impaired for these stimuli. These data suggest that 1) the rate of neural information processing can be rescaled according to the rate of incoming information, both in early sensory regions and in higher order cortices, and 2) the rescaling of neural dynamics is confined to a range of rates that match the range of behavioral performance.
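The rescaling question posed in the abstract can be illustrated with a small sketch. It is not the authors' analysis code: the toy signal, the linear-interpolation resampler, and the names resample_linear and pearson are assumptions made for illustration. A hypothetical response to the 200% duration story is compressed back to the time base of the original story and then correlated with the original response.

```python
import math

def resample_linear(signal, n_out):
    """Linearly interpolate `signal` onto n_out evenly spaced points (n_out > 1)."""
    n_in = len(signal)
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)   # position in the input time base
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Toy response to the original (100%) story; purely illustrative values.
original = [math.sin(2 * math.pi * t / 20) + 0.5 * math.sin(2 * math.pi * t / 7)
            for t in range(100)]

# Assumption: a perfectly rescaling region produces the same waveform,
# stretched to twice the length, for the 200% duration story.
slowed = resample_linear(original, 200)

# Compress the slowed response back to the original time base and compare.
rescaled = resample_linear(slowed, len(original))
r = pearson(original, rescaled)
print(f"correlation after rescaling: {r:.3f}")
```

In this idealized case the correlation is close to 1 by construction. In measured data, how far the correlation falls below that ceiling indicates how far the responses depart from linear temporal scaling.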

Keywords: fMRI; real-life auditory stimuli; slow and fast rates of speech; speed of information processing.

Figures

Fig. 1.
Identification of regions with short and long temporal receptive windows (TRWs). A: map of TRWs, in which each voxel is colored according to the level of coherent temporal structure that was required to produce significant response reliability [intersubject correlation (inter-SC)] in that voxel. Auditory narratives of varying temporal coherence were presented: the backward stimulus has the least temporal coherence, while the word scramble, sentence scramble, and paragraph scramble stimuli contain increasing levels of temporally coherent information. Voxels are marked according to the least coherent stimulus that could drive reliable responses in that voxel. Thus voxels exhibiting reliable responses to all stimuli (including “backward”) are labeled “backward” and colored red (see A1+). Voxels that exhibited reliable responses to all stimuli except the backward stimulus are labeled “word scram” and colored yellow [see medial superior temporal gyrus (mSTG)]. Similarly, voxels that were reliable only for the “sentence scramble,” “paragraph scramble,” and “intact” stimuli are labeled “sentence scram” and colored green [see posterior STG (pSTG)]. Finally, voxels were assigned the label “paragraph scram” and colored blue if they responded reliably only to the paragraph scramble stimulus and the intact stimulus [see temporo-parietal junction (TPJ)]. B: reliability profiles for regions of interest (ROIs) defined along the A1-TPJ axis. Early auditory areas (A1+) responded reliably to all conditions regardless of the level of temporal scrambling, and thus they have short TRWs. Areas adjacent to A1+ along the superior temporal gyrus exhibited an intermediate TRW. Here, coherent information at the “words” (mSTG), “sentences” (pSTG), or longer timescales was necessary to elicit reliable response time courses. The longest TRWs among the ROIs were found in the TPJ and angular gyrus (AG). In these regions, reliable responses were evoked only by the paragraph scramble and intact stimuli.
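The labeling rule in panel A (mark each voxel by the least coherent stimulus that still drives a reliable response) can be written as a short sketch. The function name trw_label and the dictionary input format are hypothetical. Returning None for a voxel that is reliable only for the intact story, or for no stimulus at all, is an assumption, since the caption does not describe that case.

```python
# Scrambling levels ordered from least to most temporally coherent,
# using the label names given in the caption.
LEVELS = ["backward", "word scram", "sentence scram", "paragraph scram"]

def trw_label(reliable):
    """Return the TRW label for one voxel.

    `reliable` maps each stimulus name to True/False, i.e. whether the
    voxel showed significant response reliability for that stimulus.
    The label is the least coherent stimulus that was reliable.
    """
    for level in LEVELS:
        if reliable.get(level, False):
            return level
    return None  # assumption: case not covered by the caption

# A1+-like voxel: reliable for every stimulus, so the TRW is short.
a1_like = {"backward": True, "word scram": True, "sentence scram": True,
           "paragraph scram": True, "intact": True}

# TPJ-like voxel: reliable only for paragraph scramble and intact.
tpj_like = {"backward": False, "word scram": False, "sentence scram": False,
            "paragraph scram": True, "intact": True}

print(trw_label(a1_like), "|", trw_label(tpj_like))
```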
Fig. 2.
Examples of stimuli. A schematic representation of the stimuli used in the experiment. The rate of a real-life story (100%) was speeded in time [75% duration and 50% duration (double speed)] and slowed in time [150% duration and 200% duration (twice as slow)]. To generate speeded and slowed versions of the story, an algorithm was employed that preserved the fine frequency structure of the auditory waveform while compressing or dilating its energy envelope.
Fig. 3.
Behavioral performance. Averaged recognition performance is shown for each condition. A: number of correctly recognized words. The y-axis shows the average percentage of correctly recognized words across all sentences. B: level of sentence comprehension. The y-axis shows the semantic comprehension of a sentence. Error bars indicate ± SD. *,**Significant difference between the condition and the intact (100%) story. Note that there was a reduction in the intelligibility of the 50% duration condition across the 2 behavioral estimates. C: histogram of syllable rates across sentences for the uncompressed story (100%), the fastest (50%), and the slowest (200%) conditions.
Fig. 4.
Reliability of responses for each presentation rate. Maps of the reliability of responses across subjects (n = 15) were computed separately for each stimulus condition and superimposed on an inflated brain shown in lateral and medial views. The maps illustrate the extent of the inter-SC for the 50% duration (A), 75% duration (B), intact story (C; 100%), 150% duration (D), and 200% duration (E). LS, lateral sulcus; CS, central sulcus; IPS, intraparietal sulcus; mPFC, medial prefrontal cortex.
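Response reliability across subjects (inter-SC) can be sketched for a single voxel as follows. The caption does not state the exact computation, so the leave-one-out scheme here (each subject's time course correlated with the mean of the remaining subjects, then averaged) is an assumption, and the data are synthetic.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def leave_one_out_isc(time_courses):
    """Mean correlation of each subject with the average of all other subjects."""
    n_subj = len(time_courses)
    n_t = len(time_courses[0])
    rs = []
    for s in range(n_subj):
        others = [tc for i, tc in enumerate(time_courses) if i != s]
        mean_others = [sum(tc[t] for tc in others) / len(others) for t in range(n_t)]
        rs.append(pearson(time_courses[s], mean_others))
    return sum(rs) / n_subj

rng = random.Random(0)  # fixed seed so the toy example is repeatable
n_t = 200
shared = [math.sin(2 * math.pi * t / 25) for t in range(n_t)]

# Voxel driven by the stimulus: shared signal plus subject-specific noise.
driven = [[shared[t] + rng.gauss(0, 0.5) for t in range(n_t)] for _ in range(5)]
# Voxel not driven by the stimulus: independent noise in every subject.
undriven = [[rng.gauss(0, 1) for _ in range(n_t)] for _ in range(5)]

isc_driven = leave_one_out_isc(driven)
isc_undriven = leave_one_out_isc(undriven)
print(f"driven voxel: {isc_driven:.2f}, undriven voxel: {isc_undriven:.2f}")
```

A map such as the one in this figure would repeat a computation of this kind for every voxel and threshold the result for significance. The thresholding step is not shown here.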
Fig. 5.
Simulation pipeline and results. A: simulation is founded on strong agreement between 1) the downsampled and HRF-convolved intracranial-EEG (iEEG) power recorded from an electrode at the lateral Sylvian fissure in an individual epileptic patient listening to the 100% condition, and 2) averaged blood-oxygenation level-dependent (BOLD) responses sampled from the same area in healthy individuals hearing the 100% condition. The strong agreement between the signals is striking given that the BOLD and iEEG signals were acquired from different subjects and different experimental setups. B: modeling steps: the electrophysiological response was downsampled (or upsampled) to mimic a linear scaling of the neural signal in response to differing stimuli (B1). Spectrum-matched noise was added to the signals to produce simulated BOLD correlation values which match those obtained empirically (B2). Noisy rescaled iEEG signals were then convolved with an HRF (B3). Next, the compressed (or dilated) BOLD response time courses were upsampled (or downsampled) to the same time-base as the BOLD responses to the 100% duration story (B4). C: cross-correlation of the HRF-convolved speeded (slowed) iEEG signals and the HRF-convolved original iEEG signal, after up (down) sampling. The results of the simulation procedure indicate that in cases where the underlying neural responses are linearly scaled, the upsampled and downsampled BOLD signals will be highly correlated with the convolved original signal but with a phase lag.
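The logic of steps B1, B3, B4, and C can be sketched with a noise-free toy model. Everything below is an illustrative assumption rather than the authors' pipeline: the neural signal is a sum of sinusoids instead of iEEG power, the HRF is a simple gamma-shaped kernel, resampling uses linear interpolation, and the noise step (B2) is omitted.

```python
import math

def resample_linear(signal, n_out):
    """Linearly interpolate `signal` onto n_out evenly spaced points (n_out > 1)."""
    n_in = len(signal)
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out

def toy_hrf(n=20):
    """Assumed gamma-shaped kernel t^5 * exp(-t), peaking at t = 5 samples."""
    h = [t ** 5 * math.exp(-t) for t in range(n)]
    total = sum(h)
    return [v / total for v in h]

def convolve_causal(signal, kernel):
    """Causal convolution, truncated to the length of the signal."""
    return [sum(kernel[k] * signal[i - k] for k in range(min(i + 1, len(kernel))))
            for i in range(len(signal))]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def best_lag(ref, test, max_lag):
    """Return (lag, r) maximizing the correlation of ref[t] with test[t + lag].

    A positive lag means `test` is delayed relative to `ref`.
    """
    best = (0, -2.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = ref[:len(ref) - lag], test[lag:]
        else:
            a, b = ref[-lag:], test[:len(test) + lag]
        r = pearson(a, b)
        if r > best[1]:
            best = (lag, r)
    return best

def simulate(neural, duration_factor, max_lag=15):
    """Return (best lag, correlation at best lag, correlation at zero lag)."""
    n = len(neural)
    kernel = toy_hrf()
    bold_ref = convolve_causal(neural, kernel)                      # 100% condition
    scaled = resample_linear(neural, round(n * duration_factor))    # B1: rescale
    bold_scaled = convolve_causal(scaled, kernel)                   # B3: apply HRF
    bold_back = resample_linear(bold_scaled, n)                     # B4: common time base
    lag, r = best_lag(bold_ref, bold_back, max_lag)                 # C: cross-correlate
    return lag, r, pearson(bold_ref, bold_back)

neural = [math.sin(2 * math.pi * t / 60)
          + 0.6 * math.sin(2 * math.pi * t / 37 + 1.0)
          + 0.4 * math.sin(2 * math.pi * t / 23 + 2.0) for t in range(300)]

lag_fast, r_fast, r0_fast = simulate(neural, 0.5)   # 50% duration (speeded)
lag_slow, r_slow, r0_slow = simulate(neural, 2.0)   # 200% duration (slowed)
print(lag_fast, round(r_fast, 3), lag_slow, round(r_slow, 3))
```

Because the HRF has a fixed width in scanner time, stretching a speeded response back to the original time base also stretches the hemodynamic delay, and compressing a slowed response shortens it. In this toy model that is what makes the rescaled signal peak at a nonzero lag while remaining highly correlated with the original.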
Fig. 6.
Response reliability in the TPJ for different presentation rates. A: raw responses in the TPJ averaged across all participants in each of the five conditions. B: responses to the speeded 50 and 75% duration stimuli after dilation, and responses to the slowed 150 and 200% duration stimuli after compression. Note that after accounting for linear response scaling, the responses to the compressed, dilated, and intact stimuli were highly correlated within the TPJ. C: correlation between the individual time courses for the intact (100%) story in an independent group of listeners (n = 11) and the averaged time courses for the speeded, slowed, and intact conditions (n = 15).
Fig. 7.
Topography of rate modification effects across conditions. Maps of the reliability of responses were computed by comparing responses in the scaled speeded and slowed conditions with the responses to the intact story in the previous study (Lerner et al. 2011). The extent of the inter-SC is presented for 2 speeded conditions: 50% duration (A) and 75% duration (B), and 2 slowed conditions: 150% duration (D) and 200% duration (E). Panel C shows the inter-SC between responses to the intact 100% story in the previous and current studies. Note that the scaled responses in the slowed and speeded conditions were similar to the responses in the intact condition in all areas and all conditions except the 50% duration condition, in which we observed weaker agreement. The figure layout is identical to that in Fig. 4.
Fig. 8.
Response reliability profiles for ROIs across the processing hierarchy. A: correlation of the individual responses to the intact (100%) story in an independent group of listeners (n = 11) and the averaged responses to the speeded, slowed, and intact conditions (n = 15) are computed. The 75% upsampled and the 150 and 200% downsampled time courses were highly correlated with the intact (100%) response time courses in all ROIs. However, the correlation of the 50% upsampled time course and the intact signal was weaker. B: correlation of the individual responses to the intact (100%) story and averaged unscrambled conditions (“reverse backward,” “unscrambled paragraphs,” and “unParag”) in Lerner et al. (2011). To measure the timescale of an area we scrambled natural speech at the “words” timescale (0.7 ± 0.5 s), “sentences” timescale (7.7 ± 3.5 s), and “paragraphs” timescale (38.1 ± 17.6 s), and then used a novel neural unscrambling procedure. In the procedure we unscrambled (reordered) the neural responses to each individual sentence (paragraph) that was presented in the scrambled sentence (paragraph) condition and we compared the unscrambled responses to the responses in the intact (100%) condition. In addition, we compared the responses to the intact story with time-reversed responses to the “backward” story (“reverse backward”). The unscrambling procedure revealed that the timescale of processing gradually increases from early sensory areas to higher order perceptual and cognitive areas. Early sensory cortices such as the primary auditory cortex (A1+) have relatively short TRWs (up to hundreds of milliseconds) and therefore respond similarly to each unit of information regardless of the scrambling level. In contrast, higher order areas such as the TPJ responded differently to the same set of sentences or paragraphs when they were presented in a different order, as one would expect for an area in which the momentary response is dependent on the stimulus history over long timescales. Areas at intermediate levels of the hierarchy (e.g., pSTG) exhibited a response profile consistent with a temporal integration window of intermediate length (Lerner et al. 2011). *P < 0.05; **P < 0.005; ***P < 0.0005.
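The reordering step of the unscrambling procedure described in panel B can be sketched as below. The function name unscramble, its argument layout, and the toy numbers are hypothetical. The sketch also assumes that segment boundaries fall on sample boundaries and ignores hemodynamic delay, neither of which the caption addresses.

```python
def unscramble(response, seg_lengths, order):
    """Reorder a response recorded during a scrambled presentation.

    response:    time course recorded while the scrambled story played.
    seg_lengths: seg_lengths[i] is the length, in samples, of original segment i.
    order:       order[k] is the index of the original segment that was
                 presented at position k of the scrambled story.
    Returns the response pieces rearranged into the original story order.
    """
    pieces = {}
    start = 0
    for seg in order:
        end = start + seg_lengths[seg]
        pieces[seg] = response[start:end]
        start = end
    unscrambled = []
    for seg in range(len(seg_lengths)):
        unscrambled.extend(pieces[seg])
    return unscrambled

# Toy example: three segments of 2, 3, and 1 samples.
seg_lengths = [2, 3, 1]
intact_response = [0, 1, 10, 11, 12, 20]

# Segments were presented in the order 2, 0, 1. For an area with a short
# TRW, each segment evokes the same response regardless of its context.
order = [2, 0, 1]
scrambled_response = [20, 0, 1, 10, 11, 12]

recovered = unscramble(scrambled_response, seg_lengths, order)
print(recovered)
```

For a short-TRW area like the one mimicked here, the unscrambled response matches the intact response. For a long-TRW area the response to each segment depends on what preceded it, so the reordered time course would correlate poorly with the intact one.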
