Front Psychol. 2012 Sep 6;3:320. doi: 10.3389/fpsyg.2012.00320. eCollection 2012.

Neural Oscillations Carry Speech Rhythm through to Comprehension

Jonathan E Peelle et al.

Abstract

A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners' processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging - particularly electroencephalography (EEG) and magnetoencephalography (MEG) - point to phase locking by ongoing cortical oscillations to low-frequency information (~4-8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain.
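
As a rough illustration of how such phase locking can be quantified (not the specific analysis pipeline used in the studies reviewed), the sketch below computes a phase-locking value between a speech amplitude envelope and a neural signal, both band-limited to the theta range; the signal names, sampling rate, and filter settings are assumptions.

```python
# Minimal sketch: phase locking between a speech envelope and a neural signal.
# Signal names, sampling rate, and filter settings are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 200.0  # assumed common sampling rate (Hz)

def bandpass(x, lo, hi, fs, order=4):
    """Zero-phase Butterworth band-pass filter."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def phase_locking_value(x, y, fs, lo=4.0, hi=8.0):
    """Mean resultant length of the theta-band phase difference between x and y."""
    px = np.angle(hilbert(bandpass(x, lo, hi, fs)))
    py = np.angle(hilbert(bandpass(y, lo, hi, fs)))
    return np.abs(np.mean(np.exp(1j * (px - py))))

# Synthetic example: a 5 Hz envelope and a noisy, phase-lagged "neural" response.
t = np.arange(0, 10, 1 / fs)
envelope = 1 + np.cos(2 * np.pi * 5 * t)
neural = np.cos(2 * np.pi * 5 * t - 0.5) + 0.5 * np.random.randn(t.size)
print(phase_locking_value(envelope, neural, fs))  # values near 1 indicate strong phase locking
```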

Keywords: intelligibility; language; oscillations; phase locking; speech comprehension; speech rate; theta.

Figures

Figure 1
Multiple representations of the acoustic and linguistic information in a single spoken sentence. (A) At top is a spectrogram, showing power in different frequency ranges over the course of a sentence. The middle row shows the changes in sound pressure over time, as they occur at the tympanic membrane. The bottom row shows the amplitude envelope of the sentence, corresponding approximately to the syllable rate, and created by half-wave rectifying and low-pass filtering the speech signal. (B) Schematic illustration of three different timescales of acoustic and linguistic information contained in the sentence.
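
The envelope shown in the bottom row is obtained by the procedure the caption describes: half-wave rectification followed by low-pass filtering. A minimal sketch of that step, assuming a mono recording read with scipy.io.wavfile and an illustrative ~30 Hz cutoff (the exact cutoff used for the figure is not specified here):

```python
# Sketch: amplitude envelope via half-wave rectification + low-pass filtering.
# The file name, 30 Hz cutoff, and filter order are assumptions for illustration.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, filtfilt

fs, speech = wavfile.read("sentence.wav")       # hypothetical mono recording
speech = speech.astype(float)

rectified = np.maximum(speech, 0.0)             # half-wave rectification
b, a = butter(4, 30.0 / (fs / 2), btype="low")  # low-pass at ~30 Hz
envelope = filtfilt(b, a, rectified)            # slow amplitude modulations (syllable rate)
```
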
Figure 2
Illustration of noise vocoding (after Shannon et al., 1995). (A) The frequency range of a stimulus is divided into a number of frequency channels (in this case, 4), usually logarithmically spaced to approximate cochlear processing. For each channel, the original sound is filtered to retain information in the given frequency range, and the amplitude modulation profile (envelope) is extracted, typically by rectification and filtering (e.g., Shannon et al., 1995) or using a Hilbert transform (e.g., Smith et al., 2002). Each amplitude envelope is used to modulate white noise filtered into the same frequency band. The amplitude-modulated noise bands are then combined to form a vocoded stimulus that has significantly reduced spectral detail compared to the original speech. The more channels included in the vocoder, the more spectral detail is retained, leading to more intelligible speech. (B) The overall amplitude envelopes of a clear and a vocoded sentence are nearly identical. Thus, although vocoded speech can differ markedly in intelligibility from clear speech, it retains the low-frequency amplitude modulations critical for perceiving speech rhythm. (C) Examples of the same sentence vocoded with 16 channels, 4 channels, or 1 channel. Fewer channels result in less spectral detail, as well as lower intelligibility (word report data from Peelle et al., in press).
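
The vocoding steps in the caption (band splitting, envelope extraction, noise modulation, recombination) can be sketched as follows. This is a simplified illustration rather than the exact implementation used by Shannon et al. (1995) or Peelle et al.; the frequency range, filter order, and Hilbert-based envelope extraction are assumptions.

```python
# Sketch of a noise vocoder: split speech into log-spaced bands, extract each
# band's envelope, modulate band-limited noise with it, then sum the bands.
# Frequency range, filter order, and channel count are illustrative assumptions;
# assumes the sampling rate fs exceeds twice the highest band edge.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def vocode(speech, fs, n_channels=4, f_lo=100.0, f_hi=8000.0):
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_channels + 1)
    noise = np.random.randn(speech.size)
    out = np.zeros_like(speech, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(3, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        band = filtfilt(b, a, speech)      # speech restricted to this channel
        env = np.abs(hilbert(band))        # channel amplitude envelope
        carrier = filtfilt(b, a, noise)    # white noise filtered into the same band
        out += env * carrier               # envelope-modulated noise
    return out / np.max(np.abs(out))       # normalize to avoid clipping
```

Varying n_channels reproduces the manipulation in panel (C): more channels preserve more spectral detail and yield higher intelligibility, while the broadband envelope in panel (B) remains largely unchanged.
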
Figure 3
Schematic illustration of experimental manipulation and findings of Experiment 1 in Dilley and Pitt (2010). Listeners heard a sentence containing an acoustic token of a function word with minimal cues to word boundaries due to co-articulation. For example, in the sentence fragment “Fred would rather have a summer or lake…,” the word “or” is present, but underarticulated. The speech rate of either the target fragment (“summer or l-”) or its context was then selectively manipulated, and the effect of these manipulations on listeners' report of the function word (in this example, “or”) was measured. These conditions are shown along the left side of the figure, along with the acoustic amplitude envelope for each stimulus (colored lines). The authors found that when the speech rate of the context and target word matched, word report for the function word was relatively high; by contrast, when the context was slower than the target word, fewer function words were reported. This result shows how listeners make use of contextual speech rate to guide lexical segmentation during speech comprehension.
Figure 4
(A) Ongoing oscillatory activity determines how efficiently sensory stimuli drive perceptual processes, depending on the phase of oscillation at which they arrive. Information arriving at a low-excitability phase is processed relatively less efficiently, whereas that arriving at a high-excitability phase is processed more efficiently. (B) If sensory information exhibits temporal regularity, overall processing efficiency can be increased by shifting the phase of ongoing neural oscillations to line up with the phase of the stimuli. Top: Repeated stimuli arriving at sub-optimal phases of neural oscillations. Bottom: By shifting the phase of the brain oscillations, stimuli now arrive at a phase during which neurons are in a relatively excitable state and are thus processed more efficiently (i.e., lead to greater numbers of informative spikes).
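
A toy simulation of the logic in panel (B): if response gain depends on oscillation phase, rhythmic stimuli arriving at the high-excitability phase evoke more spikes than identical stimuli arriving at the low-excitability phase. The sinusoidal excitability profile, Poisson spiking, and rate parameters below are simplifying assumptions used only to illustrate the point.

```python
# Toy simulation: stimuli aligned with the high-excitability phase of an
# oscillation evoke more spikes than misaligned stimuli. All parameters are
# illustrative assumptions, not values from the literature.
import numpy as np

rng = np.random.default_rng(0)
f = 5.0                                    # oscillation frequency (Hz, theta range)
stim_times = np.arange(0.1, 10.0, 1 / f)   # rhythmic stimuli at the same rate

def mean_spike_count(phase_offset, base_rate=5.0, gain=4.0):
    """Average spike count per stimulus, given the oscillation phase at arrival."""
    phases = 2 * np.pi * f * stim_times + phase_offset
    excitability = (1 + np.cos(phases)) / 2         # 0 at trough, 1 at peak
    return rng.poisson(base_rate + gain * excitability).mean()

print(mean_spike_count(0.0))      # stimuli arriving at the high-excitability phase
print(mean_spike_count(np.pi))    # the same stimuli at the low-excitability phase
```
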
Figure 5
Illustration of the phase of ongoing neural oscillations being reset by an external stimulus. Prior to the stimulus event the phases of the oscillations are random, but following the stimulus they are aligned.
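
Phase resetting of this kind is commonly summarized as inter-trial phase coherence: before the stimulus, phases are scattered across trials (coherence near zero), whereas after the reset they cluster (coherence near one). A minimal synthetic sketch, with the trial count, frequency, and timing chosen arbitrarily for illustration:

```python
# Sketch: inter-trial phase coherence before vs. after a stimulus-driven phase reset.
# Trial count, oscillation frequency, and timing are arbitrary illustrative choices.
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(1)
n_trials, fs, f = 50, 200, 6.0
t = np.arange(-0.5, 1.0, 1 / fs)            # time relative to the stimulus at t = 0

trials = np.empty((n_trials, t.size))
for k in range(n_trials):
    pre_phase = rng.uniform(0, 2 * np.pi)   # random phase before the stimulus
    phase = np.where(t < 0,
                     2 * np.pi * f * t + pre_phase,   # trial-specific phase pre-stimulus
                     2 * np.pi * f * t)               # common (reset) phase post-stimulus
    trials[k] = np.cos(phase)

phases = np.angle(hilbert(trials, axis=1))            # analytic phase per trial
itpc = np.abs(np.mean(np.exp(1j * phases), axis=0))   # coherence across trials per time point
print(itpc[t < -0.1].mean(), itpc[t > 0.1].mean())    # low before, high after the reset
```
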
Figure 6
Hypothesized contribution of entrained neural oscillations to categorical phoneme judgments based on voice onset time (VOT) in connected speech. (A) Spectrograms of two non-words, “pife” and “bife.” The amplitude envelope for each is overlaid in black. (B) At top, schematic illustrations of three phonetic tokens (/pa/, /ba/, and an ambiguous intermediate token, /?a/) that differ in VOT. Neural oscillations entrained to two different speech rates are shown below, here for the short carrier phrase “I say ___.” For both speech rates, the aspiration for a clear /pa/ occurs in a region of low excitability of the entrained oscillation, and the aspiration for a clear /ba/ in a region of high excitability. However, for the ambiguous token, the aspiration occurs at different levels of excitability for the faster and slower speech rates, making it less likely to be perceived as /pa/ (and more likely to be perceived as /ba/) at slower speech rates. (C) Schematic categorical perception curves demonstrating a shift of perceptual boundaries as a function of speech rate based on this framework.
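
In this framework, the category boundary along the VOT continuum shifts with speech rate because the same aspiration interval falls at a different excitability phase for faster and slower rates. A deliberately simplified sketch of panel (C), treating the probability of reporting /pa/ as a logistic function of VOT with a rate-dependent boundary; the boundary locations and slope are invented purely for illustration:

```python
# Toy model of panel (C): the /pa/ vs. /ba/ category boundary along the VOT
# continuum shifts with speech rate. Boundary values and slope are invented.
import numpy as np

def p_pa(vot_ms, rate="fast", slope=0.4):
    """Probability of perceiving /pa/ as a logistic function of VOT (ms)."""
    boundary = 25.0 if rate == "fast" else 35.0   # assumed rate-dependent boundary
    return 1 / (1 + np.exp(-slope * (vot_ms - boundary)))

vot = np.arange(0, 61, 10)
print(np.round(p_pa(vot, "fast"), 2))   # boundary near 25 ms at faster rates
print(np.round(p_pa(vot, "slow"), 2))   # boundary shifts toward 35 ms at slower rates
```
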
Figure 7
(A) Brain regions responding to amplitude-modulated acoustic stimulation. (B) Brain regions responding to intelligible speech > temporally complex control conditions. Intelligible speech – i.e., amplitude-modulated acoustic stimulation that conveys linguistic information – recruits a broad network of bilateral cortical regions that are organized into parallel hierarchies. Within this network, regions show differential sensitivity to the surface features of speech (i.e., acoustic information), with areas further removed from primary auditory cortex responding primarily to the degree of linguistic information in the speech (Davis and Johnsrude, 2003).
Figure 8
Schematic illustration of peaks from various fMRI studies in which activation of left anterior lateral temporal cortex was observed, overlaid on the region showing sensitivity to speech intelligibility from Scott et al. (2000).
