Review
Philos Trans R Soc Lond B Biol Sci. 2008 Mar 12;363(1493):947-63.
doi: 10.1098/rstb.2007.2152.

Basic auditory processes involved in the analysis of speech sounds

Brian C J Moore. Philos Trans R Soc Lond B Biol Sci. 2008.

Abstract

This paper reviews the basic aspects of auditory processing that play a role in the perception of speech. The frequency selectivity of the auditory system, as measured using masking experiments, is described and used to derive the internal representation of the spectrum (the excitation pattern) of speech sounds. The perception of timbre and distinctions in quality between vowels are related to both static and dynamic aspects of the spectra of sounds. The perception of pitch and its role in speech perception are described. Measures of the temporal resolution of the auditory system are described and a model of temporal resolution based on a sliding temporal integrator is outlined. The combined effects of frequency and temporal resolution can be modelled by calculation of the spectro-temporal excitation pattern, which gives good insight into the internal representation of speech sounds. For speech presented in quiet, the resolution of the auditory system in frequency and time usually markedly exceeds the resolution necessary for the identification or discrimination of speech sounds, which partly accounts for the robust nature of speech perception. However, for people with impaired hearing, speech perception is often much less robust.
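
The sliding temporal integrator mentioned in the abstract can be illustrated with a minimal sketch. It assumes a two-sided exponential weighting window applied to an intensity-like signal; the function names, the window shape, and the time constants (8 ms for past input, 4 ms for future input) are illustrative assumptions, not values taken from the paper.

```python
import math

def make_window(fs_hz, tau_past_ms=8.0, tau_future_ms=4.0, span_ms=40.0):
    """Build a two-sided exponential window (hypothetical shape and constants).

    Returns sample offsets and weights; the weights are normalised to sum to 1.
    """
    n = int(round(span_ms * fs_hz / 1000.0))
    offsets = list(range(-n, n + 1))
    weights = []
    for k in offsets:
        t_ms = 1000.0 * k / fs_hz
        # Slower decay for input that arrived before the window centre.
        tau = tau_past_ms if k <= 0 else tau_future_ms
        weights.append(math.exp(-abs(t_ms) / tau))
    total = sum(weights)
    return offsets, [w / total for w in weights]

def sliding_integrator(intensity, fs_hz, tau_past_ms=8.0, tau_future_ms=4.0):
    """Slide the window along the signal and return the weighted sums."""
    offsets, weights = make_window(fs_hz, tau_past_ms, tau_future_ms)
    n = len(intensity)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, weights):
            j = min(max(i + k, 0), n - 1)  # hold the edge values at the ends
            acc += w * intensity[j]
        out.append(acc)
    return out

# A steady signal with a 5 ms silent gap, sampled at 1 kHz.
fs_hz = 1000.0
intensity = [1.0] * 200
for i in range(100, 105):
    intensity[i] = 0.0
smoothed = sliding_integrator(intensity, fs_hz)
```

In this sketch the gap survives only as a shallow dip in `smoothed`, which is one way to picture why very brief gaps are hard to detect: the integrator smears rapid changes in level over time.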

Figures

Figure 1
Psychophysical tuning curves (PTCs) determined in simultaneous masking, using sinusoidal signals at 10 dB SL. For each curve, the solid circle below it indicates the frequency and level of the signal. The masker was a sinusoid which had a fixed starting phase relationship with the 50 ms signal. The masker level required for threshold is plotted as a function of masker frequency on a logarithmic scale. The dashed line shows the absolute threshold for the signal. Data from Vogten (1978).
Figure 2
Schematic illustration of the technique used by Patterson (1976) to determine the shape of the auditory filter. The threshold of the sinusoidal signal (indicated by the bold vertical line) is measured as a function of the width of a spectral notch in the noise masker. The amount of noise passing through the auditory filter centred at the signal frequency is proportional to the shaded areas.
Figure 3
A typical auditory filter shape determined using the notched-noise method. The filter is centred at 1 kHz. The relative response of the filter (in decibels) is plotted as a function of frequency.
Figure 4
Masking patterns for a narrowband noise masker centred at 410 Hz. Each curve shows the elevation in threshold of a pure-tone signal as a function of signal frequency. The overall noise level in dB SPL for each curve is indicated in the figure. Data from Egan & Hake (1950).
Figure 5
Excitation patterns for a 1000 Hz sinusoid at levels ranging from 20 to 90 dB SPL in 10 dB steps.
Figure 6
(a) The spectrum of a synthetic vowel /I/ plotted on a linear frequency scale. (b) The same spectrum plotted on an ERB_N-number scale. (c) The excitation pattern for the vowel plotted on an ERB_N-number scale.
Figure 7
The points labelled ‘R’ are thresholds for detecting a 1 kHz signal centred in a band of random noise, plotted as a function of the bandwidth of the noise. The points labelled ‘M’ are the thresholds obtained when the noise was amplitude modulated at an irregular, low rate. Reproduced with permission from Hall et al. (1984) and J. Acoust. Soc. Am.
Figure 8
Excitation patterns for three vowels, /i/, /a/ and /u/, plotted on an ERB_N-number scale.
Figure 9
Illustration of the filters used by Watkins & Makin (1996a). (a,b) ‘Filters’ corresponding to the spectral envelopes of the vowels ‘/ε/’ and ‘/I/’, respectively. (c) Filter corresponding to the difference between the spectral envelopes of the vowels ‘/ε/’ and ‘/I/’.
Figure 10
A temporal modulation transfer function (TMTF). A broadband white noise was sinusoidally amplitude modulated, and the threshold amount of modulation required for detection is plotted as a function of modulation rate. The amount of modulation is specified as 20 log m, where m is the modulation index. The higher the sensitivity to modulation, the more negative is this quantity. Data from Bacon & Viemeister (1985).
Figure 11
Spectro-temporal excitation pattern (STEP) of the word ‘tips’. The figure was produced by Prof. C. J. Plack. Adapted from Moore (2003c).
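
Two of the scales used in the captions above can be computed directly. The sketch below assumes the widely used Glasberg & Moore (1990) expressions for the ERB_N and the ERB_N-number scale, which are not quoted in this excerpt, together with the 20 log m measure of modulation depth defined in the caption for Figure 10. The function names are hypothetical.

```python
import math

def erb_n_hz(freq_hz):
    """ERB_N in Hz at centre frequency freq_hz (assumed Glasberg & Moore 1990 form)."""
    return 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)

def erb_n_number(freq_hz):
    """Position on the ERB_N-number scale (in Cams) for a frequency in Hz."""
    return 21.4 * math.log10(4.37 * freq_hz / 1000.0 + 1.0)

def modulation_depth_db(m):
    """Modulation depth as 20 log10(m), for a modulation index 0 < m <= 1."""
    if not 0.0 < m <= 1.0:
        raise ValueError("modulation index m must satisfy 0 < m <= 1")
    return 20.0 * math.log10(m)

# At 1 kHz the ERB_N is about 133 Hz, and the ERB_N-number is about 15.6.
erb_at_1k = erb_n_hz(1000.0)
number_at_1k = erb_n_number(1000.0)

# Full modulation (m = 1) is 0 dB; smaller detectable m gives more negative values.
depth_full = modulation_depth_db(1.0)
depth_tenth = modulation_depth_db(0.1)
```

Because the ERB_N-number scale is roughly logarithmic at high frequencies, plotting a vowel spectrum on it compresses the upper harmonics together, in line with the contrast between panels (a) and (b) of Figure 6.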

References

    1. Aibara R, Welsh J.T, Puria S, Goode R.L. Human middle-ear sound transfer function and cochlear input impedance. Hear. Res. 2001;152:100–109. doi:10.1016/S0378-5955(00)00240-9
    2. Alcántara J.I, Moore B.C.J, Vickers D.A. The relative role of beats and combination tones in determining the shapes of masking patterns at 2 kHz: I. Normal-hearing listeners. Hear. Res. 2000;148:63–73. doi:10.1016/S0378-5955(00)00114-3
    3. ANSI. ANSI S1.1-1994. American national standard acoustical terminology. New York, NY: American National Standards Institute; 1994.
    4. Bacon S.P, Viemeister N.F. Temporal modulation transfer functions in normal-hearing and hearing-impaired subjects. Audiology. 1985;24:117–134.
    5. Brungart D.S, Simpson B.D, Darwin C.J, Arbogast T.L, Kidd G., Jr. Across-ear interference from parametrically degraded synthetic speech signals in a dichotic cocktail-party listening task. J. Acoust. Soc. Am. 2005;117:292–304. doi:10.1121/1.1835509