Speech Commun. 2011 Feb 1;53(2):195-209.
doi: 10.1016/j.specom.2010.09.001.

Perception of Place of Articulation for Plosives and Fricatives in Noise

Abeer Alwan et al. Speech Commun. 2011.

Abstract

This study aims to uncover perceptually relevant acoustic cues for the labial versus alveolar place of articulation distinction in syllable-initial plosives {/b/,/d/,/p/,/t/} and fricatives {/f/,/s/,/v/,/z/} in noise. Speech materials consisted of naturally spoken consonant-vowel (CV) syllables from four talkers where the vowel was one of {/a/,/i/,/u/}. Acoustic analyses using logistic regression show that formant frequency measurements, relative spectral amplitude measurements, and burst/noise durations are generally reliable cues for labial/alveolar classification. In a subsequent perceptual experiment, each pair of syllables with the labial/alveolar distinction (e.g., /ba,da/) was presented to listeners at various signal-to-noise ratios (SNRs) in a 2-AFC task. A threshold SNR was obtained for each syllable pair using sigmoid fitting of the percent correct scores. Results show that the perception of the labial/alveolar distinction in noise depends on the manner of articulation, the vowel context, and the interaction between voicing and manner of articulation. Correlation analyses of the acoustic measurements and threshold SNRs show that formant frequency measurements (such as F1 and F2 onset frequencies and F2 and F3 frequency changes) become increasingly important for the perception of labial/alveolar distinctions as the SNR degrades.
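The threshold-SNR step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact sigmoid or the fitting procedure, so everything here is an assumption: a logistic function rising from the 50% chance level of a 2-AFC task to 100%, a least-squares grid search, and the hypothetical names `psychometric`, `fit_sigmoid`, and `threshold_snr`. The 79% target follows the Figure 4 caption. The data in the usage lines are synthetic, not results from the study.

```python
import math

def psychometric(snr_db, midpoint, slope, chance=50.0):
    """Assumed 2-AFC sigmoid: percent correct rises from chance to 100%."""
    return chance + (100.0 - chance) / (1.0 + math.exp(-(snr_db - midpoint) / slope))

def fit_sigmoid(snrs, scores):
    """Least-squares grid search over (midpoint, slope).

    Illustrative only: the search ranges and 0.25 dB resolution are
    arbitrary choices, not taken from the paper.
    """
    best = None
    for mi in range(-120, 41):            # midpoint: -30.0 .. 10.0 dB
        midpoint = mi * 0.25
        for si in range(1, 41):           # slope: 0.25 .. 10.0 dB
            slope = si * 0.25
            err = sum((psychometric(x, midpoint, slope) - y) ** 2
                      for x, y in zip(snrs, scores))
            if best is None or err < best[0]:
                best = (err, midpoint, slope)
    return best[1], best[2]

def threshold_snr(midpoint, slope, target=79.0, chance=50.0):
    """Invert the sigmoid: SNR at which percent correct equals `target`."""
    p = (target - chance) / (100.0 - chance)
    return midpoint + slope * math.log(p / (1.0 - p))

# Synthetic percent-correct scores generated from known parameters.
snrs = [-20, -15, -10, -5, 0, 5, 10]
scores = [psychometric(x, -6.0, 2.5) for x in snrs]
mid, slope = fit_sigmoid(snrs, scores)
print(mid, slope, threshold_snr(mid, slope))
```

With real listener data, a continuous optimizer would normally replace the grid search; the inversion in `threshold_snr` stays the same for any target level, such as the 71% to 84% range mentioned in the Figure 5 caption.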

Figures

Figure 1
(a) LPC spectrum of a /ta/ token during the vowel, (b) DFT spectrum of a /ta/ token during the burst, (c) formant transition measurements, and (d) illustration of the determination of formant transition offset (in this case, F1 frequencies obtained using LPC analyses) when the change in frequency drops below 5 Hz per 2.5 ms.
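The offset rule in panel (d) can be sketched as follows. This is only an illustration: it assumes the formant track is sampled every 2.5 ms, and it takes the offset to be the first frame whose change to the next frame falls below 5 Hz. The caption does not say which of the two frames is reported, so that choice and the function name `transition_offset` are hypothetical.

```python
def transition_offset(track_hz, threshold_hz=5.0, frame_ms=2.5):
    """Return (frame index, time in ms) where the formant transition ends.

    Assumes `track_hz` holds one formant frequency per 2.5 ms frame.
    The offset is the first frame whose absolute change to the next
    frame is below `threshold_hz`; returns None if no such frame exists.
    """
    for i in range(len(track_hz) - 1):
        if abs(track_hz[i + 1] - track_hz[i]) < threshold_hz:
            return i, i * frame_ms
    return None

# Synthetic F1 track (Hz) that rises and then levels off.
track = [400, 450, 490, 520, 535, 538, 540]
print(transition_offset(track))
```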
Figure 2
Histograms of F2 frequency change (F2df) for the 12 labial/alveolar pairs, with the labial and alveolar tokens counted separately. The histogram bin centers range from −820 to 540 Hz in 80 Hz steps. F2df values below −860 Hz and above 580 Hz are counted in the −820 Hz and 540 Hz bins, respectively. F2df was a reliable cue for the vowel /a/ pairs except for /pa,ta/. Asterisks next to the CV pair name indicate 79% or above correct classification of place of articulation.
Figure 3
Histograms of Av4-pA45 for the 12 labial/alveolar pairs, with the labial and alveolar tokens counted separately. The histogram bin centers range from −55 to 55 dB in 5 dB steps. Av4-pA45 values below −57.5 dB and above 57.5 dB are counted in the −55 dB and 55 dB bins, respectively. Av4-pA45 was distinctive for /ba,da/, /pi,ti/, /pu,tu/, /vu,zu/, /fi,si/, and /fu,su/. Asterisks next to the CV pair name indicate 79% or above correct classification of place of articulation.
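The binning scheme shared by the Figure 2 and Figure 3 histograms (fixed bin centers, with out-of-range values folded into the two edge bins) can be sketched as below. The bin centers and step sizes come from the captions. The function names and the half-open handling of values that fall exactly on a bin boundary are assumptions, and the sample values are synthetic.

```python
import math

def bin_index(value, first_center, last_center, step):
    """Index of the bin containing `value`, clamped to the edge bins."""
    n_bins = round((last_center - first_center) / step) + 1
    lower_edge = first_center - step / 2.0
    idx = math.floor((value - lower_edge) / step)
    # Values beyond either end are counted in the first or last bin.
    return min(max(idx, 0), n_bins - 1)

def clamped_histogram(values, first_center, last_center, step):
    """Counts per bin, with outliers folded into the edge bins."""
    n_bins = round((last_center - first_center) / step) + 1
    counts = [0] * n_bins
    for v in values:
        counts[bin_index(v, first_center, last_center, step)] += 1
    return counts

# F2df layout from the Figure 2 caption: centers -820 .. 540 Hz, 80 Hz step.
f2df_counts = clamped_histogram([-1000, -820, 0, 539, 900], -820, 540, 80)
print(len(f2df_counts), f2df_counts)
```

For the Av4-pA45 histograms the same functions would be called with `first_center=-55`, `last_center=55`, and `step=5`, which puts the outer edges at −57.5 and 57.5 dB as stated in the caption.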
Figure 4
A sigmoid fit (solid line) of percent correct scores as a function of SNR (dB) for the 12 labial/alveolar pairs. For each pair, the 79% threshold line is drawn, and the threshold SNR value is labeled. The average percent correct scores (from the four listeners) are shown as circles. The error bars represent the minimum and maximum scores among the four listeners.
Figure 5
Correlation coefficients between threshold SNRs and acoustic measures (distances between means) across all talkers as a function of the threshold percent correct (71%-84%). Acoustic measures that produced negative correlations were not displayed.
