J Acoust Soc Am. 2013 May;133(5):2953-71.
doi: 10.1121/1.4796111.

Toward a quantitative account of pitch distribution in spontaneous narrative: method and validation

Samuel E Matteson et al. J Acoust Soc Am. 2013 May.

Abstract

Pitch is well known both to animate human discourse and to convey meaning in communication. The study of the statistical population distributions of pitch in discourse will benefit from methodological improvements. The current investigation examines a method that parameterizes pitch in discourse as musical pitch interval H, measured in units of cents, and that disaggregates the sequence of peak word-pitches using tools employed in time-series analysis and digital signal processing. The investigators test the proposed methodology by applying it to distributions in pitch interval of the peak word-pitch (collectively called the discourse gamut) that occur in simulated and actual spontaneous emotive narratives obtained from 17 middle-aged African-American adults. In rigorous tests, the analysis not only faithfully reproduces simulated distributions embedded in realistic time series that drift and include pitch breaks, but also reveals that the empirical distributions exhibit a common hidden structure when normalized to a slowly varying mode (called the gamut root) of their respective probability density functions. Quantitative differences between narratives reveal the speakers' relative propensity for the use of pitch levels corresponding to elevated degrees of a discourse gamut (the "e-la") superimposed upon a continuum that conforms systematically to an asymmetric Laplace distribution.
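The pitch-interval parameterization described in the abstract can be illustrated with the standard definition of the cent (1200 cents per octave). The sketch below is not the authors' code: the reference frequency, the function name `pitch_interval_cents`, and the sample pitches are assumptions chosen only to show that two voices an octave apart yield the same interval pattern once each is referred to its own reference pitch.

```python
import math

def pitch_interval_cents(f0_hz, ref_hz):
    """Musical interval H, in cents, between f0_hz and a reference frequency."""
    if f0_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 1200.0 * math.log2(f0_hz / ref_hz)

# Hypothetical peak word-pitches (Hz) for a lower and a higher voice.
low_voice = [100.0, 110.0, 150.0]
high_voice = [200.0, 220.0, 300.0]

# Refer each voice to its own first pitch (a stand-in for a modal pitch).
low_cents = [pitch_interval_cents(f, low_voice[0]) for f in low_voice]
high_cents = [pitch_interval_cents(f, high_voice[0]) for f in high_voice]

print([round(h, 1) for h in low_cents])
print([round(h, 1) for h in high_cents])
```

Because the cent scale is logarithmic, equal frequency ratios map to equal intervals, which is why distributions from different speakers become directly comparable after normalization.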

Figures

Figure 1
PDFs for the peak word-pitch occurring in a discourse narrated by a male (lower frequency, mel and pitch) and female (higher frequency, mel and pitch) parameterized as fundamental frequency F0 and mel (a) and as pitch interval H (b). In the pitch parameterization, the distributions appear more similar in shape and width than in terms of the fundamental frequency.
Figure 2
The PDFs of the distributions appearing in Fig. 1, normalized to their respective modal pitches, that is, gamut root. Note that in the range of the fundamental frequencies of adult speakers, the mel scale is very nearly a linear mapping of frequency, that is, mel ≈ 1.14(F0 + 34 Hz).
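A short numerical check can make the caption's point concrete. The sketch below uses the linear mel approximation quoted in the caption, together with the standard cent definition; the two modal pitches (100 Hz and 200 Hz) and the choice of a peak a perfect fifth above each are hypothetical values, not data from the study.

```python
import math

def approx_mel(f0_hz):
    """Linear mel approximation quoted in the Figure 2 caption.

    Only meaningful in the F0 range of adult speakers, per the caption.
    """
    return 1.14 * (f0_hz + 34.0)

def cents(f0_hz, ref_hz):
    """Interval in cents between f0_hz and ref_hz (1200 cents per octave)."""
    return 1200.0 * math.log2(f0_hz / ref_hz)

# Hypothetical modal pitches (Hz) and a peak a fifth (ratio 1.5) above each.
low_root, high_root = 100.0, 200.0
low_peak, high_peak = 1.5 * low_root, 1.5 * high_root

low_span_mel = approx_mel(low_peak) - approx_mel(low_root)
high_span_mel = approx_mel(high_peak) - approx_mel(high_root)
low_span_cents = cents(low_peak, low_root)
high_span_cents = cents(high_peak, high_root)

print(low_span_mel, high_span_mel)      # spans differ on the (linear) mel scale
print(low_span_cents, high_span_cents)  # spans agree on the cent scale
```

Since the approximation is an affine function of frequency, subtracting the modal value leaves a span proportional to the frequency difference, so the higher voice still appears wider in mel; the same musical interval is identical in cents.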
Figure 3
Simulated time series of peak word-pitch versus word sequence number. The simulated discourse pitch distribution consists of three artificially produced populations (with high, middle, and low centroids) with relative statistical frequencies of 1:2:4, respectively, superimposed on a trend with a single shift near the mid-point of the time series (solid line). In addition, randomly placed outliers have been introduced to test the robustness of the outlier detection protocol.
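A simulated series of this kind can be sketched as follows. Only the 1:2:4 population weights, the single mid-series shift, and the presence of random outliers come from the caption; the centroids, widths, shift size, outlier rate, Gaussian population shape, and the function name `simulate_peak_pitch_series` are all assumptions made for illustration.

```python
import random

def simulate_peak_pitch_series(n_words=700, shift_cents=200.0,
                               outlier_rate=0.02, seed=0):
    """Return (series, labels) for a synthetic peak word-pitch time series."""
    rng = random.Random(seed)
    # Assumed (centroid, standard deviation) in cents for each population.
    populations = {"high": (900.0, 60.0),
                   "middle": (450.0, 60.0),
                   "low": (0.0, 60.0)}
    names = ["high", "middle", "low"]
    weights = [1, 2, 4]  # relative statistical frequencies from the caption
    series, labels = [], []
    for i in range(n_words):
        # Trend modeled as a single step near the mid-point (an assumption).
        trend = shift_cents if i >= n_words // 2 else 0.0
        name = rng.choices(names, weights=weights)[0]
        centroid, sd = populations[name]
        value = trend + rng.gauss(centroid, sd)
        # Occasionally displace a point far from its population.
        if rng.random() < outlier_rate:
            value += rng.choice([-1, 1]) * rng.uniform(1500.0, 2500.0)
            name = "outlier"
        series.append(value)
        labels.append(name)
    return series, labels

series, labels = simulate_peak_pitch_series()
print({name: labels.count(name) for name in ("high", "middle", "low", "outlier")})
```

Keeping the labels alongside the values makes it possible to check afterwards how well an analysis recovers the populations that were put in.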
Figure 4
(Color online) Histogram of simulated distribution of peak word-pitch time series shown in Fig. 3 normalized to the mode of the distribution of the whole time series. The histogram data do not well reproduce the original distributions when the analysis does not take into account shifts in the modal pitch, the gamut root.
Figure 5
(Color online) PDF of simulated peak word-pitch time series of Fig. 3 with the original distribution superimposed (dashed curve). The agreement between the strength, width, shape, and position of the extracted peaks and the simulation is excellent; the data show less than a 1% change in the standard deviation of the peaks and a small error in peak position (−28¢ absolute and less than −11¢ relative to the modal peak), a value that is well within the estimated standard error of the measurement. Note the preservation of the distribution of outliers introduced in the simulation.
Figure 6
Long time series for subject A with a significant number of outliers (open circles). The gamut root, shown as the double line, is computed from a moving average of the points in a moving window that have a modified z-score less than or equal to the critical value (z_critical = 0.89), a value that is unique to this discourse.
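The windowed procedure described in this caption can be sketched as below. The sketch assumes the commonly used median-based modified z-score, 0.6745 × (x − median) / MAD; the authors' exact definition, their window length, and how they treat the series edges are not given here, so the function names, the `window` default, and the fallback to the window median are all hypothetical.

```python
import statistics

def modified_z_scores(values):
    """Median/MAD-based modified z-scores (assumed definition)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]

def gamut_root_estimate(series, window=31, z_critical=0.89):
    """Moving average of the in-window points whose |modified z| <= z_critical."""
    half = window // 2
    roots = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]  # window truncated at edges
        scores = modified_z_scores(chunk)
        kept = [v for v, z in zip(chunk, scores) if abs(z) <= z_critical]
        # Fall back to the window median if nothing passes the threshold.
        roots.append(statistics.mean(kept) if kept else statistics.median(chunk))
    return roots

# Hypothetical series in cents with one gross outlier.
demo = [0.0, 10.0, -10.0, 5.0, -5.0, 0.0, 3000.0, 10.0, -10.0, 5.0, -5.0]
print([round(r, 2) for r in gamut_root_estimate(demo, window=5)])
```

Because both the center and the spread are estimated from medians, a single extreme point barely moves the scores of its neighbors, so the running estimate stays near the bulk of the data.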
Figure 7
Comparison of two peak word-pitch time series extracts illustrating the presence of outliers and gamut root variability. Narrative (a), above, required a critical value of the modified z-score of 0.87, while narrative (b), below, required z_critical = 1.1. Outliers are identified as open circles and were omitted in the computation of the trend and gamut root.
Figure 8
(a) FFT amplitude versus intonation feature length for a sample of 1024 words in the time series of Fig. 6. Solid line is power law fit (power = 0.43). (b) Empirical Boxcar filter transfer function obtained from a ratio of oscillator strength in window-averaged (Boxcar filtered) FFT plot to the original oscillator strength (solid line) versus feature length. Also shown is the transfer function for the residual peak word-pitch time series.
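The idea of an empirical boxcar transfer function, obtained as a ratio of filtered to unfiltered spectral amplitude, can be sketched on synthetic data. This is an illustration under stated assumptions, not the authors' procedure: it uses a plain discrete Fourier transform from the standard library in place of an FFT, a circular (wrap-around) moving average so that the ratio matches the textbook boxcar gain exactly, and hypothetical values for the series length and window width.

```python
import cmath
import math
import random

def dft_amplitude(x):
    """Amplitude of the DFT of x at bins 0 .. n/2 (naive O(n^2) transform)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def circular_boxcar(x, width):
    """Centered moving average of odd width with wrap-around at the ends."""
    n, half = len(x), width // 2
    return [sum(x[(t + d) % n] for d in range(-half, half + 1)) / width
            for t in range(n)]

def boxcar_gain(k, n, width):
    """Analytic gain of a centered boxcar at DFT bin k."""
    if k == 0:
        return 1.0
    f = k / n  # cycles per word
    return abs(math.sin(math.pi * f * width) / (width * math.sin(math.pi * f)))

rng = random.Random(1)
n, width = 64, 5                      # hypothetical sizes
series = [rng.gauss(0.0, 100.0) for _ in range(n)]

original = dft_amplitude(series)
filtered = dft_amplitude(circular_boxcar(series, width))
empirical = [f_amp / o_amp for f_amp, o_amp in zip(filtered, original)]

for k in (1, 4, 16):
    # Feature length in words is the period n / k of bin k.
    print(n / k, round(empirical[k], 4), round(boxcar_gain(k, n, width), 4))
```

Long features (small k) pass almost unattenuated while features shorter than the window are suppressed, which is what lets a window average track a slowly varying component of the series.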
Figure 9
The relative stationary peak word-pitch time series (subject F) exhibiting declination and reset. Declination features are marked by descending arrows, while resets are indicated by upward-pointing broad arrowheads.
Figure 10
A relative stationary peak word-pitch time series (subject J) demonstrating features that are the inverse of declination, so-called aclination features with reset. Aclination features are indicated by rising arrows while resets are marked by downward pointing broad arrowheads.
Figure 11
Representative PDFs for narrators A, B, and Q, for which the fraction of the fit due to the ADE is 1.0, 0.95, and 0.45, respectively. Both the data and the fit are shown.
Figure 12
Composite of all PDF distribution fits for all subjects A to Q arranged in descending order of continuum contributions from 100% (A) to 45% (Q). Note that the seemingly diverse ensemble is accommodated by a sum of the two fitting functions.
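A two-component density of the kind this caption describes might be sketched as follows, assuming the continuum term is the asymmetric Laplace distribution named in the abstract and the second term is a Gaussian mixture (the "GMM component" mentioned in the Figure 13 caption). The parameterization by separate left and right scales, every numerical parameter, and the function names are assumptions for illustration; only the continuum fraction of 0.45 echoes a value quoted for narrator Q.

```python
import math

def asymmetric_laplace_pdf(x, mode, left_scale, right_scale):
    """Two-sided exponential with different decay lengths on each side of the mode."""
    norm = 1.0 / (left_scale + right_scale)
    if x < mode:
        return norm * math.exp((x - mode) / left_scale)
    return norm * math.exp(-(x - mode) / right_scale)

def gaussian_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, continuum_fraction, laplace_params, gaussians):
    """Weighted sum of a continuum term and Gaussian peaks.

    gaussians is a list of (weight, mean, sd) whose weights sum to 1.
    """
    continuum = asymmetric_laplace_pdf(x, *laplace_params)
    peaks = sum(w * gaussian_pdf(x, mu, sd) for w, mu, sd in gaussians)
    return continuum_fraction * continuum + (1.0 - continuum_fraction) * peaks

def integrate(f, lo, hi, n=20000):
    """Trapezoidal rule, used here only to check normalization."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

# Hypothetical parameters, in cents relative to the gamut root.
laplace_params = (0.0, 150.0, 300.0)           # mode, left scale, right scale
gaussians = [(0.6, 400.0, 60.0), (0.4, 800.0, 80.0)]

area = integrate(lambda x: mixture_pdf(x, 0.45, laplace_params, gaussians),
                 -5000.0, 8000.0)
print(round(area, 4))
```

With the continuum fraction set to 1.0 the Gaussian term vanishes and the density reduces to the asymmetric Laplace alone, which corresponds to the 100% end of the ordering in the caption.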
Figure 13
Composite (sum) of the GMM component of all the distributions compared to degrees of the standard musical scale (bottom). Longer vertical markers indicate the “white” keys and the shorter the “black” keys in a chromatic scale. Some of the peaks of the distribution do coincide with musical degrees, but only incidentally, suggesting no statistical significance. The first order, second order, and third order thresholds for contrast are noted by horizontal bars.
