Comparative Study
2007 Jun 5;104(23):9852-7.
doi: 10.1073/pnas.0703140104. Epub 2007 May 24.

Musical intervals in speech

Deborah Ross et al. Proc Natl Acad Sci U S A.

Abstract

Throughout history and across cultures, humans have created music using pitch intervals that divide octaves into the 12 tones of the chromatic scale. Why these specific intervals in music are preferred, however, is not known. In the present study, we analyzed a database of individually spoken English vowel phones to examine the hypothesis that musical intervals arise from the relationships of the formants in speech spectra that determine the perceptions of distinct vowels. Expressed as ratios, the frequency relationships of the first two formants in vowel phones represent all 12 intervals of the chromatic scale. Were the formants to fall outside the ranges found in the human voice, their relationships would generate either a less complete or a more dilute representation of these specific intervals. These results imply that human preference for the intervals of the chromatic scale arises from experience with the way speech formants modulate laryngeal harmonics to create different phonemes.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Ranges of the peak harmonic in the first two formants (F1 and F2) for eight American English vowels uttered as single words in an emotionally neutral manner. (A) Diagram of the human larynx and vocal tract; see Introduction for explanation. (B) Distribution of the peak harmonics selected as the index for the first and second formant for the five male participants. (C) Distribution for the five female participants. The somewhat smaller harmonic ranges for females are due to the higher average fundamental frequency of female speech. The mean fundamental frequency for male speakers was 109 Hz (SD = 10) and for female speakers 171 Hz (SD = 20) (the diagram in A is adapted from ref. 6).
Fig. 2.
Spectra of three different vowels uttered by a representative male speaker (the vowels are indicated in International Phonetic Alphabet nomenclature and phonetically). The repeating intensity peaks are the harmonics created by the varying energy in the air stream resulting from vibrations of the vocal folds (see Fig. 1A); the first peak indicates the fundamental frequency (F0). As in an ideal harmonic series, the intensity of successively higher harmonics tends to fall off exponentially; however, the resonances of the vocal tract above the larynx suppress some laryngeal harmonics more than others, thus creating the formant peaks. This differential suppression of the intensity in the air stream as a function of the configuration of the vocal tract generates the different vowel phones shown. The harmonic peaks of the first two formants are indicated by F1 and F2; asterisks are the formant values given by the linear predictive coding algorithm in Praat. (Insets) Keyboards showing that the intensity peaks in the first two formants often define musical intervals. Red keys indicate F1 and F2 values.
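The selection of a peak harmonic within a formant can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `peak_harmonic`, the frequency bands used to delimit each formant, and the harmonic levels are all hypothetical values chosen for illustration.

```python
def peak_harmonic(harmonic_levels_db, f0_hz, band_hz):
    """Return the number of the strongest harmonic inside a frequency band.

    harmonic_levels_db[i] is the level (dB) of harmonic i + 1, so the
    harmonic with number n sits at n * f0_hz. band_hz is an assumed
    (low, high) range standing in for one formant region.
    """
    low_hz, high_hz = band_hz
    candidates = [
        (level, n)
        for n, level in enumerate(harmonic_levels_db, start=1)
        if low_hz <= n * f0_hz <= high_hz
    ]
    if not candidates:
        raise ValueError("no harmonic falls inside the band")
    # max() compares levels first; the harmonic number only breaks ties.
    _, strongest = max(candidates)
    return strongest


# Hypothetical spectrum: levels of harmonics 1..12 for a 100 Hz fundamental.
levels_db = [60, 55, 58, 62, 57, 50, 45, 44, 48, 52, 49, 40]
f1_harmonic = peak_harmonic(levels_db, 100.0, (200.0, 700.0))
f2_harmonic = peak_harmonic(levels_db, 100.0, (800.0, 1200.0))
print(f1_harmonic, f2_harmonic)
```

With these made-up levels the sketch selects harmonics 4 and 10, a ratio of 5:2, which corresponds to a major third (5:4) once octaves are removed.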
Fig. 3.
Ratio relationships between the peak intensity of the first and second formants (see Fig. 2) for the eight vowels tested, compiled for the native English-speakers in the study. All 12 intervals of the chromatic scale in just intonation are represented (red bars); black bars show the frequency of occurrence of interval ratios that do not fall on chromatic scale tones. Sixty-eight percent of the occurrences are chromatic intervals (see SI Text for further discussion).
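As a rough illustration of how a formant ratio can be matched to a chromatic interval, the sketch below reduces the ratio of two harmonic numbers to a single octave and looks it up in a table of just-intonation ratios. The table is one common just tuning and is an assumption here (the tritone and minor seventh in particular have several accepted ratios); the paper's own interval definitions are in its SI Text. The function name `classify_interval` is likewise hypothetical.

```python
from fractions import Fraction

# One common just-intonation tuning of the chromatic scale (assumed, not
# taken from the paper). Octaves fold onto 1/1.
JUST_INTERVALS = {
    Fraction(1, 1): "unison/octave",
    Fraction(16, 15): "minor second",
    Fraction(9, 8): "major second",
    Fraction(6, 5): "minor third",
    Fraction(5, 4): "major third",
    Fraction(4, 3): "perfect fourth",
    Fraction(7, 5): "tritone",
    Fraction(3, 2): "perfect fifth",
    Fraction(8, 5): "minor sixth",
    Fraction(5, 3): "major sixth",
    Fraction(9, 5): "minor seventh",
    Fraction(15, 8): "major seventh",
}


def classify_interval(f1_harmonic, f2_harmonic):
    """Name the chromatic interval formed by two harmonic numbers.

    Returns None when the octave-reduced ratio is not in the table,
    i.e. a nonchromatic ratio under this assumed tuning.
    """
    if f1_harmonic < 1 or f2_harmonic < 1:
        raise ValueError("harmonic numbers must be positive integers")
    ratio = Fraction(f2_harmonic, f1_harmonic)
    if ratio < 1:
        ratio = 1 / ratio
    # Fold the ratio into the range [1, 2) by removing whole octaves.
    while ratio >= 2:
        ratio /= 2
    return JUST_INTERVALS.get(ratio)


print(classify_interval(4, 10))  # 10:4 reduces to 5:4
print(classify_interval(6, 11))  # 11:6 is not in the table
```

Exact rational comparison is used because ratios of harmonic numbers are themselves exact fractions; a tolerance-based match would be needed if raw formant frequencies in Hz were compared instead.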
Fig. 4.
Evidence that the ranges of the first two formants in speech specifically bias the distribution of formant ratios toward chromatic scale intervals. The diagram of the piano keyboard shows the ranges of the formants for the speakers in our data set (brackets); the numbers on the keyboard indicate harmonic overtones above the fundamental. If the formant ranges were lower than those found in speech (e.g., reduced by half as shown in A), then compared with emotionally neutral speech (Fig. 3), the intervals generated would represent only a subset of the chromatic scale (red bars; see Results). If the ranges were higher (e.g., doubled; as shown in B), all of the chromatic intervals would be represented (red bars), but their proportion would be diluted by additional nonchromatic intervals (black bars). The chromatic scale, however, is not optimized in the distribution; optimization would require formant peaks somewhat lower than those in our data (a reduction of ≈0.2 from the harmonic values generated by our subjects). An optimal representation of the chromatic scale would thus entail slightly higher fundamental frequencies of voiced phones, which presumably occur in the more energized natural speech that we routinely experience. “F” and “O” on the abscissa denote the position of fifths and octaves; the ticks are at chromatic intervals (see Fig. 3).
Fig. 5.
Ratio relationships between the peak intensity of the first and second formants from the American English (A) and Mandarin (B) monologues, compiled from all of the participants. All 12 intervals (red bars) of the chromatic scale in just intonation are represented in both speech databases; black bars show the frequency of occurrence of interval ratios that do not fall on chromatic scale tones (see also Tables 1 and 2).

References

    1. Fletcher NH. Acoustic Systems in Biology. New York: Oxford Univ Press; 1992.
    2. Schwartz DA, Purves D. Hear Res. 2004;194:31–46.
    3. Schwartz DA, Howe CQ, Purves D. J Neurosci. 2003;23:7160–7168.
    4. Peterson GE, Barney HL. J Acoust Soc Am. 1952;24:175–184.
    5. Stevens KN, House AS. J Speech Hear Res. 1961;4:303–320.