The role of the auditory brainstem in processing musically relevant pitch

Gavin M Bidelman

Front Psychol. 2013 May 13;4:264. doi: 10.3389/fpsyg.2013.00264. eCollection 2013.

Abstract

Neuroimaging work has shed light on the cerebral architecture involved in processing the melodic and harmonic aspects of music. Here, recent evidence is reviewed illustrating that subcortical auditory structures contribute to the early formation and processing of musically relevant pitch. Electrophysiological recordings from the human brainstem and population responses from the auditory nerve reveal that nascent features of tonal music (e.g., consonance/dissonance, pitch salience, harmonic sonority) are evident at early, subcortical levels of the auditory pathway. The salience and harmonicity of brainstem activity are strongly correlated with listeners' perceptual preferences and perceived consonance for the tonal relationships of music. Moreover, the hierarchical ordering of pitch intervals/chords described by Western music practice and their perceptual consonance are well predicted by the salience with which pitch combinations are encoded in subcortical auditory structures. While the neural correlates of consonance can be tuned and exaggerated with musical training, they persist even in the absence of musicianship or long-term enculturation. As such, it is posited that the structural foundations of musical pitch might result from innate processing performed by the central auditory system. A neurobiological predisposition for consonant, pleasant-sounding pitch relationships may be one reason why these pitch combinations have been favored by composers and listeners for centuries. It is suggested that important perceptual dimensions of music emerge well before the auditory signal reaches cerebral cortex and prior to attentional engagement. While cortical mechanisms are no doubt critical to the perception, production, and enjoyment of music, the contribution of subcortical structures implicates a more integrated, hierarchically organized network underlying music processing within the brain.

Keywords: auditory event-related potentials; auditory nerve; brainstem response; consonance and dissonance; frequency-following response (FFR); musical pitch perception; musical training; tonality.


Figures

Figure 1
Consonance rankings for chromatic scale tone combinations of Western music practice. (A) Consonance (i.e., “pleasantness”) ratings reported by Kameoka and Kuriyagawa (1969b) for two-tone intervals (dyads). Stimuli were composed of two simultaneously sounding complex tones (inset). The spacing between fundamental frequencies (f1, f2) was varied to form the various chromatic intervals within the range of an octave; the lower tone (f1) was always fixed at 440 Hz and the upper tone (f2) varied from 440 to 880 Hz in semitone spacing. Note the higher behavioral ratings for the consonant pitch relationships [e.g., 0 (Un), 7 (P5), 12 (Oct) semitones] relative to dissonant relationships [e.g., 2 (m2), 6 (TT), 11 (M7) semitones] as well as the hierarchical arrangement of intervals (Un > Oct > P5 > P4 > M6, etc.). (B) Rank order of musical interval consonance ratings reported across seven psychophysical studies (Faist; Meinong and Witasek; Buch; Pear; Kreuger; Malmberg; Stumpf, 1989). Open circles represent the median consonance rank assigned to each of the 12 chromatic dyads. Figures adapted from Kameoka and Kuriyagawa (1969b) and Schwartz et al. (2003) with permission from The Acoustical Society of America and Society for Neuroscience, respectively.
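For readers who wish to reproduce stimuli of this kind, the following Python sketch generates the thirteen chromatic dyads described above (lower tone fixed at 440 Hz, upper tone stepped in equal-tempered semitones up to the octave). The sampling rate, duration, and number of equal-amplitude harmonics are illustrative assumptions, not the parameters of Kameoka and Kuriyagawa's original stimuli.

# Sketch: synthesizing chromatic dyad stimuli (f1 fixed at 440 Hz, f2 stepped
# in semitones up to the octave). Harmonic count, amplitudes, and duration are
# illustrative assumptions, not the original study's parameters.
import numpy as np

FS = 44100          # sampling rate (Hz), assumed
DUR = 0.5           # tone duration (s), assumed
N_HARMONICS = 6     # partials per complex tone, assumed

def complex_tone(f0, fs=FS, dur=DUR, n_harm=N_HARMONICS):
    """Equal-amplitude harmonic complex tone with fundamental f0."""
    t = np.arange(int(fs * dur)) / fs
    return sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, n_harm + 1))

def dyad(semitones, f1=440.0):
    """Two simultaneous complex tones separated by `semitones` (0..12)."""
    f2 = f1 * 2 ** (semitones / 12.0)      # equal-tempered upper tone
    x = complex_tone(f1) + complex_tone(f2)
    return x / np.max(np.abs(x))           # normalize to avoid clipping

# All 13 chromatic dyads within the octave: unison (0) through octave (12)
stimuli = {n: dyad(n) for n in range(13)}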
Figure 2
Cortical event-related potentials (ERPs) elicited by musical dyads. (A) Cortical ERP waveforms recorded at the vertex of the scalp (Cz lead) in response to chromatic musical intervals. Response trace color corresponds to the evoking stimulus denoted in music notation. Interval stimuli were composed of two simultaneously sounding pure tones. (B) Cortical N2 response magnitude is modulated by the degree of consonance; dissonant pitch relationships evoke larger N2 magnitude than consonant intervals. The shaded region demarcates the critical bandwidth (CBW); perceived dissonance created by intervals larger than the CBW cannot be attributed to cochlear interactions (e.g., beating between frequency components). Perfect consonant intervals (filled circles); imperfect consonant intervals (filled triangles); dissonant intervals (open circles). (C) Response magnitude is correlated with the degree of simplicity of musical pitch intervals; simpler, more consonant pitch relationships (e.g., P1, P8, P5) elicit smaller N2 than more complex, dissonant pitch relationships (e.g., M2, TT, M7). Figure adapted from Itoh et al. (2010) with permission from The Acoustical Society of America.
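The critical-bandwidth argument can be checked numerically. The sketch below uses Zwicker's standard approximation of the critical bandwidth to flag which chromatic dyads are separated by more than one critical band; both the choice of formula and the 440 Hz base frequency are assumptions for illustration and are not taken from Itoh et al. (2010).

# Sketch: checking which chromatic dyads exceed the critical bandwidth (CBW).
# Zwicker's approximation is used as an assumption; the 440 Hz lower tone is
# illustrative, not necessarily the stimulus frequency of Itoh et al. (2010).
import numpy as np

def critical_bandwidth(fc):
    """Zwicker's critical-bandwidth approximation (Hz) at center frequency fc (Hz)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (fc / 1000.0) ** 2) ** 0.69

def exceeds_cbw(semitones, f1=440.0):
    """True if the dyad's frequency separation is wider than one critical band."""
    f2 = f1 * 2 ** (semitones / 12.0)
    fc = (f1 + f2) / 2.0                   # center frequency of the tone pair
    return (f2 - f1) > critical_bandwidth(fc)

for n in range(13):
    print(n, "semitones:", "outside CBW" if exceeds_cbw(n) else "within CBW")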
Figure 3
Human brainstem frequency-following responses (FFRs) elicited by musical dyads. Grand average FFR waveforms (A) and their corresponding frequency spectra (B) evoked by the dichotic presentation of four representative musical intervals. Consonant intervals, blue; dissonant intervals, red. (A) Clearer, more robust periodicity is observed for consonant relative to dissonant intervals. (B) Frequency spectra reveal that FFRs faithfully preserve the harmonic constituents of both musical notes of the interval (compare response spectrum, filled area, to stimulus spectrum, harmonic locations denoted by dots). Consonant intervals evoked more robust spectral magnitudes across harmonics than dissonant intervals. Amplitudes are normalized relative to the unison. (C) Correspondence between FFR pitch salience computed from brainstem responses and behavior consonance ratings. Neural responses well predict human preferences for musical intervals. Note the systematic clustering of consonant and dissonant intervals and the maximal separation of the unison (most consonant interval) from the minor second (most dissonant interval) in the neural-behavioral space. Data from Bidelman and Krishnan (2009).
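As a rough illustration of how such neural-behavioral comparisons are made, the sketch below computes an FFR magnitude spectrum, sums energy at expected stimulus harmonics, and correlates a per-interval neural measure with behavioral consonance ratings. The windowing, tolerance, and placeholder arrays are assumptions; the pitch-salience analysis actually used by Bidelman and Krishnan (2009) differs in its details.

# Sketch: FFR spectrum extraction and a neural-behavioral correlation.
# Placeholder arrays and parameters are assumptions for illustration only.
import numpy as np
from scipy.stats import pearsonr

def ffr_spectrum(ffr, fs):
    """Magnitude spectrum of an averaged FFR waveform (Hann-windowed)."""
    spec = np.abs(np.fft.rfft(ffr * np.hanning(len(ffr))))
    freqs = np.fft.rfftfreq(len(ffr), d=1.0 / fs)
    return freqs, spec

def harmonic_energy(freqs, spec, harmonics, tol=5.0):
    """Sum spectral magnitude within +/- tol Hz of each expected stimulus harmonic."""
    return sum(spec[(freqs > h - tol) & (freqs < h + tol)].sum() for h in harmonics)

# neural_salience = ...  # one pitch-salience value per chromatic dyad (hypothetical)
# behavior_rating = ...  # listeners' consonance ratings for the same dyads (hypothetical)
# r, p = pearsonr(neural_salience, behavior_rating)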
Figure 4
Auditory nerve (AN) responses to musical dyads. (A) Population-level interspike interval histograms (ISIHs) for a representative consonant (perfect fifth: 220 + 330 Hz) and dissonant (minor second: 220 + 233 Hz) musical interval. ISIHs quantify the periodicity of spike discharges from a population of 70 AN fibers driven by a single two-tone musical interval. (B) Neural pitch salience profiles computed from ISIHs via harmonic sieve analyses quantify the salience of all possible pitches contained in AN responses based on harmonicity of the spike distribution. Their peak magnitude (arrows) represents a singular measure of neural pitch salience for the eliciting musical interval. (C) AN pitch salience across the chromatic intervals is more robust for consonant than dissonant intervals. Rank order of the intervals according to their neural pitch salience parallels the hierarchical arrangement of pitches according to Western music theory (i.e., Un > Oct > P5 > P4, etc.). (D) AN pitch representations predict the hierarchical order of behavioral consonance judgments of human listeners (behavioral data from normal-hearing listeners of Tufts et al., 2005). AN data reproduced from Bidelman and Heinz (2011).
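A simplified version of the harmonic-sieve analysis can be sketched as follows: for each candidate fundamental frequency, ISIH density falling near integer multiples of the candidate period is summed, and the peak of the resulting profile serves as the single pitch-salience value. The candidate range, number of sieve harmonics, and window width below are assumptions; the published analyses (e.g., Bidelman and Heinz, 2011) differ in implementation.

# Sketch: a simplified harmonic-sieve pitch-salience estimate from an ISIH.
# Window width, candidate range, and harmonic count are assumptions.
import numpy as np

def pitch_salience_profile(isih, bin_width, f0_candidates, n_harm=8, tol=0.0005):
    """isih: histogram counts; bin_width: seconds per bin; tol: +/- window (s)."""
    centers = (np.arange(len(isih)) + 0.5) * bin_width     # bin centers (s)
    profile = np.zeros(len(f0_candidates))
    for i, f0 in enumerate(f0_candidates):
        period = 1.0 / f0
        for k in range(1, n_harm + 1):                     # sieve "teeth" at k/f0
            mask = np.abs(centers - k * period) <= tol
            profile[i] += isih[mask].sum()
    return profile

f0s = np.arange(80.0, 500.0, 1.0)          # candidate pitches (Hz), assumed range
# profile = pitch_salience_profile(isih, bin_width, f0s)
# neural_pitch_salience = profile.max()    # singular salience measure per interval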
Figure 5
Comparison between auditory nerve, human brainstem evoked potentials, and behavioral responses to musical intervals. (Top left) AN responses correctly predict perceptual attributes of consonance, dissonance, and the hierarchical ordering of musical dyads. AN neural pitch salience is shown as a function of the number of semitones separating the interval’s lower and higher pitch over the span of an octave (i.e., 12 semitones). Consonant musical intervals (blue) tend to fall on or near peaks in neural pitch salience whereas dissonant intervals (red) tend to fall within trough regions, indicating more robust encoding for the former. Among intervals common to a single class (e.g., all consonant intervals), AN responses show differential encoding resulting in the hierarchical arrangement of pitch typically described by Western music theory (i.e., Un > Oct > P5 > P4, etc.). (Top middle) Neural correlates of musical consonance observed in human brainstem responses. As in the AN, brainstem responses reveal stronger encoding of consonant relative to dissonant pitch relationships. (Top right) Behavioral consonance ratings reported by human listeners. Dyads considered consonant according to music theory are preferred over those considered dissonant [minor second (m2), tritone (TT), major seventh (M7)]. For comparison, the solid line shows predictions from a mathematical model of consonance and dissonance (Sethares, 1993) where local maxima denote higher degrees of consonance than minima, which denote dissonance. (Bottom row) Auditory nerve (left) and brainstem (middle) responses similarly predict behavioral chordal sonority ratings (right) for the four most common triads in Western music. Chords considered consonant according to music theory (i.e., major, minor) elicit more robust subcortical responses and show an ordering expected by music practice (i.e., major > minor ≫ diminished > augmented). AN data from Bidelman and Heinz (2011); interval data from Bidelman and Krishnan (2009); chord data from Bidelman and Krishnan (2011).
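For reference, a Sethares-style (1993) model of the kind plotted in the figure sums pairwise sensory dissonance between the partials of the two tones. The sketch below uses commonly cited fitted constants and equal-amplitude harmonic spectra; treat those values, and the helper names, as assumptions rather than the exact model configuration used for the figure.

# Sketch: a Sethares-style sensory dissonance estimate for two complex tones,
# summing pairwise partial dissonances from Plomp-Levelt-type curves.
# Constants and equal-amplitude spectra are assumptions.
import numpy as np

def dissonance(freqs, amps):
    """Total pairwise dissonance for a set of partials (freqs in Hz)."""
    d_star, s1, s2 = 0.24, 0.0207, 18.96   # place maximal roughness near the lower partial
    b1, b2 = -3.51, -5.75                  # exponential decay constants
    total = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            f_lo, f_hi = sorted((freqs[i], freqs[j]))
            s = d_star / (s1 * f_lo + s2)  # frequency-dependent scaling
            x = s * (f_hi - f_lo)
            total += min(amps[i], amps[j]) * (5 * np.exp(b1 * x) - 5 * np.exp(b2 * x))
    return total

def dyad_dissonance(semitones, f1=440.0, n_harm=6):
    """Dissonance of two equal-amplitude harmonic tones `semitones` apart."""
    f2 = f1 * 2 ** (semitones / 12.0)
    partials = [f1 * k for k in range(1, n_harm + 1)] + [f2 * k for k in range(1, n_harm + 1)]
    return dissonance(partials, [1.0] * len(partials))

# curve = [dyad_dissonance(n / 10) for n in range(121)]   # 0-12 semitones in 0.1 steps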
Figure 6
Experience-dependent enhancement of brainstem responses resulting from musical training. (A) Brainstem FFR time-waveforms elicited by a chordal arpeggio (i.e., three consecutive tones) recorded in musician and non-musician listeners (red and blue, respectively). (B) Expanded time window around the onset response to the chordal third (≈117 ms), the defining note of the arpeggio sequence. Relative to non-musicians, musician responses are both larger and more temporally precise, as evidenced by their shorter-duration P-N onset complex (C) and more robust amplitude (D). Musical training thus improves both the precision and magnitude of time-locked neural activity to musical pitch. Error bars = SEM. Data from Bidelman et al. (2011d).
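A minimal sketch of how the P-N onset measures in panels (C) and (D) could be quantified is given below: the positive peak and the following trough are located within a search window around the chordal third's onset, and their latency difference and peak-to-peak amplitude are returned. The window bounds and sampling rate are illustrative assumptions, not the analysis parameters of Bidelman et al. (2011d).

# Sketch: quantifying the P-N onset complex of an FFR to the chordal third.
# Search window and sampling rate are assumptions for illustration.
import numpy as np

def pn_onset_measures(ffr, fs, win=(0.115, 0.135)):
    """Return (P-N duration in ms, peak-to-peak amplitude) within a search window."""
    i0, i1 = int(win[0] * fs), int(win[1] * fs)
    seg = ffr[i0:i1]
    p_idx = int(np.argmax(seg))                    # positive peak (P)
    n_idx = p_idx + int(np.argmin(seg[p_idx:]))    # following trough (N)
    duration_ms = (n_idx - p_idx) / fs * 1000.0
    amplitude = seg[p_idx] - seg[n_idx]
    return duration_ms, amplitude

# dur_mus, amp_mus = pn_onset_measures(ffr_musician, fs=20000)     # hypothetical data
# dur_non, amp_non = pn_onset_measures(ffr_nonmusician, fs=20000)  # hypothetical data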
