Trends Hear. 2021 Jan-Dec;25:23312165211041422.
doi: 10.1177/23312165211041422.

Harmonic Cancellation-A Fundamental of Auditory Scene Analysis

Alain de Cheveigné. Trends Hear. 2021 Jan-Dec.

Abstract

This paper reviews the hypothesis of harmonic cancellation, according to which an interfering sound is suppressed or canceled on the basis of its harmonicity (or periodicity in the time domain) for the purpose of Auditory Scene Analysis. It defines the concept, discusses theoretical arguments in its favor, and reviews experimental results that support it, or not. If correct, the hypothesis may draw on time-domain processing of temporally accurate neural representations within the brainstem, as also required by the classic equalization-cancellation model of binaural unmasking. The hypothesis predicts that a target sound corrupted by interference will be easier to hear if the interference is harmonic than if it is inharmonic, all else being equal. This prediction is borne out in a number of behavioral studies, but not all. The paper reviews those results, with the aim of understanding the inconsistencies and reaching a reliable conclusion for, or against, the hypothesis of harmonic cancellation within the auditory system.

Keywords: auditory scene analysis; harmonic cancellation; harmonicity; pitch perception; segregation.

Conflict of interest statement

Declaration of Conflicting Interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1.
Segregation and matching. Sensory input is stripped of correlates of interfering sources, and the selected pattern, possibly incomplete, is passed on for pattern-matching (or model-fitting), together with a mask that indicates which parts are missing or unreliable. Initial stages are under attentional control.
Figure 2.
Harmonic cancellation in the idealized frequency domain. Left: line spectra of a “target” sound (red) and a “background” (blue). Next to left: mixture. Next to right: harmonic mask with zeros at all harmonics of background. Right: recovered target.
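The masking operation in this caption can be sketched on a toy line spectrum. This is a minimal illustration, not the paper's code: the 1 Hz frequency grid, the fundamental frequencies (130 Hz target, 100 Hz background), and the amplitudes are all assumed values chosen for the example.

```python
import numpy as np

F_MAX = 2000          # highest frequency on a 1 Hz grid (assumed)
TARGET_F0 = 130       # Hz, illustrative value
BACKGROUND_F0 = 100   # Hz, illustrative value


def line_spectrum(f0, f_max, amplitude):
    """Idealized line spectrum: `amplitude` at every harmonic of f0, zero elsewhere."""
    spectrum = np.zeros(f_max + 1)
    spectrum[f0::f0] = amplitude
    return spectrum


def harmonic_mask(f0, f_max):
    """Mask equal to 0 at every harmonic of f0 and 1 everywhere else."""
    mask = np.ones(f_max + 1)
    mask[f0::f0] = 0.0
    return mask


target = line_spectrum(TARGET_F0, F_MAX, amplitude=1.0)
background = line_spectrum(BACKGROUND_F0, F_MAX, amplitude=2.0)
mixture = target + background

# Multiplying by the mask zeroes every background harmonic in the mixture.
recovered = mixture * harmonic_mask(BACKGROUND_F0, F_MAX)

surviving = np.flatnonzero(recovered)
lost = np.flatnonzero((target > 0) & (recovered == 0))
print("surviving target partials (Hz):", surviving)
print("target partials lost to the mask (Hz):", lost)
```

With these assumed values, one target partial coincides with a background harmonic and is removed along with it, so the recovered pattern is incomplete in that position.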
Figure 3.
Harmonic cancellation in the frequency domain using a short-term Fourier representation, or a filter bank. (a) 238 Hz target (red) and 200 Hz background (blue) analysed by a filter bank with 100 Hz resolution, (b) mixture, (c) harmonic mask, (d) target recovered from mixture (green), and same in the absence of the background (thin red), (e) same analysis but using a filter bank with non-uniform frequency resolution. Filter bandwidth depends on center frequency (CF) according to estimates of cochlear frequency resolution from Moore and Glasberg as implemented by Slaney (1993).
Figure 4.
Harmonic cancellation in the time domain. (a) Impulse response of the cancellation filter (left) and corresponding magnitude transfer function (right). (b) Input (left) and output (right) of the cancellation filter for the background 100 Hz vowel /a/ (top), target 132 Hz vowel /e/ (middle), and mixture at TMR = −12 dB (bottom). (c) Schematic diagram of a circuit implementing the cancellation filter (Equation (1)) (left) and neural circuit with similar function (right). A spike on the direct pathway (black) is transmitted unless it coincides with a spike on the delayed pathway (red). The delay can be applied to the positive/excitatory input, instead of negative/inhibitory, with equivalent results.
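A time-domain cancellation filter of this kind can be sketched as a delay-and-subtract operation. Equation (1) is not reproduced in this excerpt, so the form y[n] = x[n] − x[n − T] used here is an assumption (a standard comb filter with a delay of one background period). The sampling rate, the equal-amplitude harmonic complexes standing in for vowels, and the background level (4 times the target amplitude, about 12 dB stronger) are likewise illustrative choices.

```python
import numpy as np

FS = 16000                     # sampling rate in Hz (assumed)
BACKGROUND_F0 = 100            # Hz, as in the caption
TARGET_F0 = 132                # Hz, as in the caption
PERIOD = FS // BACKGROUND_F0   # background period T in samples (160)
N_SAMPLES = 1600               # 0.1 s of signal


def harmonic_complex(f0, n_samples, n_harmonics=10):
    """Sum of equal-amplitude sine harmonics (a crude stand-in for a vowel)."""
    t = np.arange(n_samples) / FS
    return sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, n_harmonics + 1))


def cancellation_filter(x, period):
    """Assumed form: y[n] = x[n] - x[n - period], with x = 0 before the first sample."""
    y = np.array(x, dtype=float)
    y[period:] -= x[:-period]
    return y


def rms(x):
    return np.sqrt(np.mean(x ** 2))


target = harmonic_complex(TARGET_F0, N_SAMPLES)
background = 4.0 * harmonic_complex(BACKGROUND_F0, N_SAMPLES)  # about 12 dB above target
mixture = target + background

out_background = cancellation_filter(background, PERIOD)
out_target = cancellation_filter(target, PERIOD)
out_mixture = cancellation_filter(mixture, PERIOD)

# Skip the first period, where the delayed copy is not yet available.
print("background residual RMS:", rms(out_background[PERIOD:]))
print("filtered target RMS:    ", rms(out_target[PERIOD:]))
```

Because the background repeats exactly every T samples, the subtraction removes it once the first period has elapsed, and the filtered mixture then equals the filtered target. The target is not passed unchanged: each of its partials is scaled by the filter's transfer function.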
Figure 5.
(a) TMR within each channel of a model cochlear filter bank for an input consisting of a 124 Hz harmonic target mixed with a 100 Hz harmonic background with overall TMR = 0 dB (black), −12 dB (dotted blue), or +12 dB (dotted red). Thanks to the filter bank, the TMR is enhanced in certain channels within which the target can be “glimpsed.” (b) Linear operations can be swapped. Filtering the signal before the filter bank is equivalent to applying the same filter to each channel after the filter bank.
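The equivalence stated in panel (b) follows from the commutativity of convolution, and can be checked numerically. The sketch below is illustrative only: the filter bank is a hypothetical set of Hann-windowed cosines, not a cochlear model, and the cancellation filter is assumed to have the delay-and-subtract impulse response δ[n] − δ[n − T].

```python
import numpy as np

FS = 16000           # sampling rate in Hz (assumed)
PERIOD = FS // 100   # period of a 100 Hz background, in samples


def cancellation_impulse_response(period):
    """Assumed impulse response: +1 at lag 0 and -1 at lag `period`."""
    h = np.zeros(period + 1)
    h[0], h[period] = 1.0, -1.0
    return h


def toy_filter_bank(center_freqs, length=256):
    """Hann-windowed cosines: crude band-pass FIR filters, NOT a cochlear model."""
    n = np.arange(length)
    window = np.hanning(length)
    return [window * np.cos(2 * np.pi * cf * n / FS) for cf in center_freqs]


rng = np.random.default_rng(0)
signal = rng.standard_normal(2000)
h = cancellation_impulse_response(PERIOD)
bank = toy_filter_bank([250, 500, 1000, 2000])

# Order 1: cancellation filter first, then the filter bank.
filter_then_bank = [np.convolve(np.convolve(signal, h), g) for g in bank]
# Order 2: filter bank first, then the cancellation filter on each channel.
bank_then_filter = [np.convolve(np.convolve(signal, g), h) for g in bank]

max_diff = max(np.max(np.abs(a - b))
               for a, b in zip(filter_then_bank, bank_then_filter))
print("largest difference between the two orders:", max_diff)
```

The two orders agree to within floating-point rounding. The check relies on every stage being linear and time-invariant; it says nothing about stages that are not.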
Figure 6.
Two hybrid models of harmonic cancellation. (a) Hybrid Model 1. Left: power as a function of CF for synthetic vowels /a/, F0 = 100 Hz (blue) and /e/, F0 = 106 Hz (red). Short lines above the plot indicate the first two formant frequencies of each vowel. Right: power as a function of CF for the mix before (black) and after (red) applying a cancellation filter tuned to suppress the period of /a/. (b) Hybrid Model 3. Black: per-channel TMR of vowel /e/ as a function of CF for a mixture of /a/+/e/ at overall TMR = 0 dB. Channels are divided into three groups: TMR > 12 dB (green, to be left intact), TMR < −12 dB (black, to be discarded), and −12 dB ≤ TMR ≤ 12 dB (red, to be filtered by a cancellation filter).
Figure 7.
Left: waveform of the mix of target vowel /e/ (132 Hz) with background vowel /a/ (100 Hz) at TMR = −12 dB. Given four background cycles, intervals can be paired over spans of T, 2T, and 3T, with three, two and one repeats, respectively (blue arrows). Right: spectrum of target vowel /e/ (black line) and cancellation-filtered estimates obtained for spans T, 2T, and 3T (symbols). Averaging over estimates (or better: taking their maximum) would yield a more accurate estimate of the target, and averaging over repeats might further attenuate uncorrelated noise (not shown).
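One way to see why combining spans can help is to compare the gain that each span applies to the target partials. The sketch below is analytic rather than a reproduction of the figure: it assumes a delay-and-subtract filter y(t) = x(t) − x(t − span), whose magnitude response is 2|sin(π f · span)|, and evaluates it at the first ten harmonics of 132 Hz and of 100 Hz (the number of harmonics is an arbitrary choice).

```python
import numpy as np

BACKGROUND_F0 = 100.0      # Hz, as in the caption
TARGET_F0 = 132.0          # Hz, as in the caption
T = 1.0 / BACKGROUND_F0    # background period in seconds


def cancellation_gain(freq, span):
    """Magnitude response of the assumed filter y(t) = x(t) - x(t - span)."""
    return 2.0 * np.abs(np.sin(np.pi * freq * span))


target_partials = TARGET_F0 * np.arange(1, 11)
background_partials = BACKGROUND_F0 * np.arange(1, 11)
spans = [T, 2 * T, 3 * T]

# One row per span, one column per partial.
gains = np.array([cancellation_gain(target_partials, s) for s in spans])
background_gains = np.array([cancellation_gain(background_partials, s) for s in spans])

# Keep, for each target partial, the span that attenuates it least.
best_gain = gains.max(axis=0)

print("worst target gain, span T only:  ", gains[0].min())
print("worst target gain, max over spans:", best_gain.min())
```

Every span is a multiple of the background period, so each one cancels the background. The spans differ in which target partials they attenuate, which is why the maximum over spans leaves fewer deep notches in the target than a single span does.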
