Review

Semin Hear. 2021 Aug;42(3):260-281. doi: 10.1055/s-0041-1735134. Epub 2021 Sep 24.

Creating Clarity in Noisy Environments by Using Deep Learning in Hearing Aids


Asger Heidemann Andersen et al. Semin Hear. 2021 Aug.

Abstract

Hearing aids continue to acquire increasingly sophisticated sound-processing features beyond basic amplification. On the one hand, these features have the potential to add user benefit and allow for personalization. On the other hand, if they are to deliver this benefit, clinicians must be acquainted with both the underlying technologies and the specific fitting handles made available by the individual hearing aid manufacturers. Ensuring benefit from hearing aids in typical daily listening environments requires that the hearing aids handle sounds that interfere with communication, generically referred to as "noise." With this aim, considerable efforts from both academia and industry have led to increasingly advanced noise-handling algorithms, typically built on the principles of directional processing and postfiltering. This article provides an overview of the techniques used for noise reduction in modern hearing aids. First, classical techniques are covered as they are used in modern hearing aids. The discussion then shifts to how deep learning, a subfield of artificial intelligence, provides a radically different way of solving the noise problem. Finally, the results of several experiments are used to showcase the benefits of recent algorithmic advances in terms of signal-to-noise ratio, speech intelligibility, selective attention, and listening effort.

Keywords: beamforming; directionality; noise reduction; postfiltering.


Conflict of interest statement

Conflict of Interest: None declared.

Figures

Figure 1
An overview of the components used in the noise reduction system of a typical modern hearing aid. The signals from two microphones are converted to a time–frequency representation using separate analysis filterbanks (AFBs). An adaptive beamformer controls the directional response of the system by applying variable gains and time delays to one of the two signals before these are summed together. A postfilter computes a time- and frequency-dependent gain which is applied to the signal before a synthesis filterbank (SFB) converts the time–frequency representation of the signal back to an audio waveform.
Figure 2
First, an analysis filterbank reveals the frequency structure inherent in an audio waveform of speech. Processing is performed in this representation, after which a synthesis filterbank is used to transform the result back to an audio waveform.
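The analysis/synthesis round trip described in this caption can be sketched in a few lines of NumPy. This is a simplified illustration, not any hearing aid's actual filterbank: a periodic Hann window at 50% overlap satisfies the constant-overlap-add property, so plain overlap-add reconstructs the interior of the waveform exactly.

```python
import numpy as np

FRAME = 256
HOP = FRAME // 2
# Periodic Hann window: copies offset by half a frame sum exactly to 1,
# which makes plain overlap-add reconstruction exact away from the edges.
WIN = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(FRAME) / FRAME)

def analysis_filterbank(x):
    """Time-frequency representation: windowed frames, then an FFT per frame."""
    n_frames = (len(x) - FRAME) // HOP + 1
    frames = np.stack([x[i * HOP : i * HOP + FRAME] * WIN for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)              # shape: (frames, bins)

def synthesis_filterbank(X, length):
    """Inverse transform: IFFT per frame, then overlap-add back to a waveform."""
    frames = np.fft.irfft(X, n=FRAME, axis=1)
    y = np.zeros(length)
    for i, f in enumerate(frames):
        y[i * HOP : i * HOP + FRAME] += f
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = synthesis_filterbank(analysis_filterbank(x), len(x))
# Interior samples are reconstructed exactly; edges lack full window overlap.
interior = slice(FRAME, len(x) - FRAME)
assert np.allclose(x[interior], y[interior])
```

Any noise-reduction processing (beamforming, postfilter gains) would operate on the complex time-frequency array between these two steps.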
Figure 3
The physical principle utilized in beamforming. (a) A single-tone signal impinging on a pair of microphones at an angle of 90 degrees relative to the axis of the microphones. The oscillations are picked up simultaneously by the microphones, resulting in signals that are in phase. When the two signals are summed, they add constructively to form a signal with twice the individual amplitude. (b) The signal impinges from a larger angle. Because of this, the sound arrives slightly earlier at the rear microphone compared with the front microphone. This causes the two signals to be out of phase. When summed, the signals cancel due to destructive interference.
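The constructive and destructive cases in this figure can be reproduced numerically. The sketch below uses illustrative parameter values (a 1 kHz tone at a 16 kHz sample rate, not hearing-aid specifics): an in-phase copy doubles the amplitude, while a half-period delay cancels the sum.

```python
import numpy as np

fs = 16_000                 # sample rate (Hz)
f = 1_000                   # tone frequency (Hz)
t = np.arange(fs) / fs
front = np.sin(2 * np.pi * f * t)

# (a) Broadside arrival: the tone reaches both microphones simultaneously,
# so the signals are in phase and sum to twice the individual amplitude.
rear_in_phase = front.copy()
peak_a = np.max(np.abs(front + rear_in_phase))

# (b) Oblique arrival with a half-period travel delay between the microphones:
# the signals are out of phase and cancel by destructive interference.
delay = fs // (2 * f)       # half a period, in samples
rear_out_of_phase = np.sin(2 * np.pi * f * (t - delay / fs))
peak_b = np.max(np.abs(front + rear_out_of_phase))

assert np.isclose(peak_a, 2.0)   # constructive: double amplitude
assert peak_b < 1e-9             # destructive: cancellation
```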
Figure 4
Illustration of how the principle shown in Fig. 3 can be controlled. The two microphones pick up signals that are neither in phase nor equal in amplitude. Applying a time delay and a gain to one of the signals removes these differences, so that the resulting signals sum constructively to a signal with twice the amplitude, even though the raw microphone signals would not have.
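A minimal numerical sketch of this alignment, with a hypothetical 4-sample delay and 6 dB level mismatch between the microphones (values invented for illustration):

```python
import numpy as np

fs, f = 16_000, 500
t = np.arange(fs) / fs
front = np.sin(2 * np.pi * f * t)

# Hypothetical mismatch: the sound reaches the rear microphone 4 samples
# later and 6 dB weaker than the front microphone.
lag, gain_db = 4, -6.0
rear = 10 ** (gain_db / 20) * np.sin(2 * np.pi * f * (t - lag / fs))

# Naive sum: phase and level differences prevent full constructive addition.
naive_peak = np.max(np.abs(front + rear))

# Compensated sum: delay the front signal by the same lag and boost the rear
# signal by the same gain, so the two copies line up exactly before summing.
front_delayed = np.concatenate([np.zeros(lag), front[:-lag]])
rear_boosted = rear * 10 ** (-gain_db / 20)
aligned_peak = np.max(np.abs((front_delayed + rear_boosted)[lag:]))

assert naive_peak < 1.6                # misaligned sum falls short
assert np.isclose(aligned_peak, 2.0)   # aligned sum doubles the amplitude
```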
Figure 5
Examples of directional responses that can be achieved using the described principles of beamforming. The plots show the attenuation of sounds reaching the hearing aid depending on the angle of arrival in the horizontal plane.
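As an illustration of how such directional responses arise, the sketch below computes the magnitude response of a first-order differential (cardioid-like) two-microphone beamformer over all arrival angles. The spacing and analysis frequency are assumed values, not taken from the article.

```python
import numpy as np

c = 343.0       # speed of sound (m/s)
d = 0.012       # microphone spacing (m), roughly hearing-aid scale (assumed)
f = 1_000.0     # analysis frequency (Hz)
w = 2 * np.pi * f

def response(theta_deg):
    """Magnitude response of a differential beamformer: the rear signal is
    internally delayed by the acoustic travel time d/c and subtracted."""
    theta = np.radians(theta_deg)
    tau = (d / c) * (1 + np.cos(theta))   # external + internal delay
    return np.abs(1 - np.exp(-1j * w * tau))

angles = np.arange(0, 360, 1)
pattern = response(angles)

# Sound from directly behind (180 degrees) is cancelled; frontal sound passes.
assert pattern[180] < 1e-12
assert pattern[0] > pattern[90] > pattern[180]
```

Plotting `pattern` against `angles` on a polar axis yields a cardioid like those shown in the figure.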
Figure 6
Examples of directional responses achieved with an adaptive MVDR beamformer for different configurations of target and noise. In all four examples, the target is located in front of the user (0°), while one or more noise sources are located at directions indicated by the dots near the perimeter of the plots.
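The textbook MVDR solution chooses weights w = R⁻¹d / (dᴴR⁻¹d), where R is the noise covariance matrix and d the steering vector of the target direction: the target passes without distortion while directional noise is minimized. A minimal two-microphone sketch with an assumed geometry and a single interferer at 120° (configuration invented for illustration):

```python
import numpy as np

c, d_m, f = 343.0, 0.012, 2_000.0   # speed of sound, mic spacing, frequency

def steering(theta_deg):
    """Relative phase of a plane wave across a two-microphone endfire array."""
    delay = (d_m / c) * np.cos(np.radians(theta_deg))
    return np.array([1.0, np.exp(-1j * 2 * np.pi * f * delay)])

d = steering(0.0)                    # target in front of the user (0 degrees)
v = steering(120.0)                  # one noise source at 120 degrees

# Noise covariance: the directional interferer plus a small diagonal
# (sensor-noise) term that keeps the matrix invertible.
R = np.outer(v, v.conj()) + 1e-3 * np.eye(2)

# MVDR weights: w = R^-1 d / (d^H R^-1 d)
Rinv_d = np.linalg.solve(R, d)
w = Rinv_d / (d.conj() @ Rinv_d)

assert np.isclose(w.conj() @ d, 1.0)   # distortionless toward the target
assert np.abs(w.conj() @ v) < 0.1      # strong attenuation toward the noise
```

Re-running with different interferer angles reproduces the adaptive behavior shown in the figure: the null follows the noise while the frontal response stays fixed.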
Figure 7
(a) A spectrogram of a speech utterance. (b) The same utterance mixed with 24-talker babble at +3 dB SNR. (c) The noisy utterance after postfiltering. (d) Grayscale version of (b), colorized according to the gain applied by the postfilter.
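The article does not specify the postfilter's gain rule; one classical choice is the Wiener gain G = SNR/(1 + SNR) per frequency bin, which passes speech-dominated bins and suppresses noise-dominated ones. The sketch below applies an oracle Wiener gain (computed from known clean and noise spectra) to a toy noisy signal:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1024
clean = np.sin(2 * np.pi * 60 * np.arange(n) / n)   # narrowband stand-in for speech
noise = 0.5 * rng.standard_normal(n)                # broadband noise
noisy = clean + noise

S, N, Y = np.fft.rfft(clean), np.fft.rfft(noise), np.fft.rfft(noisy)

# Oracle Wiener postfilter: per-bin gain G = SNR / (1 + SNR). Bins dominated
# by the target get a gain near 1; noise-only bins get a gain near 0.
snr = np.abs(S) ** 2 / (np.abs(N) ** 2 + 1e-12)
G = snr / (1 + snr)
enhanced = np.fft.irfft(G * Y, n=n)

err_before = np.mean((noisy - clean) ** 2)
err_after = np.mean((enhanced - clean) ** 2)
assert err_after < err_before   # the postfilter moves the signal toward clean
```

A real hearing aid must of course estimate the SNR from the noisy signal alone; the oracle version only illustrates what the gain is trying to achieve.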
Figure 8
Deep learning refers to the training and use of neural networks to solve tasks. It is a subfield of machine learning, which is itself a field of artificial intelligence.
Figure 9
Illustration of how a neural network is trained to perform postfiltering. The neural network computes postfilter gains for examples of noisy audio from the training database. These gains are applied to the noisy signals, and the result is compared with the underlying clean target signal using a loss function. Through the mathematical techniques of backpropagation and gradient descent, the network connections are updated to make the loss progressively smaller, so that the postfiltered noisy signal becomes more similar to the underlying clean target.
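The training loop in this figure can be miniaturized to a toy example: a one-neuron "network" that maps a noisy magnitude to a postfilter gain, trained by hand-derived backpropagation and gradient descent. Everything here (data model, network size, learning rate) is invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: each example is one "time-frequency bin" with a clean
# magnitude s (speech absent or present), additive noise, and the noisy
# observation y = s + noise that the network gets to see.
n_examples = 2000
s = np.where(rng.random(n_examples) < 0.5, 0.0, 2.0)
noise = 0.3 * np.abs(rng.standard_normal(n_examples))
y = s + noise

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A one-neuron "network": predicted gain g = sigmoid(a*y + b).
a, b, lr = 0.0, 0.0, 0.05

for _ in range(2000):
    g = sigmoid(a * y + b)
    err = g * y - s                      # loss = mean(err ** 2)
    # Backpropagation: chain rule through the loss, the applied gain,
    # and the sigmoid nonlinearity.
    dg = 2.0 * err * y / n_examples      # d loss / d g
    dz = dg * g * (1.0 - g)              # d loss / d (a*y + b)
    a -= lr * np.sum(dz * y)             # gradient-descent updates
    b -= lr * np.sum(dz)

g = sigmoid(a * y + b)
loss_trained = np.mean((g * y - s) ** 2)
loss_unity = np.mean((y - s) ** 2)       # leaving the signal untouched
assert loss_trained < loss_unity         # the learned postfilter helps
```

A real DNN postfilter differs only in scale: many layers instead of one neuron, spectro-temporal features instead of a single magnitude, and automatic differentiation instead of a hand-written chain rule.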
Figure 10
Comparison of conventional postfiltering and DNN-based postfiltering. (a) A noisy speech utterance processed by a conventional postfilter (same as Fig. 7c). (b) The same noisy utterance processed by a DNN-based postfilter. (c) A grayscale spectrogram of the noisy utterance, colorized according to the gain applied by the conventional postfilter (same as Fig. 7d). (d) Same as (c), but for the DNN-based postfilter.
Figure 11
The workflow involved in using spherical microphone array recordings for training neural networks. (a) Noisy listening environments are recorded with a spherical microphone array. (b) The microphone array is placed in the center of a loudspeaker array. The transfer functions from all loudspeakers to all microphones are measured. (c) Using techniques from Minnaar et al, the transfer functions are inverted to reproduce the recorded listening environment at the center of the array. (d) Target audio is recorded by having one or more participants listen to noise recordings via open headphones while conversing in a quiet environment. (e) The acoustic scene is obtained by summing the noise and target recordings. Target and noisy sound signals are rendered to hearing aid microphones and used for neural network training.
Figure 12
Mean SRTs for 50% correct speech intelligibility obtained in the Oldenburg sentence test (N = 20). Error bars indicate the standard error of the mean. Note that the y-axis is reversed, such that higher bars indicate higher speech intelligibility. *p < 0.05, **p < 0.01, ***p < 0.001.
Figure 13
Strength of cortical representation of the entire acoustic scene (top left) and of the foreground (top right) as estimated from early EEG responses, and of the target talker (bottom left) and of the masker talker (bottom right) as estimated from late EEG responses. Gray dots indicate trial-averaged individual results, whereas black dots and error bars show the group strengths of cortical representation (grand average ± 1 between-subject standard error of the mean). Each horizontal line in gray denotes a single participant.
Figure 14
Pupil size depicted as the average change from baseline. Black dots and error bars indicate the average across participants (mean ±1 between-subject standard error of the mean). Gray dots and lines depict individual means across trials.

References

    1. Kochkin S. MarkeTrak VIII: consumer satisfaction with hearing aids is slowly increasing. Hear J. 2010;63(01):19–32.
    2. Picou EM. MarkeTrak 10 (MT10) survey results demonstrate high satisfaction with and benefits from hearing aids. Semin Hear. 2020;41(01):21–36.
    3. Moore BCJ. Cochlear Hearing Loss: Physiological, Psychological and Technical Issues. 2nd ed. Wiley; 2007.
    4. Plomp R. Auditory handicap of hearing impairment and the limited benefit of hearing aids. J Acoust Soc Am. 1978;63(02):533–549.
    5. Lopez RS, Bianchi F, Fereczkowski M, Santurette S, Dau T. Data-driven approach for auditory profiling. In: Proceedings of the International Symposium on Auditory and Audiological Research. Nyborg, Denmark; 2017:247–254.