Front Neurosci. 2017 Sep 11:11:485.
doi: 10.3389/fnins.2017.00485. eCollection 2017.

Cascaded Amplitude Modulations in Sound Texture Perception

Richard McWalter et al. Front Neurosci. 2017.

Abstract

Sound textures, such as crackling fire or chirping crickets, represent a broad class of sounds defined by their homogeneous temporal structure. It has been suggested that the perception of texture is mediated by time-averaged summary statistics measured from early auditory representations. In this study, we investigated the perception of sound textures that contain rhythmic structure, specifically second-order amplitude modulations that arise from the interaction of different modulation rates, previously described as "beating" in the envelope-frequency domain. We developed an auditory texture model that uses a cascade of modulation filterbanks to capture the structure of simple rhythmic patterns. The model was examined in a series of psychophysical listening experiments using synthetic sound textures, that is, stimuli generated from time-averaged statistics measured from real-world textures. In a texture identification task, our results indicated that second-order amplitude modulation sensitivity enhanced recognition. Next, we examined the contribution of the second-order modulation analysis in a preference task, where the proposed auditory texture model was preferred over a range of model deviants that lacked second-order modulation rate sensitivity. Lastly, the discrimination of textures that included second-order amplitude modulations appeared to rely on a time-averaging process. Overall, our results demonstrate that the inclusion of second-order modulation analysis improves the perceived quality of synthetic textures compared to the first-order modulation analysis considered in previous approaches.

Keywords: amplitude modulation; auditory model; auditory perception; natural sound; sound texture.
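
The "beating" in the envelope-frequency domain mentioned in the abstract can be illustrated with a short worked identity. This is a sketch under stated assumptions, not the paper's formulation: the envelope e(t), the modulation depth m, and the two modulation rates f_1 and f_2 are symbols introduced here for illustration.

```latex
% Envelope carrying two first-order modulation rates f_1 and f_2 (hypothetical symbols)
\[
e(t) = 1 + \tfrac{m}{2}\bigl[\cos(2\pi f_1 t) + \cos(2\pi f_2 t)\bigr]
     = 1 + m \,\cos\bigl(\pi (f_1 - f_2)\, t\bigr)\,\cos\bigl(\pi (f_1 + f_2)\, t\bigr)
\]
% The fast factor oscillates at the mean rate (f_1 + f_2)/2.
% The magnitude of the slow factor, |cos(pi (f_1 - f_2) t)|, repeats at the rate |f_1 - f_2|.
```

With f_1 = 8 Hz and f_2 = 10 Hz, for example, the modulation near 9 Hz waxes and wanes 2 times per second, which is the kind of slow fluctuation a second-order modulation analysis is sensitive to.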


Figures

Figure 1
Texture analysis model. (A) The functional auditory model captures the tuning properties of the peripheral and subcortical auditory system: (1) An auditory filterbank simulates the resonance frequencies of the cochlea, (2) a non-linearity captures the compression of the cochlea followed by a computation of the Hilbert envelope, functionally modeling the transduction from the mechanical vibrations on the basilar membrane to the receptor potentials in the hair cells, (3) a first-order modulation filterbank captures the selectivity of the auditory system to different envelope fluctuation rates, and (4) a second-order modulation filterbank captures the sensitivity of the auditory system to beating in the envelope frequency domain. Texture statistics include marginal moments of cochlear envelopes (M), 1st-order modulation power (M1P), pair-wise correlations between cochlear envelopes (C), pairwise correlations between modulation subbands (MC1), phase correlations between octave-spaced modulation bands (MC2), and 2nd-order modulation power (M2P). (B) Example second-order modulation stimulus. The far-left panel shows the input stimulus that consists of two short 62.5 ms pulses repeated every 500 ms. The example outputs are shown at each stage of the model. The output of the 1st-order modulation band is shown for the 8 Hz subband which captures the period of the short pulses. The 2nd-order modulation band is shown for the 2 Hz subband which captures the period of the repetition.
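
The example in panel (B) can be approximated with a minimal sketch of the envelope and modulation cascade. Several choices here are assumptions made for illustration and are not taken from the paper: a single broadband channel stands in for the cochlear filterbank, the compressive non-linearity is omitted, the modulation filter is an octave-wide brick-wall FFT filter, the second pulse is assumed to start 125 ms after the first, and the sample rate, duration, and all function names are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal; its magnitude is the Hilbert envelope."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def bandpass(x, fs, lo, hi):
    """Brick-wall band-pass filter (illustrative stand-in for a modulation filter)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def peak_frequency(x, fs, fmax):
    """Frequency (Hz) of the largest spectral component at or below fmax, mean removed."""
    spectrum = np.abs(np.fft.rfft(x - np.mean(x)))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    keep = freqs <= fmax
    return float(freqs[keep][np.argmax(spectrum[keep])])

fs = 8000          # sample rate in Hz (assumed)
duration = 8.0     # seconds (assumed); a whole number of 500 ms periods
t = np.arange(int(fs * duration)) / fs

# Two 62.5 ms pulses per 500 ms period; the 125 ms onset spacing is an assumption.
phase = np.mod(t, 0.5)
gate = ((phase < 0.0625) | ((phase >= 0.125) & (phase < 0.1875))).astype(float)
rng = np.random.default_rng(0)
sound = gate * rng.standard_normal(t.size)   # gated Gaussian-noise carrier

# Envelope extraction (cochlear filtering and compression are skipped in this sketch).
envelope = np.abs(analytic_signal(sound))

# First-order stage: octave-wide modulation band centred on 8 Hz.
band_8hz = bandpass(envelope, fs, 8 / 2 ** 0.5, 8 * 2 ** 0.5)
first_order_peak = peak_frequency(band_8hz, fs, 20.0)

# Second-order stage: envelope of the 8 Hz subband, then its dominant fluctuation rate.
second_envelope = np.abs(analytic_signal(band_8hz))
second_order_peak = peak_frequency(second_envelope, fs, 20.0)

print(first_order_peak, second_order_peak)
```

Under these assumptions the first-order peak should land near 8 Hz (the pulse period) and the second-order peak near 2 Hz (the 500 ms repetition), mirroring the two subbands highlighted in the caption.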
Figure 2
Texture Statistics. (A) Cochlear envelope marginal moments (mean, coefficient of variance, skewness, kurtosis) measured from three real-world texture recordings (Swamp insects, campfire, small stream). (B) Cochlear envelope pair-wise correlations measured between different cochlear channels. The label of the texture analyzed is located above the subfigure (and for all subsequent subfigures). Lightened regions here and elsewhere denote texture statistics that are not imposed during the synthesis process. (C) Modulation band power (variance). The figure is normalized by the modulation power of Gaussian noise and shown on a log (dB) scale. (D) Modulation correlation measured for a particular rate across cochlear channels. The modulation rate is indicated above the subfigure. (E) Modulation phase correlation measured between octave-spaced modulation bands. (F) Second-order modulation band power (variance). The second-order modulation frequency is indicated above the individual subfigures for a selection of rates (0.5, 1, and 2 Hz). The statistics are plotted relative to Gaussian noise on a log (dB) scale.
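
The marginal moments in panel (A) and the pair-wise correlations in panel (B) can be sketched as time averages over envelope samples. This is an illustrative sketch rather than the paper's implementation: the function names are hypothetical, population (not sample) moments are used, and the "coefficient of variance" is assumed here to be the variance divided by the squared mean, which may differ from the paper's exact normalization.

```python
import numpy as np

def envelope_moments(envelope):
    """Time-averaged marginal moments of one envelope (hypothetical helper)."""
    env = np.asarray(envelope, dtype=float)
    mean = env.mean()
    centered = env - mean
    variance = np.mean(centered ** 2)
    return {
        "mean": float(mean),
        # Assumed normalization: variance relative to the squared mean.
        "coeff_of_variance": float(variance / mean ** 2),
        "skewness": float(np.mean(centered ** 3) / variance ** 1.5),
        "kurtosis": float(np.mean(centered ** 4) / variance ** 2),
    }

def pairwise_correlation(env_a, env_b):
    """Pearson correlation between two envelopes of equal length."""
    return float(np.corrcoef(env_a, env_b)[0, 1])

# Tiny worked example on a toy "envelope".
moments = envelope_moments([1.0, 2.0, 3.0, 4.0, 5.0])
correlation = pairwise_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(moments, correlation)
```

In a full analysis these functions would be applied to each cochlear channel envelope and to each pair of channels, giving one set of moments per channel and one correlation per channel pair.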
Figure 3
Texture synthesis system and synthetic examples. (A) Texture synthesis is accomplished by measuring statistics from a real-world texture recording at different stages of the auditory texture model. The statistics are then passed to the synthesis system that adjusts the statistics of a Gaussian noise seed to match the input statistics. The iterative process outputs a synthetic texture with the same time-averaged statistics as the real-world texture recording. (B) Original real-world texture recordings and their synthetic counterparts. The synthetic textures were generated with a complete set of texture statistics. Example audio files corresponding to the original and synthetic spectrograms can be found in the Supplementary Material (Swamp Insects: Audio files 1, 2; Campfire: Audio files 3, 4; Small Stream: Audio files 5, 6).
Figure 4
Verification of second-order texture synthesis. (A) Spectrogram of example rhythmic (second-order modulated) noise bursts with 500 ms repetition pattern. The upper panel shows the original sound, the middle panel shows the synthetic version with second-order modulation texture statistics (w/ 2nd-order mods.) and the bottom panel shows the synthetic version without second-order modulation texture statistics (w/o 2nd-order mods.). (B) Second-order modulation power statistics. The 500 ms period is reflected in the majority of power held within the 2 Hz 2nd-order modulation band (lower-left panel). Example audio files corresponding to the spectrograms can be found in the Supplementary Material (Original: Audio file 7; w/ 2nd-order mods.: Audio file 8; w/o 2nd-order mods.: Audio file 9).
Figure 5
Synthetic texture identification and preference tasks. (A) Identification of sound textures improves with the inclusion of more statistics. Asterisks denote significant differences between conditions, p < 0.01 (paired t-tests, corrected for multiple comparisons). Error bars here and elsewhere show the standard error. Dashed lines here and elsewhere show chance performance. (B) Modulation filter(bank) structure used in the listening experiments. For low-pass (LP) conditions, only the statistics of the signal in the passband were modified. (C) Sounds synthesized with the 2nd-order modulation statistics were preferred over all other auditory texture models. Asterisks denote a significant difference from chance (p < 0.01). (D) Eight most preferred (left) and least preferred (right) textures from experiment 2, relative to the first-order modulation filterbank model (half-octave spacing).
Figure 6
Textures that benefit from second-order modulation statistics. Two example textures from the preferred list: Helicopter (left) and frogs-crickets (right). The left panel shows the second-order modulation statistics for six selected bands. The right panel shows the spectrogram of the original texture (top) and the synthetic texture with second-order modulation statistics (middle) and without second-order modulation statistics (bottom). Example audio files corresponding to the spectrograms of the original, synthetic with 2nd-order modulations, and without 2nd-order modulations can be found in the Supplementary Material (helicopter: Audio files 10–12; frogs-crickets: Audio files 13–15).
Figure 7
Second-order amplitude modulation and texture exemplar discrimination. The black symbols show the response to second-order amplitude modulated Gaussian noise exemplar discrimination as a function of modulation rate. Error bars indicate the standard error. The blue symbol indicates exemplar discrimination performance for complex second-order amplitude modulated Gaussian noise. The green symbol indicates exemplar discrimination performance for synthetic sound textures that include all indicated texture statistics (including second-order amplitude modulation statistics). The red symbol indicates exemplar discrimination performance for top-8 synthetic (Experiment 2) sound textures that include all indicated texture statistics.
