Cascaded Amplitude Modulations in Sound Texture Perception

Richard McWalter¹, Torsten Dau¹

PMID: 28955191
PMCID: PMC5601004
DOI: 10.3389/fnins.2017.00485

Front Neurosci. 2017 Sep 11;11:485. eCollection 2017.

Affiliation

¹ Hearing Systems Group, Technical University of Denmark, Kongens Lyngby, Denmark.

Abstract

Sound textures, such as crackling fire or chirping crickets, represent a broad class of sounds defined by their homogeneous temporal structure. It has been suggested that the perception of texture is mediated by time-averaged summary statistics measured from early auditory representations. In this study, we investigated the perception of sound textures that contain rhythmic structure, specifically second-order amplitude modulations that arise from the interaction of different modulation rates, previously described as "beating" in the envelope-frequency domain. We developed an auditory texture model that utilizes a cascade of modulation filterbanks that capture the structure of simple rhythmic patterns. The model was examined in a series of psychophysical listening experiments using synthetic sound textures, that is, stimuli generated using time-averaged statistics measured from real-world textures. In a texture identification task, our results indicated that second-order amplitude modulation sensitivity enhanced recognition. Next, we examined the contribution of the second-order modulation analysis in a preference task, where the proposed auditory texture model was preferred over a range of model deviants that lacked second-order modulation rate sensitivity. Lastly, the discriminability of textures that included second-order amplitude modulations appeared to be perceived using a time-averaging process. Overall, our results demonstrate that the inclusion of second-order modulation analysis generates improvements in the perceived quality of synthetic textures compared to the first-order modulation analysis considered in previous approaches.

Keywords: amplitude modulation; auditory model; auditory perception; natural sound; sound texture.

Figures

**Figure 1**
Texture analysis model. **(A)** The functional auditory model captures the tuning properties of the peripheral and subcortical auditory system: (1) An auditory filterbank simulates the resonance frequencies of the cochlea, (2) a non-linearity captures the compression of the cochlea followed by a computation of the Hilbert envelope, functionally modeling the transduction from the mechanical vibrations on the basilar membrane to the receptor potentials in the hair cells, (3) a first-order modulation filterbank captures the selectivity of the auditory system to different envelope fluctuation rates, and (4) a second-order modulation filterbank captures the sensitivity of the auditory system to beating in the envelope frequency domain. Texture statistics include marginal moments of cochlear envelopes (M), 1st-order modulation power (M¹P), pair-wise correlations between cochlear envelopes (C), pairwise correlations between modulation subbands (MC₁), phase correlations between octave-spaced modulation bands (MC₂), and 2nd-order modulation power (M²P). **(B)** Example second-order modulation stimulus. The far-left panel shows the input stimulus that consists of two short 62.5 ms pulses repeated every 500 ms. The example outputs are shown at each stage of the model. The output of the 1st-order modulation band is shown for the 8 Hz subband which captures the period of the short pulses. The 2nd-order modulation band is shown for the 2 Hz subband which captures the period of the repetition.
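
The two-stage analysis in panel (B) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' model: it skips the cochlear filterbank and compression and works directly on an idealized envelope, it assumes the two pulses are separated by a 62.5 ms gap (the caption does not state the gap), and it stands in for each modulation filter with a crude demodulate-and-average step. The names `pulse_pair_envelope`, `subband_envelope`, and `power_at`, the 1600 Hz envelope sampling rate, and the 250 ms averaging window are all hypothetical choices made for this sketch.

```python
import cmath
import math

FS = 1600  # assumed envelope sampling rate in Hz (62.5 ms = 100 samples)


def pulse_pair_envelope(duration_s=4.0, pulse_s=0.0625, period_s=0.5):
    """Idealized envelope: pulse, equal-length gap, pulse, repeated every period."""
    n_pulse = round(pulse_s * FS)
    n_period = round(period_s * FS)
    envelope = []
    for n in range(round(duration_s * FS)):
        k = n % n_period
        on = k < n_pulse or 2 * n_pulse <= k < 3 * n_pulse
        envelope.append(1.0 if on else 0.0)
    return envelope


def subband_envelope(x, centre_hz, window_s):
    """Crude stand-in for a modulation band: shift centre_hz down to 0 Hz,
    smooth with a moving average, and take the magnitude as the band envelope."""
    n_win = round(window_s * FS)
    baseband = [v * cmath.exp(-2j * math.pi * centre_hz * n / FS)
                for n, v in enumerate(x)]
    out, acc = [], 0j
    for n, v in enumerate(baseband):
        acc += v                        # running sum over the last n_win samples
        if n >= n_win:
            acc -= baseband[n - n_win]
        out.append(abs(acc) / n_win)
    return out


def power_at(x, rate_hz):
    """Power of the mean-removed signal at a single rate (one-bin DFT)."""
    mean = sum(x) / len(x)
    s = sum((v - mean) * cmath.exp(-2j * math.pi * rate_hz * n / FS)
            for n, v in enumerate(x))
    return abs(s / len(x)) ** 2


stimulus_envelope = pulse_pair_envelope()
# Stage 1: envelope of the 8 Hz first-order modulation band (pulse rate).
first_order = subband_envelope(stimulus_envelope, centre_hz=8.0, window_s=0.25)
# Stage 2: how strongly that band envelope fluctuates at candidate slow rates.
second_order_power = {rate: power_at(first_order, rate)
                      for rate in (0.5, 1.0, 2.0, 4.0)}
for rate, p in second_order_power.items():
    print(f"{rate:4.1f} Hz: {p:.3e}")
```

Under these assumptions the 8 Hz band envelope rises and falls once per 500 ms repetition, so the 2 Hz entry is the largest of the four candidate rates. The averaging window is deliberately shorter than the repetition period: a much narrower 8 Hz band would isolate a single spectral line and its envelope would carry no slow fluctuation to analyze.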

**Figure 2**
Texture Statistics. **(A)** Cochlear envelope marginal moments (mean, coefficient of variance, skewness, kurtosis) measured from three real-world texture recordings (Swamp insects, campfire, small stream). **(B)** Cochlear envelope pair-wise correlations measured between different cochlear channels. The label of the texture analyzed is located above the subfigure (and for all subsequent subfigures). Lightened regions here and elsewhere denote texture statistics that are not imposed during the synthesis process. **(C)** Modulation band power (variance). The figure is normalized by the modulation power of Gaussian noise and shown on a log (dB) scale. **(D)** Modulation correlation measured for a particular rate across cochlear channels. The modulation rate is indicated above the subfigure. **(E)** Modulation phase correlation measured between octave-spaced modulation bands. **(F)** Second-order modulation band power (variance). The second-order modulation frequency is indicated above the individual subfigures for a selection of rates (0.5, 1, and 2 Hz). The statistics are plotted relative to Gaussian noise on a log (dB) scale.
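
As a rough illustration of the statistics in panels (A), (C), and (F), the sketch below computes the four marginal moments of a single envelope and expresses a band power in dB relative to a noise reference. It is a hedged example rather than the authors' code: the caption gives no formulas, so the second moment is assumed here to be the variance divided by the squared mean, and the function names `marginal_moments` and `relative_power_db` are hypothetical.

```python
import math


def marginal_moments(envelope):
    """Mean, normalized variance, skewness, and kurtosis of one envelope."""
    n = len(envelope)
    mean = sum(envelope) / n
    centred = [v - mean for v in envelope]
    var = sum(c ** 2 for c in centred) / n
    std = math.sqrt(var)
    return {
        "mean": mean,
        # Assumption: "coefficient of variance" taken as variance / mean^2.
        "normalized_variance": var / mean ** 2,
        "skewness": sum(c ** 3 for c in centred) / n / std ** 3,
        "kurtosis": sum(c ** 4 for c in centred) / n / var ** 2,
    }


def relative_power_db(band_power, noise_band_power):
    """Band power relative to the same band measured on Gaussian noise, in dB."""
    return 10.0 * math.log10(band_power / noise_band_power)


moments = marginal_moments([1.0, 2.0, 3.0, 4.0, 5.0])
print(moments)
print(relative_power_db(2.0, 1.0))
```

In a full model these functions would be applied per cochlear channel and per modulation band; the toy input here is only meant to make the definitions concrete.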

**Figure 3**
Texture synthesis system and synthetic examples. **(A)** Texture synthesis is accomplished by measuring statistics from a real-world texture recording at different stages of the auditory texture model. The statistics are then passed to the synthesis system that adjusts the statistics of a Gaussian noise seed to match the input statistics. The iterative process outputs a synthetic texture with the same time-averaged statistics as the real-world texture recording. **(B)** Original real-world texture recordings and their synthetic counterparts. The synthetic textures were generated with a complete set of texture statistics. Example audio files corresponding to the original and synthetic spectrograms can be found in the Supplementary Material (Swamp Insects: Audio files 1, 2; Campfire: Audio files 3, 4; Small Stream: Audio files 5, 6).
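
The idea of iteratively adjusting a noise seed until its statistics match measured targets can be shown with a toy example. The sketch below is not the synthesis system described in panel (A): it matches only two statistics (mean and variance) of a raw signal by gradient descent, whereas the actual system imposes the full set of texture statistics at several model stages, and the optimization method it uses is not given in this caption. The function names, step size, iteration count, and target values are assumptions chosen for illustration.

```python
import random


def mean_var(x):
    """Population mean and variance of a list of samples."""
    m = sum(x) / len(x)
    v = sum((s - m) ** 2 for s in x) / len(x)
    return m, v


def match_statistics(seed, target_mean, target_var, step=0.1, n_iter=200):
    """Nudge every sample down the gradient of
    (mean - target_mean)^2 + (var - target_var)^2 until both statistics match."""
    x = list(seed)
    for _ in range(n_iter):
        m, v = mean_var(x)
        # Per-sample gradient, scaled by N/2 so the step size is length-independent.
        x = [s - step * ((m - target_mean) + 2.0 * (v - target_var) * (s - m))
             for s in x]
    return x


rng = random.Random(0)
noise_seed = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # Gaussian noise seed
synthetic = match_statistics(noise_seed, target_mean=1.0, target_var=4.0)
final_mean, final_var = mean_var(synthetic)
print(final_mean, final_var)
```

The result shares the target statistics with the "recording" but is otherwise a different signal, which is the property the synthesis procedure relies on.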

**Figure 4**
Verification of second-order texture synthesis. **(A)** Spectrogram of example rhythmic (second-order modulated) noise bursts with 500 ms repetition pattern. The upper panel shows the original sound, the middle panel shows the synthetic version with second-order modulation texture statistics (w/ 2nd-order mods.) and the bottom panel shows the synthetic version without second-order modulation texture statistics (w/o 2nd-order mods.). **(B)** Second-order modulation power statistics. The 500 ms period is reflected in the majority of power held within the 2 Hz 2nd-order modulation band (lower-left panel). Example audio files corresponding to the spectrograms can be found in the Supplementary Material (Original: Audio file 7; w/ 2nd-order mods.: Audio file 8; w/o 2nd-order mods.: Audio file 9).

**Figure 5**
Synthetic texture identification and preference tasks. **(A)** Identification of sound textures improves with the inclusion of more statistics. Asterisks denote significant differences between conditions, p < 0.01 (paired t-tests, corrected for multiple comparisons). Error bars here and elsewhere show the standard error. Dashed lines here and elsewhere show chance performance. **(B)** Modulation filter(bank) structure used in the listening experiments. For low-pass (LP) conditions, only the statistics of the signal in the passband were modified. **(C)** Sounds synthesized with the 2nd-order modulation statistics were preferred over all other auditory texture models. Asterisk denotes significance from chance (p < 0.01). **(D)** Eight most preferred (left) and least preferred (right) textures from experiment 2, relative to first-order modulation filterbank model (half-octave spacing).

**Figure 6**
Textures that benefit from second-order modulation statistics. Two example textures from the preferred list: Helicopter (left) and frogs-crickets (right). The left panel shows the second-order modulation statistics for six selected bands. The right panel shows the spectrogram of the original texture (top) and the synthetic texture with second-order modulation statistics (middle) and without second-order modulation statistics (bottom). Example audio files corresponding to the spectrograms of the original, synthetic with 2nd-order modulations, and without 2nd-order modulations can be found in the Supplementary Material (helicopter: Audio files 10–12; frogs-crickets: Audio files 13–15).

**Figure 7**
Second-order amplitude modulation and texture exemplar discrimination. The black symbols show the response to second-order amplitude modulated Gaussian noise exemplar discrimination as a function of modulation rate. Error bars indicate the standard error. The blue symbol indicates exemplar discrimination performance for complex second-order amplitude modulated Gaussian noise. The green symbol indicates exemplar discrimination performance for synthetic sound textures that include all indicated texture statistics (including second-order amplitude modulation statistics). The red symbol indicates exemplar discrimination performance for top-8 synthetic (Experiment 2) sound textures that include all indicated texture statistics.

Cited by

Cascaded Tuning to Amplitude Modulation for Natural Sound Recognition.
Koumura T, Terashima H, Furukawa S. J Neurosci. 2019 Jul 10;39(28):5517-5533. doi: 10.1523/JNEUROSCI.2914-18.2019. Epub 2019 May 15. PMID: 31092586. Free PMC article.
Predicting tingling sensations induced by autonomous sensory meridian response (ASMR) videos based on sound texture statistics: a comparison to pleasant feelings.
Terashima H, Tada K, Kondo HM. Philos Trans R Soc Lond B Biol Sci. 2024 Aug 26;379(1908):20230254. doi: 10.1098/rstb.2023.0254. Epub 2024 Jul 15. PMID: 39005038. Free PMC article.
Human Auditory Ecology: Extending Hearing Research to the Perception of Natural Soundscapes by Humans in Rapidly Changing Environments.
Lorenzi C, Apoux F, Grinfeder E, Krause B, Miller-Viacava N, Sueur J. Trends Hear. 2023 Jan-Dec;27:23312165231212032. doi: 10.1177/23312165231212032. PMID: 37981813. Free PMC article.
Developmental origins of natural sound perception.
Polver S, Miller-Viacava N, Fraticelli M, Gervain J, Lorenzi C. Front Psychol. 2024 Dec 11;15:1474961. doi: 10.3389/fpsyg.2024.1474961. eCollection 2024. PMID: 39726626. Free PMC article. Review.
Illusory sound texture reveals multi-second statistical completion in auditory scene analysis.
McWalter R, McDermott JH. Nat Commun. 2019 Nov 8;10(1):5096. doi: 10.1038/s41467-019-12893-0. PMID: 31704913. Free PMC article.
