A two-stage spectral model for sound texture perception: Synthesis and psychophysics

Hironori Maruyama¹, Kosuke Okada¹, Isamu Motoyoshi¹

PMID: 36845027
PMCID: PMC9950610
DOI: 10.1177/20416695231157349

Hironori Maruyama et al. Iperception. 2023 Feb 22;14(1):20416695231157349. doi: 10.1177/20416695231157349. eCollection 2023 Jan-Feb.

Affiliation

¹ Department of Life Sciences, The University of Tokyo, Japan.

Abstract

The natural environment is filled with a variety of auditory events such as wind blowing, water flowing, and fire crackling. It has been suggested that the perception of such textural sounds is based on the statistics of the natural auditory events. Inspired by a recent spectral model for visual texture perception, we propose a model that describes the perceived sound texture using only the linear spectrum and the energy spectrum. We tested the validity of the model by using synthetic noise sounds that preserve the two-stage amplitude spectra of the original sound. A psychophysical experiment showed that our synthetic noises were perceived as similar to the original sounds for 120 real-world auditory events. The performance was comparable to that of synthetic sounds produced by the McDermott-Simoncelli model, which considers various classes of auditory statistics. The results support the notion that the perception of natural sound textures can be predicted from the two-stage spectral signals.

Keywords: listening; models; texture; visuo-auditory interactions.

Conflict of interest statement

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

**Figure 1.**
Two-stage spectral representation of visual and auditory textures. (a) Two-stage spectral representation of visual texture (2D spectrum of luminance and 4D spectrum of subband energy). (b) Two-stage spectral representation of auditory texture (1D spectrum of sound waves and 2D spectrum of cochlear subband envelopes). See text for details.
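The auditory half of this representation (panel b) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `two_stage_spectra`, the parameters `n_bands` and `f_min`, and the log-spaced brick-wall bands (used here as a crude stand-in for cochlear filters) are all assumptions made for illustration.

```python
import numpy as np

def two_stage_spectra(signal, sample_rate, n_bands=8, f_min=50.0):
    """Sketch of a two-stage spectral description of a sound.

    Stage 1: 1D amplitude spectrum of the waveform.
    Stage 2: 2D amplitude spectrum of the subband envelopes (band x time).
    The filterbank below is an assumed simplification, not a cochlear model.
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(n, d=1.0 / sample_rate)

    # Stage 1: amplitude spectrum of the raw waveform.
    linear_spectrum = np.abs(np.fft.rfft(x))

    # Assumed log-spaced band edges between f_min and the Nyquist frequency.
    edges = np.geomspace(f_min, sample_rate / 2.0, n_bands + 1)

    envelopes = np.empty((n_bands, n))
    for i in range(n_bands):
        # Keep only the positive frequencies inside the band and double them:
        # the inverse FFT is then the analytic signal of the band-limited sound.
        mask = (freqs >= edges[i]) & (freqs < edges[i + 1])
        analytic = np.fft.ifft(2.0 * spectrum * mask)
        envelopes[i] = np.abs(analytic)

    # Stage 2: 2D amplitude spectrum of the envelope matrix.
    energy_spectrum = np.abs(np.fft.fft2(envelopes))
    return linear_spectrum, energy_spectrum
```

Called as `two_stage_spectra(x, sample_rate)`, the sketch returns one array of length `len(x) // 2 + 1` for the first stage and one of shape `(n_bands, len(x))` for the second. Both discard phase, which is what allows different random-phase sounds to share the same two-stage description.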

**Figure 2.**
Schematic diagram of synthesized sound with two-stage spectral representation. PR denotes phase-randomized sound. le-PR denotes a phase-randomized sound that preserves the two-stage spectra. The original sound, PR, and le-PR are represented by waveforms. See text for details.
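The PR condition can be sketched as follows: keep the amplitude spectrum of the waveform and replace its phases with random values. This is a minimal illustration under assumptions (the function name `phase_randomize` and the uniform phase distribution are hypothetical choices), and it covers only PR, not the le-PR procedure, which additionally preserves the second-stage spectrum.

```python
import numpy as np

def phase_randomize(signal, rng=None):
    """Return a real signal with the same amplitude spectrum and random phases."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(signal, dtype=float)

    spectrum = np.fft.rfft(x)
    amplitude = np.abs(spectrum)

    # Draw an independent random phase for every frequency bin.
    phase = rng.uniform(0.0, 2.0 * np.pi, size=amplitude.shape)
    randomized = amplitude * np.exp(1j * phase)

    # The DC bin (and the Nyquist bin for even lengths) must stay real
    # for the output to be real, so they are copied from the original.
    randomized[0] = spectrum[0]
    if x.size % 2 == 0:
        randomized[-1] = spectrum[-1]

    return np.fft.irfft(randomized, n=x.size)
```

Passing a seeded generator, for example `phase_randomize(x, np.random.default_rng(0))`, makes the output reproducible; different seeds give different waveforms with identical amplitude spectra.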

**Figure 3.**
(a) Example spectrograms of the original sounds and three types of synthesized sounds. From top to bottom, the rows show the original sounds, le-PR, PR, and the MS-synthesized sounds. Each column is labeled with the name of the original natural sound. (b) Examples of le-PR spectrograms generated with different random phase spectra.

**Figure 4.**
The perceptual similarity rating for the three types of synthesized sounds to the original natural sound: the MS statistics, the two-stage spectrum (le-PR), and the linear spectrum (PR). (a) Joint histograms of similarity ratings between different types of synthetic sounds. Each panel shows the comparison in ratings for le-PR versus PR (left), MS versus PR (middle), and MS versus le-PR (right). (b) Similarity ratings averaged across 120 natural sounds. Error bars represent ±1 SEM between participants.
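One plausible way to obtain the averages and error bars described in panel (b) is sketched below. The layout of the ratings array (participants in rows, sounds in columns) and the function name `mean_and_sem` are assumptions; the caption states only that the SEM is computed between participants.

```python
import numpy as np

def mean_and_sem(ratings):
    """Mean rating and SEM across participants.

    `ratings` is assumed to have shape (n_participants, n_sounds).
    """
    r = np.asarray(ratings, dtype=float)

    # Average over sounds first, giving one score per participant.
    per_participant = r.mean(axis=1)

    mean = per_participant.mean()
    # Sample standard deviation (ddof=1) divided by sqrt(number of participants).
    sem = per_participant.std(ddof=1) / np.sqrt(per_participant.size)
    return mean, sem
```

Averaging within each participant before computing the SEM keeps the error bar tied to between-participant variability rather than to variability across the 120 sounds.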
