Iperception. 2023 Feb 22;14(1):20416695231157349.
doi: 10.1177/20416695231157349. eCollection 2023 Jan-Feb.

A two-stage spectral model for sound texture perception: Synthesis and psychophysics


Hironori Maruyama et al. Iperception.

Abstract

The natural environment is filled with a variety of auditory events, such as wind blowing, water flowing, and fire crackling. It has been suggested that the perception of such textural sounds is based on the statistics of natural auditory events. Inspired by a recent spectral model for visual texture perception, we propose a model that describes perceived sound texture using only the linear spectrum and the energy spectrum. We tested the validity of the model using synthetic noise sounds that preserve the two-stage amplitude spectra of the original sound. A psychophysical experiment showed that our synthetic noises were perceived as similar to the original sounds across 120 real-world auditory events. This performance was comparable to that of synthetic sounds produced by McDermott and Simoncelli's model, which considers various classes of auditory statistics. The results support the notion that the perception of natural sound textures can be predicted from the two-stage spectral signals.
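As a rough illustration of what the two descriptors measure, the sketch below computes, for a single-channel signal, a first-stage linear amplitude spectrum (the magnitude of the waveform's DFT) and a second-stage amplitude spectrum of its envelope. It is a minimal sketch under our own assumptions, not the authors' implementation: the envelope is taken as the magnitude of the analytic (Hilbert) signal, the cochlear filterbank that the model applies before envelope extraction is omitted (so this is a single-band simplification), and the helper names `dft`, `idft`, `hilbert_envelope`, and `two_stage_spectra` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) forward DFT (unnormalized); adequate for short illustrative signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT, normalized by 1/N."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def hilbert_envelope(x):
    """Assumed envelope: |analytic signal| (keep DC/Nyquist, double positive bins, zero negative bins)."""
    n = len(x)
    spec = dft(x)
    weights = [0.0] * n
    weights[0] = 1.0
    for k in range(1, (n + 1) // 2):
        weights[k] = 2.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    analytic = idft([s * w for s, w in zip(spec, weights)])
    return [abs(z) for z in analytic]


def two_stage_spectra(x):
    """Return (first-stage linear amplitude spectrum, second-stage envelope amplitude spectrum), each scaled by 1/N."""
    n = len(x)
    linear = [abs(c) / n for c in dft(x)]
    envelope = hilbert_envelope(x)
    second = [abs(c) / n for c in dft(envelope)]
    return linear, second


if __name__ == "__main__":
    # Amplitude-modulated tone: carrier at DFT bin 32, modulation at bin 2, depth 0.5.
    N = 128
    x = [(1 + 0.5 * math.cos(2 * math.pi * 2 * t / N)) * math.cos(2 * math.pi * 32 * t / N)
         for t in range(N)]
    linear, second = two_stage_spectra(x)
    print(round(linear[32], 3), round(second[2], 3))  # → 0.5 0.25
```

In this toy case the first stage sees only the carrier and its sidebands, while the second stage isolates the slow modulation at bin 2, which is the kind of structure a purely linear spectrum does not make explicit.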

Keywords: listening; models; texture; visuo-auditory interactions.


Conflict of interest statement

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1.
Two-stage spectral representation of visual and auditory textures. (a) Two-stage spectral representation of visual texture (2D spectrum of luminance and 4D spectrum of subband energy). (b) Two-stage spectral representation of auditory texture (1D spectrum of sound waves and 2D spectrum of cochlear subband envelopes). See text for details.
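The auditory second stage in panel (b) operates on a two-dimensional array of subband envelopes (cochlear channel by time). A minimal sketch of a 2D amplitude spectrum of such an array follows. It assumes the envelopes have already been extracted into a rectangular matrix; the filterbank, any envelope compression, and windowing used in the paper are not modeled, and the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) forward DFT of a real or complex sequence (unnormalized)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def envelope_spectrum_2d(env):
    """2D amplitude spectrum of env[channel][frame], scaled by 1/(channels * frames).

    The result is indexed [channel-axis frequency][temporal-modulation frequency].
    """
    n_ch, n_t = len(env), len(env[0])
    # Separable 2D DFT, step 1: transform each channel's envelope over time.
    over_time = [dft(row) for row in env]
    # Step 2: for each temporal-modulation bin, transform across channels.
    over_ch = [dft([over_time[c][k] for c in range(n_ch)]) for k in range(n_t)]
    # over_ch is indexed [kt][kc]; transpose and take magnitudes.
    return [[abs(over_ch[kt][kc]) / (n_ch * n_t) for kt in range(n_t)]
            for kc in range(n_ch)]


if __name__ == "__main__":
    # Hypothetical envelope pattern whose modulation drifts across channels:
    # one cycle over 8 channels, two cycles over 16 frames.
    C, T = 8, 16
    env = [[1 + 0.5 * math.cos(2 * math.pi * (2 * t / T + c / C)) for t in range(T)]
           for c in range(C)]
    spec = envelope_spectrum_2d(env)
    print(round(spec[0][0], 3), round(spec[1][2], 3), round(spec[7][14], 3))  # → 1.0 0.25 0.25
```

The tilted modulation pattern lands at a single pair of conjugate bins, (1, 2) and (7, 14), showing how the 2D spectrum captures joint spectro-temporal structure that per-channel 1D spectra would not.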
Figure 2.
Schematic diagram of sound synthesis with the two-stage spectral representation. PR denotes a phase-randomized sound. le-PR denotes a phase-randomized sound that preserves the two-stage spectra. The original sound, PR, and le-PR are shown as waveforms. See text for details.
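The PR condition can be illustrated with a generic phase-randomization sketch: the magnitude of every DFT bin is kept, the phases of the positive-frequency bins are replaced with uniformly random values, and the negative-frequency bins are set to their complex conjugates so the result stays a real waveform. This is a textbook version under our own assumptions (DC and Nyquist bins left unchanged, a naive DFT, hypothetical function names). It reproduces only PR, not le-PR, whose additional constraint on the second-stage spectra is not reconstructed here.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) forward DFT (unnormalized)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT, normalized by 1/N."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def phase_randomize(x, rng):
    """Return a real signal with the same DFT magnitudes as x but random phases."""
    n = len(x)
    spec = dft(x)
    out = [0j] * n
    out[0] = complex(spec[0].real)  # DC bin stays real (keeps the mean)
    if n % 2 == 0:
        out[n // 2] = complex(spec[n // 2].real)  # Nyquist bin must also stay real
    for k in range(1, (n + 1) // 2):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        out[k] = abs(spec[k]) * cmath.exp(1j * phase)
        out[n - k] = out[k].conjugate()  # Hermitian symmetry gives a real output
    return [z.real for z in idft(out)]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(64)]
    y = phase_randomize(x, rng)
    mags_x = [abs(c) for c in dft(x)]
    mags_y = [abs(c) for c in dft(y)]
    print(max(abs(a - b) for a, b in zip(mags_x, mags_y)) < 1e-9)  # → True
```

Passing a seeded `random.Random` makes a given randomization reproducible; different seeds yield different waveforms that share the same linear amplitude spectrum.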
Figure 3.
(a) Example spectrograms of the original sounds and three types of synthesized sounds. From top to bottom, the rows show the original sounds, le-PR, PR, and the MS-synthesized sounds. Each column is labeled with the name of the original natural sound. (b) Example le-PR spectrograms generated with different random phase spectra.
Figure 4.
Perceptual similarity ratings of the three types of synthesized sounds to the original natural sounds: the MS statistics, the two-stage spectrum (le-PR), and the linear spectrum (PR). (a) Joint histograms of similarity ratings between different types of synthetic sounds. The panels compare ratings for le-PR versus PR (left), MS versus PR (middle), and MS versus le-PR (right). (b) Similarity ratings averaged across the 120 natural sounds. Error bars represent ±1 SEM across participants.
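The summary in panel (b) can be sketched as below, assuming that each participant's ratings are first averaged over the sounds and that the SEM is the sample standard deviation across participants divided by the square root of the number of participants. The rating values in the example are made up for illustration and are not data from the study; the function names are hypothetical.

```python
import math
import statistics


def participant_means(ratings):
    """ratings[p][s]: rating by participant p for sound s; returns one mean per participant."""
    return [statistics.mean(row) for row in ratings]


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD with n - 1, divided by sqrt(n))."""
    m = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return m, sem


if __name__ == "__main__":
    # Hypothetical ratings: 3 participants x 2 sounds (not data from the study).
    ratings = [[1.0, 3.0], [3.0, 5.0], [5.0, 7.0]]
    m, sem = mean_and_sem(participant_means(ratings))
    print(m, round(sem, 4))  # → 4.0 1.1547
```

Averaging within participants before computing the SEM makes the error bar reflect between-participant variability, which is what the caption states; pooling all individual ratings would instead mix in between-sound variability.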

