Behav Res Methods. 2019 Apr;51(2):778-792.
doi: 10.3758/s13428-018-1095-7.

Soundgen: An open-source tool for synthesizing nonverbal vocalizations

Andrey Anikin. Behav Res Methods. 2019 Apr.

Abstract

Voice synthesis is a useful method for investigating the communicative role of different acoustic features. Although many text-to-speech systems are available, researchers of human nonverbal vocalizations and bioacousticians may profit from a dedicated simple tool for synthesizing and manipulating natural-sounding vocalizations. Soundgen ( https://CRAN.R-project.org/package=soundgen ) is an open-source R package that synthesizes nonverbal vocalizations based on meaningful acoustic parameters, which can be specified from the command line or in an interactive app. This tool was validated by comparing the perceived emotion, valence, arousal, and authenticity of 60 recorded human nonverbal vocalizations (screams, moans, laughs, and so on) and their approximate synthetic reproductions. Each synthetic sound was created by manually specifying only a small number of high-level control parameters, such as syllable length and a few anchors for the intonation contour. Nevertheless, the valence and arousal ratings of synthetic sounds were similar to those of the original recordings, and the authenticity ratings were comparable, maintaining parity with the originals for less complex vocalizations. Manipulating the precise acoustic characteristics of synthetic sounds may shed light on the salient predictors of emotion in the human voice. More generally, soundgen may prove useful for any studies that require precise control over the acoustic features of nonspeech sounds, including research on animal vocalizations and auditory perception.

Keywords: Animal vocalizations; Emotion; Formant synthesis; Nonverbal vocalizations; Open source; Parametric synthesis; Voice synthesis.

Figures

Fig. 1. Graphical user interface for soundgen.

Fig. 2. Ratings of 60 human and 60 synthetic nonlinguistic vocalizations. Violin plots show the distributions of individual ratings for each call type (the “overall” category is aggregated per stimulus), with individual stimuli marked by indices from 1 to 60. Solid points with error bars show fitted values per call type: the median of the posterior distribution with 95% CI. Contrasts between real and synthetic sounds per call type are shown as axis labels.

Fig. 3. Forced choice classification of sounds in terms of their underlying emotion: Proportions of responses averaged per call type. Assuming that the synthetic versions are functionally equivalent to the original recordings, the two halves of the figure should be mirror images of each other. All bars over 12% high are labeled, to simplify reading the graph.

Fig. 4. Pearson’s correlations between emotion vectors (counts of emotional labels applied to a particular sound) for real and synthetic vocalizations. Solid points mark the median for each call type, and violin plots show the distribution of values for individual stimuli, which are marked 1 to 60. The shaded area shows the correlation that would be expected by chance (median and 95% CI), which was estimated by permuting the dataset.
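
The comparison described for Fig. 4 can be illustrated with a minimal Python sketch. It is not the paper's analysis code. The toy label counts, the function names (`pearson`, `paired_correlations`, `chance_interval`), and the permutation scheme (shuffling which synthetic sound is paired with which recording) are all assumptions made for illustration; the caption states only that chance was estimated by permuting the dataset.

```python
import math
import random
from statistics import median


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one vector is constant
    return cov / math.sqrt(var_x * var_y)


def paired_correlations(real, synth):
    """One correlation per stimulus: real emotion vector vs. its synthetic copy."""
    return [pearson(r, s) for r, s in zip(real, synth)]


def chance_interval(real, synth, n_perm=1000, seed=0):
    """Median and 95% interval of the median correlation under random pairing.

    Assumption: chance is simulated by shuffling the real-synthetic pairing.
    """
    rng = random.Random(seed)
    medians = []
    for _ in range(n_perm):
        shuffled = synth[:]
        rng.shuffle(shuffled)
        rs = [r for r in paired_correlations(real, shuffled) if not math.isnan(r)]
        medians.append(median(rs))
    medians.sort()
    lo = medians[int(0.025 * (n_perm - 1))]
    hi = medians[int(0.975 * (n_perm - 1))]
    return median(medians), (lo, hi)


# Hypothetical toy data: rows are stimuli, columns are counts of three
# emotion labels chosen by raters (not values from the study).
real = [[8, 1, 1], [1, 8, 1], [1, 1, 8], [5, 4, 1]]
synth = [[7, 2, 1], [2, 7, 1], [1, 2, 7], [4, 5, 1]]

observed = paired_correlations(real, synth)
chance_mid, (chance_lo, chance_hi) = chance_interval(real, synth)
print("observed median r:", round(median(observed), 3))
print("chance median r:", round(chance_mid, 3), "95% interval:", (chance_lo, chance_hi))
```

If a synthetic sound conveys the same emotions as its source recording, its observed correlation should sit well above the shuffled-pairing interval.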
