Soundgen: An open-source tool for synthesizing nonverbal vocalizations

Andrey Anikin¹

PMID: 30054898
PMCID: PMC6478631
DOI: 10.3758/s13428-018-1095-7

Andrey Anikin. Behav Res Methods. 2019 Apr;51(2):778-792. doi: 10.3758/s13428-018-1095-7.

Affiliation

¹ Division of Cognitive Science, Department of Philosophy, Lund University, Box 192, SE-221 00, Lund, Sweden. andrey.anikin@lucs.lu.se.

Abstract

Voice synthesis is a useful method for investigating the communicative role of different acoustic features. Although many text-to-speech systems are available, researchers of human nonverbal vocalizations and bioacousticians may profit from a dedicated simple tool for synthesizing and manipulating natural-sounding vocalizations. Soundgen (https://CRAN.R-project.org/package=soundgen) is an open-source R package that synthesizes nonverbal vocalizations based on meaningful acoustic parameters, which can be specified from the command line or in an interactive app. This tool was validated by comparing the perceived emotion, valence, arousal, and authenticity of 60 recorded human nonverbal vocalizations (screams, moans, laughs, and so on) and their approximate synthetic reproductions. Each synthetic sound was created by manually specifying only a small number of high-level control parameters, such as syllable length and a few anchors for the intonation contour. Nevertheless, the valence and arousal ratings of synthetic sounds were similar to those of the original recordings, and the authenticity ratings were comparable, maintaining parity with the originals for less complex vocalizations. Manipulating the precise acoustic characteristics of synthetic sounds may shed light on the salient predictors of emotion in the human voice. More generally, soundgen may prove useful for any studies that require precise control over the acoustic features of nonspeech sounds, including research on animal vocalizations and auditory perception.
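To illustrate what specifying "syllable length and a few anchors for the intonation contour" could look like in practice, the following is a minimal Python sketch of parametric synthesis. It is not soundgen's code or API: the function names (`pitch_contour`, `synthesize`), the parameter names, the linear interpolation between anchors, and the fixed per-octave harmonic rolloff are all assumptions made for illustration only.

```python
import math


def pitch_contour(anchors, n_samples):
    """Linearly interpolate (relative_time, hz) anchors into one pitch value per sample.

    Assumes at least two anchors, with relative times running from 0 to 1.
    """
    anchors = sorted(anchors)
    contour = []
    seg = 0  # index of the anchor segment that contains the current sample
    for i in range(n_samples):
        t = i / (n_samples - 1) if n_samples > 1 else 0.0
        while seg < len(anchors) - 2 and t > anchors[seg + 1][0]:
            seg += 1
        (t0, f0), (t1, f1) = anchors[seg], anchors[seg + 1]
        w = (t - t0) / (t1 - t0)
        contour.append(f0 + w * (f1 - f0))
    return contour


def synthesize(syl_len_ms, anchors, sampling_rate=16000,
               n_harmonics=5, rolloff_db=-12.0):
    """Render one syllable as a sum of harmonics that follow the pitch contour.

    rolloff_db is a hypothetical control: the change in harmonic amplitude
    (in dB) per octave above the fundamental.
    """
    n = int(sampling_rate * syl_len_ms / 1000)
    f0 = pitch_contour(anchors, n)
    phase = 0.0
    wave = []
    for hz in f0:
        # Accumulate phase so that the waveform stays continuous as pitch changes
        phase += 2 * math.pi * hz / sampling_rate
        sample = 0.0
        for h in range(1, n_harmonics + 1):
            if h * hz < sampling_rate / 2:  # skip harmonics above Nyquist
                amp = 10 ** (rolloff_db * math.log2(h) / 20)
                sample += amp * math.sin(h * phase)
        wave.append(sample)
    peak = max(abs(s) for s in wave) or 1.0
    return [s / peak for s in wave]  # normalize to the range [-1, 1]


# A 300-ms syllable whose pitch rises from 200 to 400 Hz, then falls to 300 Hz
wave = synthesize(300, [(0, 200), (0.5, 400), (1, 300)])
```

The point of the sketch is that three anchors and one duration fully determine the waveform; a tool such as soundgen exposes many more controls of this kind, which this sketch does not attempt to reproduce.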

Keywords: Animal vocalizations; Emotion; Formant synthesis; Nonverbal vocalizations; Open source; Parametric synthesis; Voice synthesis.

Figures

**Fig. 1**
Graphical user interface for *soundgen*

**Fig. 2**
Ratings of 60 human and 60 synthetic nonlinguistic vocalizations. Violin plots show the distributions of individual ratings for each call type (the “overall” category is aggregated per stimulus), with individual stimuli marked by indices from 1 to 60. Solid points with error bars show fitted values per call type: the median of the posterior distribution with 95% CI. Contrasts between real and synthetic sounds per call type are shown as axis labels

**Fig. 3**
Forced choice classification of sounds in terms of their underlying emotion: Proportions of responses averaged per call type. Assuming that the synthetic versions are functionally equivalent to the original recordings, the two halves of the figure should be mirror images of each other. All bars over 12% high are labeled, to simplify reading the graph

**Fig. 4**
Pearson’s correlations between emotion vectors (counts of emotional labels applied to a particular sound) for real and synthetic vocalizations. Solid points mark the median for each call type, and violin plots show the distribution of values for individual stimuli, which are marked 1 to 60. The shaded area shows the correlation that would be expected by chance (median and 95% CI), which was estimated by permuting the dataset
