Behav Res Methods. 2024 Sep;56(6):5693-5708. doi: 10.3758/s13428-023-02300-4. Epub 2023 Dec 13.

Ecologically valid speech collection in behavioral research: The Ghent Semi-spontaneous Speech Paradigm (GSSP)

Jonas Van Der Donckt et al. Behav Res Methods. 2024 Sep.

Abstract

This paper introduces the Ghent Semi-spontaneous Speech Paradigm (GSSP), a new method for collecting unscripted speech data for affective-behavioral research in both experimental and real-world settings through the description of peer-rated pictures with a consistent affective load. The GSSP was designed to meet five criteria: (1) allow flexible speech recording durations, (2) provide a straightforward and non-interfering task, (3) allow for experimental control, (4) favor spontaneous speech for its prosodic richness, and (5) require minimal human interference to enable scalability. The validity of the GSSP was evaluated through an online task, in which this paradigm was implemented alongside a fixed-text read-aloud task. The results indicate that participants were able to describe images for an adequate duration, and acoustic analysis showed that most features trended in line with the targeted speech styles (i.e., unscripted spontaneous speech versus scripted read-aloud speech). A speech style classification model using acoustic features achieved a balanced accuracy of 83% on within-dataset validation, indicating separability between the GSSP and the read-aloud speech task. Furthermore, when this model was validated on an external dataset containing interview and read-aloud speech, a balanced accuracy of 70% was obtained, indicating an acoustic correspondence between GSSP speech and spontaneous interviewee speech. The GSSP is of special interest to behavioral and speech researchers looking to capture spontaneous speech, both in longitudinal ambulatory behavioral studies and in laboratory studies. To facilitate future research on speech styles, acoustics, and affective states, the task implementation code, the collected dataset, and the analysis notebooks are available.
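To make the classification step concrete, below is a minimal Python sketch of a speech-style classifier evaluated with balanced accuracy, the metric reported above. It is an illustration only, not the authors' pipeline: the feature matrix, labels, and model choice (logistic regression on standardized features) are placeholders for the study's actual acoustic features and classifier.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import balanced_accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Placeholder data: 200 utterances x 88 acoustic features
    # (88 matches the eGeMAPS descriptor count; any feature set works).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 88))
    y = rng.choice(["gssp", "read_aloud"], size=200)

    # Stratified split, feature standardization, linear classifier.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    clf.fit(X_tr, y_tr)

    # Balanced accuracy averages per-class recall, so it is robust to
    # class imbalance between the two speech styles.
    print(balanced_accuracy_score(y_te, clf.predict(X_te)))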

Keywords: Acoustics; Behavioral research; Experimental research; Machine learning; Psycholinguistics; Speech; Speech collection; Speech styles.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Flowchart of the web application experiment. Note. This results in 7 Marloes, 15 Radboud, and 15 PiSCES utterances per participant
Fig. 2
Trial flow chart of the web app speech collection task, with the pages translated to English. First, an empty page (a) is displayed with an enabled start button and a disabled stop button. When the participant clicks the start button (b), audio recording begins, the stop button is enabled, and the stimulus is presented as an image (or as text for the read-aloud task). After completing the stimulus speech collection task, the participant clicks the stop button, which redirects to (c), where they report their experienced arousal and valence values
Fig. 3
Audio data processing flowchart
Fig. 4
VAD slicing with a 0.25-s margin around the first and last voiced segments. Note. The first voiced region occurs approximately 2 seconds after the participant pressed the “start” button. Slicing ensures that each participant's first and last voiced segments start and end at the same relative time, enabling fair comparisons of fixed-duration excerpts measured from the beginning or end of the VAD slice
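The slicing rule in this caption is simple enough to sketch. The following Python function is a hypothetical implementation: it assumes a frame-level boolean VAD output (the VAD backend is not specified in the caption) and trims the waveform to the first/last voiced frame plus a 0.25-s margin.

    import numpy as np

    def vad_slice(audio: np.ndarray, sr: int, voiced: np.ndarray,
                  hop_s: float, margin_s: float = 0.25) -> np.ndarray:
        """Trim `audio` to [first voiced frame - margin, last voiced frame + margin].

        `voiced` is a per-frame boolean VAD decision; `hop_s` is the frame hop
        in seconds. Sample indices are clipped to the recording boundaries.
        """
        frames = np.flatnonzero(voiced)
        if frames.size == 0:
            return audio  # no voiced frames detected; keep the recording as-is
        start = max(0, int((frames[0] * hop_s - margin_s) * sr))
        end = min(len(audio), int((frames[-1] * hop_s + margin_s) * sr))
        return audio[start:end]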
Fig. 5
Distribution plot of the VAD-sliced utterance durations. The vertical dashed lines on the left indicate the voiced duration threshold (15 seconds) and the lines on the right represent the instructed image description duration (30 seconds)
Fig. 6
Box plot of temporal features, grouped by collection task (row 1) and speech style (row 2)
Fig. 7
Box plot of frequency-related features, grouped by task (row 1) and speech style (row 2)
Fig. 8
Box plot of amplitude-related features, grouped by task (row 1) and speech style (row 2)
Fig. 9
Picture delta box plot of a subset of openSMILE features for both the PiSCES (column 1) and Radboud (column 2) image sets. The deltas are calculated by subtracting each value from the participant’s mean for the same image set
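The delta computation described in this caption amounts to centering each feature within participant and image set. Below is a hypothetical pandas sketch; the column names are assumptions, not taken from the paper's code.

    import pandas as pd

    def picture_deltas(df: pd.DataFrame, feature_cols: list[str]) -> pd.DataFrame:
        """Center features per participant and image set.

        Sign convention here is value minus participant mean; flip the
        subtraction if the reverse (mean minus value) is intended.
        """
        out = df.copy()
        means = out.groupby(["participant", "image_set"])[feature_cols].transform("mean")
        out[feature_cols] = out[feature_cols] - means
        return out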
Fig. 10
Two-dimensional t-SNE projection of ECAPA-TDNN utterance embeddings. (a) Hue determined by speaker ID. (b) Hue determined by speech style. Note. Each marker represents one speech utterance and, as illustrated in (a), each cluster of markers represents utterances by one speaker. When each dot is colored by its speech (trial) style (b), the individual speech styles generally cluster together within each speaker's utterances. This hints at a separability of speech styles based on the acoustic properties used by speaker identification techniques
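A hypothetical sketch of the embedding-plus-projection pipeline behind this figure, using a pretrained SpeechBrain ECAPA-TDNN speaker encoder and scikit-learn's t-SNE; the checkpoint name, file layout, and API calls are assumptions about common tooling, and the paper's exact pipeline may differ.

    import glob

    import torch
    import torchaudio
    from sklearn.manifold import TSNE
    from speechbrain.pretrained import EncoderClassifier

    # Pretrained ECAPA-TDNN speaker-verification encoder.
    encoder = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")

    def embed(paths):
        """Return one ECAPA-TDNN embedding per (mono) utterance file."""
        embeddings = []
        with torch.no_grad():  # inference only
            for path in paths:
                wav, sr = torchaudio.load(path)
                wav = torchaudio.functional.resample(wav, sr, 16000)  # encoder expects 16 kHz
                embeddings.append(encoder.encode_batch(wav).squeeze())
        return torch.stack(embeddings)

    wav_paths = sorted(glob.glob("utterances/*.wav"))  # hypothetical file layout
    # t-SNE's default perplexity (30) requires more than 30 utterances.
    xy = TSNE(n_components=2).fit_transform(embed(wav_paths).numpy())
    # Plot `xy`, coloring markers by speaker ID (panel a) or speech style (panel b).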
