A tool for efficient and accurate segmentation of speech data: announcing POnSS

Joe Rodd¹,², Caitlin Decuyper³, Hans Rutger Bosker³, Louis Ten Bosch⁴

Affiliations

¹ Max Planck Institute for Psycholinguistics, Radboud University, Postbus 310 6500AH, Nijmegen, The Netherlands. joe.rodd@mpi.nl.
² Centre for Language Studies, Radboud University, PO Box 9103, 6500 HD, Nijmegen, The Netherlands. joe.rodd@mpi.nl.
³ Max Planck Institute for Psycholinguistics, Radboud University, Postbus 310 6500AH, Nijmegen, The Netherlands.
⁴ Centre for Language Studies, Radboud University, PO Box 9103, 6500 HD, Nijmegen, The Netherlands.

PMID: 32869139
PMCID: PMC8062395
DOI: 10.3758/s13428-020-01449-6

Joe Rodd et al. Behav Res Methods. 2021 Apr;53(2):744-756. doi: 10.3758/s13428-020-01449-6.

Abstract

Despite advances in automatic speech recognition (ASR), human input is still essential for producing research-grade segmentations of speech data. Conventional approaches to manual segmentation are very labor-intensive. We introduce POnSS, a browser-based system that is specialized for the task of segmenting the onsets and offsets of words, which combines aspects of ASR with limited human input. In developing POnSS, we identified several sub-tasks of segmentation, and implemented each of these as separate interfaces for the annotators to interact with to streamline their task as much as possible. We evaluated segmentations made with POnSS against a baseline of segmentations of the same data made conventionally in Praat. We observed that POnSS achieved comparable reliability to segmentation using Praat, but required 23% less annotator time investment. Because of its greater efficiency without sacrificing reliability, POnSS represents a distinct methodological advance for the segmentation of speech data.

Keywords: Segmentation; Speech data.

Figures

**Fig. 1**
A diagrammatic representation of the annotation process. See the text for full details

**Fig. 2**
Screenshots of the browser interfaces for the orthographic transcription (*left*), triage (*middle*), and retrimming tasks (*right*) in POnSS

**Fig. 3**
Panel a: the observed distributions of the difference between segmented times and the median segmentation for each word, for POnSS and manual annotation modalities (*colors*). Panel b: an example of the optimized mixture-model fit (*orange*) to the observed distribution of one of the samples (*black line*). Panel c: *Solid violins* show the posteriors of Model 1 (see text) for the effect of modality on the sigma, with median (*points*), 95% HDIs (highest density intervals, *thin black lines*) and 66% HDIs (*thick black lines*)
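
The quantity plotted in Panel a can be illustrated with a minimal sketch, assuming each word has several annotated onset times. The function name `deviations_from_median` and the example data are hypothetical and are not taken from POnSS or from the paper's analysis code.

```python
from statistics import median


def deviations_from_median(onsets_by_word):
    """Map each word to the differences between its annotated onsets
    and the median onset for that word (same unit as the input)."""
    result = {}
    for word_id, onsets in onsets_by_word.items():
        centre = median(onsets)  # median segmentation for this word
        result[word_id] = [t - centre for t in onsets]
    return result


# Hypothetical onset times in seconds, one list per word.
example = {
    "word_01": [0.50, 0.75, 1.00],
    "word_02": [2.0, 2.25, 2.5, 3.0],
}
deviations = deviations_from_median(example)
```

Pooling these differences over all words, separately for each modality, would give distributions of the kind shown in Panel a; the mixture-model fit in Panel b is not reproduced here.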

**Fig. 4**
Distributions of bootstrap-resampled estimates of how many annotator hours it would take to yield 5000 well-segmented words by the two modalities (*translucent violins*). Overlaid are *solid violins* showing the posteriors of Model 2 for the effect of modality, with medians (*points*); the 95% and 66% HDIs are too narrow to see in the figure
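
One way such bootstrap estimates could be obtained is sketched below. This is an illustration under stated assumptions, not the authors' procedure: it assumes the data are available as per-session pairs of (hours spent, well-segmented words yielded), and the function name, the session records, and the resampling unit are all hypothetical.

```python
import random


def bootstrap_hours_for_target(sessions, target_words=5000,
                               n_resamples=1000, seed=0):
    """Bootstrap estimates of the annotator hours needed to yield
    `target_words` well-segmented words.

    `sessions` is a list of (hours, words) pairs (an assumed data layout).
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    estimates = []
    for _ in range(n_resamples):
        # Resample whole sessions with replacement.
        sample = rng.choices(sessions, k=len(sessions))
        hours = sum(h for h, _ in sample)
        words = sum(w for _, w in sample)
        if words == 0:
            continue  # skip degenerate resamples with no output
        estimates.append(target_words * hours / words)
    return estimates


# Hypothetical session records for one modality: (hours, words).
sessions = [(1.0, 200), (1.5, 280), (0.5, 110), (2.0, 390)]
estimates = bootstrap_hours_for_target(sessions)
```

Running the same function on the session records of each modality would give two distributions of estimated hours that could be compared in the manner of the translucent violins.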
