Using automatic alignment to analyze endangered language data: testing the viability of untrained alignment

Christian DiCanio¹, Hosung Nam, Douglas H Whalen, H Timothy Bunnell, Jonathan D Amith, Rey Castillo García

PMID: 23967953
PMCID: PMC5392066
DOI: 10.1121/1.4816491

Comparative Study

Christian DiCanio et al. J Acoust Soc Am. 2013 Sep.

J Acoust Soc Am. 2013 Sep;134(3):2235-46.

doi: 10.1121/1.4816491.

Affiliation

¹ Haskins Laboratories, 300 George Street, New Haven, Connecticut 06511, USA. dicanio@haskins.yale.edu

Abstract

While efforts to document endangered languages have steadily increased, the phonetic analysis of endangered language data remains a challenge. The transcription of large documentation corpora is, by itself, a tremendous feat. Yet, the process of segmentation remains a bottleneck for research with data of this kind. This paper examines whether a speech processing tool, forced alignment, can facilitate the segmentation task for small data sets, even when the target language differs from the training language. The authors also examined whether a phone set with contextualization outperforms a more general one. The accuracy of two forced aligners trained on English (hmalign and p2fa) was assessed using corpus data from Yoloxóchitl Mixtec. Overall, agreement performance was relatively good, with accuracy at 70.9% within 30 ms for hmalign and 65.7% within 30 ms for p2fa. Segmental and tonal categories influenced accuracy as well. For instance, additional stop allophones in hmalign's phone set aided alignment accuracy. Agreement differences between aligners also corresponded closely with the types of data on which the aligners were trained. Overall, using existing alignment systems was found to have potential for making phonetic analysis of small corpora more efficient, with more allophonic phone sets providing better agreement than general ones.
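The "within 30 ms" agreement figures reported above can be illustrated with a minimal sketch. It assumes that agreement means the share of aligner-placed boundaries falling within a fixed tolerance of the corresponding hand-labeled boundary; the abstract does not spell out the exact procedure, so the function name `agreement_within`, its parameters, and the boundary times below are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def agreement_within(
    manual_ms: Sequence[int],
    automatic_ms: Sequence[int],
    tolerance_ms: int = 30,
) -> float:
    """Return the fraction of paired boundaries that differ by at most tolerance_ms.

    Assumption: both sequences list the same boundaries in the same order,
    with times given in milliseconds.
    """
    if len(manual_ms) != len(automatic_ms):
        raise ValueError("boundary lists must have the same length")
    if not manual_ms:
        raise ValueError("boundary lists must not be empty")
    hits = sum(
        1 for m, a in zip(manual_ms, automatic_ms) if abs(m - a) <= tolerance_ms
    )
    return hits / len(manual_ms)


# Hypothetical boundary times (ms): hand-labeled vs. forced-aligner output.
manual = [100, 250, 410, 600]
automatic = [112, 295, 405, 640]

# Absolute differences are 12, 45, 5, and 40 ms, so two of four fall within 30 ms.
print(agreement_within(manual, automatic))  # 0.5
```

Working in integer milliseconds avoids floating-point edge cases at the tolerance boundary. Widening `tolerance_ms` shows how sensitive an agreement figure is to the chosen threshold.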

Figures

**FIG. 1.**
Agreement for consonants and vowels across aligners.

**FIG. 2.**
Agreement for consonant classes across aligners.

**FIG. 3.**
Agreement for vowel qualities across aligners.

**FIG. 4.**
Agreement for vowel nasalization across aligners.

**FIG. 5.**
Agreement for stops across aligners.

**FIG. 6.**
Agreement for glottal stops across aligners.

Grants and funding

R42 DC006193/DC/NIDCD NIH HHS/United States
