Understanding the Phonetic Characteristics of Speech Under Uncertainty: Implications of the Representation of Linguistic Knowledge in Learning and Processing

Fabian Tomaschek¹, Michael Ramscar¹

PMID: 35548492
PMCID: PMC9083257
DOI: 10.3389/fpsyg.2022.754395

Front Psychol. 2022 Apr 25;13:754395. doi: 10.3389/fpsyg.2022.754395. eCollection 2022.

Affiliation

¹ Quantitative Linguistics Lab, Department of General Linguistics, University of Tübingen, Tübingen, Germany.

Abstract

The uncertainty associated with paradigmatic families has been shown to correlate with their phonetic characteristics in speech, suggesting that representations of complex sublexical relations between words are part of speaker knowledge. To better understand this, recent studies have used two-layer neural network models to examine the way paradigmatic uncertainty emerges in learning. However, to date this work has largely ignored the way choices about the representation of inflectional and grammatical functions (IFS) in models strongly influence what they subsequently learn. To explore the consequences of this, we investigate how representations of IFS in the input-output structures of learning models affect the capacity of uncertainty estimates derived from them to account for phonetic variability in speech. Specifically, we examine whether IFS are best represented as outputs of neural networks (as in previous studies) or as inputs, by building models that embody both choices and examining their capacity to account for uncertainty effects in the formant trajectories of word-final [ɐ], which in German discriminates around sixty different IFS. Overall, we find that formants are enhanced as the uncertainty associated with IFS decreases. This result dovetails with a growing number of studies of morphological and inflectional families that have shown that enhancement is associated with lower uncertainty in context. Importantly, we also find that in models where IFS serve as inputs (as our theoretical analysis suggests they ought to), the uncertainty measures provide better fits to the empirical variance observed in [ɐ] formants than in models where IFS serve as outputs. This supports our suggestion that IFS serve as cognitive cues during speech production, and should be treated as such in modeling. It is also consistent with the idea that when IFS serve as inputs to a learning network, this maintains the distinction between those parts of the network that represent message and those that represent signal. We conclude by describing how maintaining a "signal-message-uncertainty distinction" can allow us to reconcile a range of apparently contradictory findings about the relationship between articulation and uncertainty in context.
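
The contrast between treating IFS as network outputs and as network inputs can be illustrated with a minimal sketch. It assumes a two-layer network trained with the Rescorla-Wagner (delta) rule, which is a common choice in discriminative learning but is not confirmed by this abstract as the authors' exact implementation. The cue names (`noun_stem`, `adj_stem`, `umlaut`, `final_a`), the two functions (`PLURAL`, `COMPARATIVE`), the event frequencies, and the learning rate are all hypothetical toy values chosen for illustration.

```python
# Toy sketch only: cue names, functions, and frequencies are hypothetical
# and are not taken from the paper's data or models.

def rescorla_wagner(events, rate=0.01, lam=1.0):
    """Train a two-layer network with the Rescorla-Wagner (delta) rule.

    events: iterable of (cues_present, outcomes_present) pairs of sets.
    Returns a nested dict weights[cue][outcome].
    """
    events = list(events)
    cues = sorted(set().union(*(c for c, _ in events)))
    outcomes = sorted(set().union(*(o for _, o in events)))
    w = {c: {o: 0.0 for o in outcomes} for c in cues}
    for present_cues, present_outcomes in events:
        for o in outcomes:
            # Summed support for outcome o from all cues present in this event.
            activation = sum(w[c][o] for c in present_cues)
            target = lam if o in present_outcomes else 0.0
            delta = rate * (target - activation)
            # Present cues share the prediction error (cue competition).
            for c in present_cues:
                w[c][o] += delta
    return w


# One cycle of learning events: (form/context features, inflectional function).
CYCLE = [
    ({"noun_stem", "umlaut", "final_a"}, {"PLURAL"}),   # e.g. a form like "Männer"
    ({"adj_stem", "final_a"}, {"COMPARATIVE"}),         # e.g. a form like "kleiner"
    ({"noun_stem", "final_a"}, {"PLURAL"}),             # e.g. a form like "Kinder"
    ({"adj_stem", "final_a"}, {"COMPARATIVE"}),
]

# Functions as OUTPUTS: features are cues, functions are outcomes.
features_to_function = CYCLE * 5000
# Functions as INPUTS: the same events with cues and outcomes swapped.
function_to_features = [(funcs, feats) for feats, funcs in features_to_function]

fl = rescorla_wagner(features_to_function)
lf = rescorla_wagner(function_to_features)

print("functions as outputs:",
      {cue: round(fl[cue]["PLURAL"], 2) for cue in fl})
print("functions as inputs: ",
      {feat: round(val, 2) for feat, val in lf["PLURAL"].items()})
```

In this toy setup, when functions are outputs the shared cue `final_a` ends up with a weaker association to `PLURAL` than the discriminating cue `noun_stem`, because the cues compete for predictive value. When functions are inputs, each weight simply tracks how often a feature occurs with that function (about 0.5 for `umlaut` given `PLURAL`). Uncertainty measures derived from the two structures therefore differ even though the training events are identical.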

Keywords: context; cue-to-outcome structure; discriminative learning; enhancement; linguistic knowledge; morphological structure; phonetic characteristics; reduction.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**Figure 1**
The possible predictive relationships that labels (in morphological terms, series of words and affixes) can enter into with the other features of the world (or other elements of a code). A feature-to-label relationship **(A)** will facilitate cue competition between features, and the abstraction of the informative dimensions that predict morphological contrasts (e.g., nouns and plural affixes) in learning. By contrast, a label-to-feature relationship **(B)** will be constrained to simply learning the probability of each feature given the label.

**Figure 2**
ML-score difference between model m0 and models m1 to m4. The larger the difference, the better the model's goodness of fit.

**Figure 3**
Estimated trajectories for different word classes (columns) in relation to vowel duration **(top)**, functional output activation obtained from a network with inflectional functions of [ɐ] in the output **(middle)**, and functional input activation obtained from a network with inflectional functions of [ɐ] in the input **(bottom)**. The x-axes represent inverted z-scaled F2 frequencies such that the left edge points toward the front of the vowel space and the right edge points toward the back of the vowel space. The y-axes represent inverted z-scaled F1 frequencies such that the top points toward the top of the vowel space and the bottom points toward the bottom of the vowel space. Shades of red represent percentiles for different predictors (optimized for color blindness). The onset of the time course is located at the filled star; the circle in the trajectory represents the center of the vowel.
