Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2022 Apr 25:13:754395.
doi: 10.3389/fpsyg.2022.754395. eCollection 2022.

Understanding the Phonetic Characteristics of Speech Under Uncertainty-Implications of the Representation of Linguistic Knowledge in Learning and Processing

Affiliations

Understanding the Phonetic Characteristics of Speech Under Uncertainty-Implications of the Representation of Linguistic Knowledge in Learning and Processing

Fabian Tomaschek et al. Front Psychol. .

Abstract

The uncertainty associated with paradigmatic families has been shown to correlate with their phonetic characteristics in speech, suggesting that representations of complex sublexical relations between words are part of speaker knowledge. To better understand this, recent studies have used two-layer neural network models to examine the way paradigmatic uncertainty emerges in learning. However, to date this work has largely ignored the way choices about the representation of inflectional and grammatical functions (IFS) in models strongly influence what they subsequently learn. To explore the consequences of this, we investigate how representations of IFS in the input-output structures of learning models affect the capacity of uncertainty estimates derived from them to account for phonetic variability in speech. Specifically, we examine whether IFS are best represented as outputs to neural networks (as in previous studies) or as inputs by building models that embody both choices and examining their capacity to account for uncertainty effects in the formant trajectories of word final [ɐ], which in German discriminates around sixty different IFS. Overall, we find that formants are enhanced as the uncertainty associated with IFS decreases. This result dovetails with a growing number of studies of morphological and inflectional families that have shown that enhancement is associated with lower uncertainty in context. Importantly, we also find that in models where IFS serve as inputs-as our theoretical analysis suggests they ought to-its uncertainty measures provide better fits to the empirical variance observed in [ɐ] formants than models where IFS serve as outputs. This supports our suggestion that IFS serve as cognitive cues during speech production, and should be treated as such in modeling. It is also consistent with the idea that when IFS serve as inputs to a learning network. This maintains the distinction between those parts of the network that represent message and those that represent signal. We conclude by describing how maintaining a "signal-message-uncertainty distinction" can allow us to reconcile a range of apparently contradictory findings about the relationship between articulation and uncertainty in context.

Keywords: context; cue-to-outcome structure; discriminative learning; enhancement; linguistic knowledge; morphological structure; phonetic characteristics; reduction.

PubMed Disclaimer

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Figure 1
The possible predictive relationships labels (in morphological terms, series of words and affixes) can enter into with the other features of the world (or other elements of a code). A feature-to-label relationship (A) will facilitate cue competition between features, and the abstraction of the informative dimensions that predict morphological contrasts (e.g., nouns and plural affixes) in learning. By contrast, a label-to-feature relationship (B) will be constrained to simply learning the probability of each feature given the label.
Figure 2
Figure 2
ML-score difference between model m0 and models m1 to m4. The larger the difference, the better the model's goodness of fit.
Figure 3
Figure 3
Estimated trajectories for different word classes (columns) in relation to vowel duration (top), functional output activation obtained from a network with inflectional functions of [ɐ] in the output (middle) and functional input activation obtained from a network with inflectional functions of [ɐ] in the input (bottom). The x-axes represent inverted z-scaled F2 frequencies such that the left edge points toward the front of the vowel space and the right edge points toward the back of the vowel space. Y-axes represent inverted z-scaled F1 frequencies such that the top points to the top of the vowels space and the bottom points toward the bottom of the vowel space. Shades of red represent percentiles for different predictors (optimized for color blindness). Onset of the time course is located at the filled star, the circle in the trajectory represents the center of the vowel.

Similar articles

References

    1. Arnold D., Tomaschek F. (2016). “The Karl Eberhards Corpus of spontaneously spoken Southern German in dialogues - audio and articulatory recordings,” in Tagungsband der 12. Tagung Phonetik und Phonologie im deutschsprachigen Raum, eds C. Draxler and F. Kleber (München: Ludwig-Maximilians-Universität München; ), 9–11.
    1. Arnon I., Ramscar M. (2012). Granularity and the acquisition of grammatical gender: How order-of-acquisition affects what gets learned. Cognition 122, 292–305. 10.1016/j.cognition.2011.10.009 - DOI - PubMed
    1. Arppe A., Hendrix P., Milin P., Baayen R. H., Sering T., Shaoul C. (2018). ndl: Naive Discriminative Learning. Available online at: https://CRAN.R-project.org/package=ndl
    1. Aylett M., Turk A. (2004). The Smooth Signal Redundancy Hypothesis: a functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Lang. Speech 47, 31–56. 10.1177/00238309040470010201 - DOI - PubMed
    1. Aylett M., Turk A. (2006). Language redundancy predicts syllabic duration and the spectral characteristics of vocalic syllable nuclei. J. Acoust. Soc. Am. 119, 3048–3058. 10.1121/1.2188331 - DOI - PubMed

LinkOut - more resources