PLoS One. 2018 Dec 28;13(12):e0209449.
doi: 10.1371/journal.pone.0209449. eCollection 2018.

Lexical category acquisition is facilitated by uncertainty in distributional co-occurrences

Giovanni Cassani et al. PLoS One.

Erratum in

Abstract

This paper analyzes distributional properties that facilitate the categorization of words into lexical categories. First, word-context co-occurrence counts were collected using corpora of transcribed English child-directed speech. Then, an unsupervised k-nearest neighbor algorithm was used to categorize words into lexical categories. The categorization outcome was regressed over three main distributional predictors computed for each word, including frequency, contextual diversity, and average conditional probability given all the co-occurring contexts. Results show that both contextual diversity and frequency have a positive effect while the average conditional probability has a negative effect. This indicates that words are easier to categorize in the face of uncertainty: categorization works best for words which are frequent, diverse, and hard to predict given the co-occurring contexts. This shows how, in order for the learner to see an opportunity to form a category, there needs to be a certain degree of uncertainty in the co-occurrence pattern.
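The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation. The toy corpus, the choice of adjacent words as contexts, the use of a single nearest neighbor (k = 1), and the exact formulas for the three predictors are all assumptions made for illustration. In particular, "average conditional probability" is read here as the mean of P(word | context) over the contexts a word occurs with.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrences(utterances):
    """Count word-context pairs. Contexts are the adjacent words (an assumption)."""
    counts = defaultdict(Counter)
    for utt in utterances:
        tokens = ["<s>"] + utt.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            counts[tokens[i]]["L_" + tokens[i - 1]] += 1  # left neighbor
            counts[tokens[i]]["R_" + tokens[i + 1]] += 1  # right neighbor
    return counts


def predictors(counts):
    """Return (frequency, contextual diversity, average P(word | context)) per word."""
    context_totals = Counter()
    for ctxs in counts.values():
        context_totals.update(ctxs)
    out = {}
    for word, ctxs in counts.items():
        frequency = sum(ctxs.values()) // 2  # each token adds two context counts
        diversity = len(ctxs)                # number of distinct contexts
        avg_cond_prob = sum(n / context_totals[c] for c, n in ctxs.items()) / len(ctxs)
        out[word] = (frequency, diversity, avg_cond_prob)
    return out


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[c] * b[c] for c in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def nearest_neighbor_label(word, counts, labels):
    """Leave-one-out 1-NN: predict the category of the closest other word."""
    others = [w for w in counts if w != word and w in labels]
    best = min(others, key=lambda w: cosine_distance(counts[word], counts[w]))
    return labels[best]


# Hypothetical toy data, not taken from the corpora used in the paper.
utterances = ["the dog runs", "the cat runs", "a dog sleeps", "a cat sleeps"]
labels = {"dog": "N", "cat": "N", "runs": "V", "sleeps": "V",
          "the": "FUNCT", "a": "FUNCT"}

counts = cooccurrences(utterances)
stats = predictors(counts)
for word in labels:
    predicted = nearest_neighbor_label(word, counts, labels)
    print(word, stats[word], predicted, predicted == labels[word])
```

In the study, the per-word outcome (correct or incorrect) is what gets regressed on the predictors. The sketch stops at producing that outcome alongside the three predictor values.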

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Logistic mixed-effect models for single predictors—Cosine distance.
Effects of the logistic mixed-effect models which included a single predictor (inputted in the model together with Time, which is part of the baseline model) using categorization accuracy achieved with cosine as distance metric as the dependent variable. Subplots are ordered according to the reduction in AIC caused by adding the predictor to the baseline model: the most important predictor is in the top-left panel, then top-right, bottom-left, and finally bottom-right. 95% confidence bands are shown.
Fig 2. Logistic mixed-effect model including all predictors—Cosine distance.
Graphical representation of the effects arising from the logistic mixed-effect model fitted considering all significant predictors at once, with accuracy achieved using cosine to retrieve nearest neighbors as the dependent variable. 95% confidence bands are shown for each effect, and predictors are ordered from left to right according to how much they improve the model fit.
Fig 3. Cosine-based classification—Error analysis.
Mosaic plot showing the error analysis of the categorization experiment performed using cosine as the distance metric to retrieve nearest neighbors. Blue cells indicate cases in which the observed frequencies significantly exceed the expected frequencies under the assumption of independence between Correct and Predicted PoS tags, while red cells mark the opposite scenario. ADJ: adjectives; ADV: adverbs; FUNCT: function words; N: nouns; V: verbs. For a detailed explanation of how to interpret the plot, refer to the main text.
Fig 4. Logistic mixed-effect models for single predictors—Numeric overlap distance.
Graphical representation of the effects highlighted by the logistic mixed-effect models which included a single predictor (inputted in the model together with Time, which is part of the baseline model) using categorization accuracy achieved with numeric overlap as distance metric as the dependent variable. Subplots are ordered according to the reduction in AIC caused by adding the predictor to the baseline model: the most important predictor is in the top-left panel, then top-right, bottom-left, and finally bottom-right. 95% confidence bands are shown.
Fig 5. Logistic mixed-effect model including all predictors—Numeric overlap distance.
Graphical representation of the effects arising from the logistic mixed-effect model fitted considering all significant predictors at once, with accuracy achieved using numeric overlap to retrieve nearest neighbors as the dependent variable. 95% confidence bands are shown for each effect, and predictors are ordered according to how much they improve the model fit (first top-left, then top-right, bottom-left, and bottom-right).
Fig 6. Relation between lexical diversity and frequency, with numeric overlap-based classification.
Relation between Frequency and Diversity across words used in the categorization experiment analyzed in Studies 1, 2, and 3. Accuracy obtained using numeric overlap as the distance metric to compute nearest neighbors is shown, with misses highlighted in red.
Fig 7. Numeric overlap-based classification—Error analysis.
Mosaic plot showing the error analysis of the categorization experiment performed using numeric overlap as the distance metric to retrieve nearest neighbors. Blue cells indicate cases in which the observed frequencies significantly exceed the expected frequencies under the assumption of independence between Correct and Predicted PoS tags, while red cells mark the opposite scenario. ADJ: adjectives; ADV: adverbs; FUNCT: function words; N: nouns; V: verbs. For a detailed explanation of how to interpret the plot, refer to the main text.
