The secret is in the sound: from unsegmented speech to lexical categories
- PMID: 19371361
- PMCID: PMC2743257
- DOI: 10.1111/j.1467-7687.2009.00824.x
Abstract
When learning language, young children are faced with many seemingly formidable challenges, including discovering words embedded in a continuous stream of sounds and determining what role these words play in syntactic constructions. We suggest that knowledge of phoneme distributions may play a crucial part in helping children segment words and determine their lexical category, and we propose an integrated model of how children might go from unsegmented speech to lexical categories. We corroborated this theoretical model using a two-stage computational analysis of a large corpus of English child-directed speech. First, we used transition probabilities between phonemes to find words in unsegmented speech. Second, we used distributional information about word edges (the beginning and ending phonemes of words) to predict whether the segmented words from the first stage were nouns, verbs, or something else. The results indicate that discovering lexical units and their associated syntactic category in child-directed speech is possible by attending to the statistics of single phoneme transitions and word-initial and word-final phonemes. Thus, we suggest that a core computational principle in language acquisition is that the same source of information is used to learn about different aspects of linguistic structure.
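The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how boundaries are chosen or how categories are predicted, so the fixed threshold rule, the count-based edge scoring, the function names, and the toy corpus (single letters standing in for phonemes) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def transition_probs(utterances):
    """Estimate forward transitional probabilities P(b | a) over adjacent phonemes."""
    pair_counts = Counter()
    first_counts = Counter()
    for utt in utterances:
        for a, b in zip(utt, utt[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(utt, probs, threshold):
    """Stage 1 (assumed rule): place a boundary where P(b | a) drops below threshold."""
    if not utt:
        return []
    words, current = [], [utt[0]]
    for a, b in zip(utt, utt[1:]):
        if probs.get((a, b), 0.0) < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


def train_edge_model(labeled_words):
    """Count word-initial and word-final phonemes separately for each category."""
    counts = defaultdict(Counter)
    for word, category in labeled_words:
        counts[category][("first", word[0])] += 1
        counts[category][("last", word[-1])] += 1
    return counts


def predict_category(word, counts):
    """Stage 2 (assumed rule): choose the category whose words most often share these edges."""
    def score(category):
        c = counts[category]
        return c[("first", word[0])] + c[("last", word[-1])]
    return max(counts, key=score)


# Hypothetical toy corpus: three made-up words in varying orders, no spaces.
corpus = ["bidgop", "goptak", "takbid", "bidtak", "takgop", "gopbid"]
probs = transition_probs(corpus)
words = segment("bidgoptak", probs, threshold=0.75)

# Hypothetical category labels for a few made-up words.
edge_model = train_edge_model(
    [("bid", "noun"), ("bad", "noun"), ("gop", "verb"), ("gip", "verb"), ("tak", "other")]
)
guess = predict_category("bod", edge_model)
```

In this toy corpus every within-word transition has probability 1.0 and every cross-word transition has probability 0.5, so any threshold between those values recovers the three words. Real child-directed speech would not separate so cleanly, and the choice of boundary criterion matters far more there.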
Comment in
- A core principle of studying language acquisition: it's a developmental system. Dev Sci. 2009 Apr;12(3):407-9. doi: 10.1111/j.1467-7687.2009.00826.x. PMID: 19371363.
- The learner as statistician: three principles of computational success in language acquisition. Dev Sci. 2009 Apr;12(3):409-11. doi: 10.1111/j.1467-7687.2009.00827.x. PMID: 19371364.