Dev Sci. 2009 Apr;12(3):388-95.
doi: 10.1111/j.1467-7687.2009.00824.x.

The secret is in the sound: from unsegmented speech to lexical categories

Morten H Christiansen et al. Dev Sci. 2009 Apr.

Abstract

When learning language, young children are faced with many seemingly formidable challenges, including discovering words embedded in a continuous stream of sounds and determining what role these words play in syntactic constructions. We suggest that knowledge of phoneme distributions may play a crucial part in helping children segment words and determine their lexical category, and we propose an integrated model of how children might go from unsegmented speech to lexical categories. We corroborated this theoretical model using a two-stage computational analysis of a large corpus of English child-directed speech. First, we used transition probabilities between phonemes to find words in unsegmented speech. Second, we used distributional information about word edges (the beginning and ending phonemes of words) to predict whether the segmented words from the first stage were nouns, verbs, or something else. The results indicate that discovering lexical units and their associated syntactic category in child-directed speech is possible by attending to the statistics of single phoneme transitions and of word-initial and word-final phonemes. Thus, we suggest that a core computational principle in language acquisition is that the same source of information is used to learn about different aspects of linguistic structure.
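The first stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes forward transition probabilities P(next | current) estimated from unsegmented utterances, a fixed cut-off (`threshold`, a hypothetical parameter), and letters standing in for phonemes in a toy corpus.

```python
from collections import Counter

def transition_probabilities(utterances):
    """Estimate P(next | current) for adjacent symbols in unsegmented utterances."""
    pair_counts = Counter()
    first_counts = Counter()
    for utt in utterances:
        for a, b in zip(utt, utt[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(utterance, probs, threshold):
    """Posit a word boundary wherever the transition probability is below threshold.

    The fixed threshold is an illustrative assumption, not a value from the paper.
    """
    if not utterance:
        return []
    words, current = [], utterance[0]
    for a, b in zip(utterance, utterance[1:]):
        if probs.get((a, b), 0.0) < threshold:
            words.append(current)   # low-probability transition: close the word
            current = b
        else:
            current += b            # high-probability transition: stay inside the word
    words.append(current)
    return words

# Toy data: letters stand in for phonemes; word boundaries are not marked.
corpus = ["thedog", "thecat", "adog", "acat", "thedogsat", "acatsat"]
probs = transition_probabilities(corpus)
print(segment("thedog", probs, threshold=0.6))
```

On real data the symbols would be phonemes from transcribed child-directed speech, and the cut-off would need to be chosen empirically rather than fixed in advance.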

Figures

Figure 1
The distribution of phoneme transition pairs as a function of the probability of encountering a word boundary between the two phonemes in the corpus of child-directed speech. A probability of 1 indicates that the two phonemes never occur together as a pair inside a word but always straddle a word boundary, whereas a probability of 0 indicates that the phoneme pair always occurs inside a word and is never separated by a word boundary.
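The quantity on the horizontal axis of Figure 1 can be sketched as follows, assuming it is computed per phoneme pair as the share of that pair's occurrences that straddle a word boundary. The function name and the toy corpus (letters standing in for phonemes) are hypothetical.

```python
from collections import Counter

def boundary_probabilities(utterances):
    """Map each adjacent phoneme pair to the fraction of its occurrences
    that straddle a word boundary (0 = always word-internal, 1 = always across)."""
    total = Counter()
    across = Counter()
    for words in utterances:
        # Flatten the utterance, remembering which phonemes end a word.
        stream = []
        for word in words:
            for i, phoneme in enumerate(word):
                stream.append((phoneme, i == len(word) - 1))
        for (a, a_is_final), (b, _) in zip(stream, stream[1:]):
            total[(a, b)] += 1
            if a_is_final:
                across[(a, b)] += 1
    return {pair: across[pair] / n for pair, n in total.items()}

# Toy segmented corpus: each utterance is a list of words.
corpus = [["the", "dog"], ["the", "cat"], ["a", "dog"], ["red", "dog"]]
boundary_probs = boundary_probabilities(corpus)
```

In this toy corpus the pair ("t", "h") only ever occurs inside a word, ("a", "d") only across a boundary, and ("e", "d") occurs once in each position.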
Figure 2
The completeness (left) and accuracy (right) of classification into lexical categories of the top 5000 lexical candidates from the segmentation procedure, using the first and last phoneme in each lexical candidate (white bars), compared with baseline classifications (grey bars). Error bars indicate standard error of the mean.
