The secret is in the sound: from unsegmented speech to lexical categories

Morten H Christiansen¹, Luca Onnis, Stephen A Hockema

PMID: 19371361
PMCID: PMC2743257
DOI: 10.1111/j.1467-7687.2009.00824.x

Dev Sci. 2009 Apr;12(3):388-95.

Affiliation

¹ Department of Psychology, Cornell University, Ithaca, NY 14853, USA. christiansen@cornell.edu

Abstract

When learning language, young children are faced with many seemingly formidable challenges, including discovering words embedded in a continuous stream of sounds and determining what role these words play in syntactic constructions. We suggest that knowledge of phoneme distributions may play a crucial part in helping children segment words and determine their lexical category, and we propose an integrated model of how children might go from unsegmented speech to lexical categories. We corroborated this theoretical model using a two-stage computational analysis of a large corpus of English child-directed speech. First, we used transition probabilities between phonemes to find words in unsegmented speech. Second, we used distributional information about word edges--the beginning and ending phonemes of words--to predict whether the segmented words from the first stage were nouns, verbs, or something else. The results indicate that discovering lexical units and their associated syntactic category in child-directed speech is possible by attending to the statistics of single phoneme transitions and word-initial and final phonemes. Thus, we suggest that a core computational principle in language acquisition is that the same source of information is used to learn about different aspects of linguistic structure.
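The first stage described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a small, already-segmented training sample in which each character stands in for one phoneme, estimates how often each phoneme pair straddles a word boundary, and then cuts an unsegmented stream wherever that estimate exceeds a threshold. The function names, the 0.5 threshold, and the treatment of unseen pairs as word-internal are all assumptions made for illustration.

```python
from collections import Counter


def boundary_probabilities(utterances):
    """Estimate P(word boundary | phoneme pair) from segmented utterances.

    `utterances` is a list of utterances; each utterance is a list of
    non-empty words; each character of a word stands in for one phoneme
    (a toy assumption, not a real phonemic transcription).
    """
    total = Counter()       # how often each adjacent pair occurs
    straddling = Counter()  # how often that pair spans a word boundary
    for words in utterances:
        phonemes = []
        word_starts = set()  # positions where a non-initial word begins
        for word in words:
            if phonemes:
                word_starts.add(len(phonemes))
            phonemes.extend(word)
        for i in range(1, len(phonemes)):
            pair = (phonemes[i - 1], phonemes[i])
            total[pair] += 1
            if i in word_starts:
                straddling[pair] += 1
    return {pair: straddling[pair] / n for pair, n in total.items()}


def segment(stream, probs, threshold=0.5):
    """Split `stream` wherever a pair's boundary probability exceeds `threshold`.

    Pairs never seen in training are treated as word-internal (assumption).
    """
    if not stream:
        return []
    words, current = [], stream[0]
    for prev, nxt in zip(stream, stream[1:]):
        if probs.get((prev, nxt), 0.0) > threshold:
            words.append(current)
            current = nxt
        else:
            current += nxt
    words.append(current)
    return words


# Toy training sample: letters stand in for phonemes.
training = [["the", "dog"], ["the", "cat"], ["a", "dog"]]
probs = boundary_probabilities(training)
candidates = segment("thecat", probs)
```

In this toy sample the pair `("e", "c")` only ever occurs across a word boundary, so the stream `"thecat"` is cut between those two symbols. A real corpus would produce graded probabilities rather than only 0 and 1, and the choice of threshold would then matter.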

Figures

**Figure 1**
The distribution of phoneme transition pairs given the probability of encountering a word boundary between the two phonemes in the corpus of child-directed speech. A probability of 1 indicates that the two phonemes never occur together as a pair inside a word but always straddle a word boundary, whereas a probability of 0 implies that the phoneme pair always occurs inside a word and never are separated by a word boundary.

**Figure 2**
The completeness (left) and accuracy (right) of classification into lexical categories of the top-5000 lexical candidates from the segmentation procedure using the first and last phoneme in each lexical candidate (white bars) compared with baseline classifications (grey bars - error bars indicate standard error of the mean).
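The word-edge cue used for the classification in Figure 2 can be illustrated with a minimal sketch. It assumes that each character stands in for one phoneme and that a small set of labelled words is available. A simple majority-vote lookup over (first phoneme, last phoneme) pairs stands in for whatever classifier the study actually used, and the function names, category labels, and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def edge_key(word):
    """Return the (first, last) symbols of a non-empty word."""
    return (word[0], word[-1])


def train_edge_classifier(labelled_words):
    """Map each (first, last) edge pair to its most frequent category."""
    votes = defaultdict(Counter)
    for word, category in labelled_words:
        votes[edge_key(word)][category] += 1
    return {edges: counts.most_common(1)[0][0] for edges, counts in votes.items()}


def classify(word, model, default="other"):
    """Predict a category from word edges; unseen edge pairs get `default`."""
    return model.get(edge_key(word), default)


# Toy labelled sample: letters stand in for phonemes.
labelled = [
    ("cat", "noun"), ("cot", "noun"), ("cut", "verb"),
    ("run", "verb"), ("ran", "verb"),
    ("the", "other"),
]
model = train_edge_classifier(labelled)
predictions = [classify(w, model) for w in ["cart", "rain", "zip"]]
```

Here `"cart"` shares its edges with two nouns and one verb, so it is labelled a noun, while `"zip"` has an unseen edge pair and falls back to the default category. The sketch only shows that the first and last symbols of a candidate word can serve as classification features.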

Comment in

A core principle of studying language acquisition: it's a developmental system.
Samuelson LK. Dev Sci. 2009 Apr;12(3):407-9. doi: 10.1111/j.1467-7687.2009.00826.x. PMID: 19371363.
The learner as statistician: three principles of computational success in language acquisition.
Soderstrom M, Conwell E, Feldman N, Morgan J. Dev Sci. 2009 Apr;12(3):409-11. doi: 10.1111/j.1467-7687.2009.00827.x. PMID: 19371364.
