J Mem Lang. 2009 Aug 2;61(2):238-257.
doi: 10.1016/j.jml.2009.05.001.

Simulating Language-specific and Language-general Effects in a Statistical Learning Model of Chinese Reading

Jianfeng Yang et al. J Mem Lang.

Abstract

Many theoretical models of reading assume that different writing systems require different processing assumptions. For example, it is often claimed that print-to-sound mappings in Chinese are not represented or processed sub-lexically. We present a connectionist model that learns the print-to-sound mappings of Chinese characters using the same functional architecture and learning rules that have been applied to English. The model predicts an interaction between item frequency and print-to-sound consistency analogous to what has been found for English, as well as a language-specific regularity effect particular to Chinese. Behavioral naming experiments using the same test items as the model confirmed these predictions. Corpus properties and analyses of the internal representations that evolved over training revealed that the model was able to capitalize on information in "phonetic components": sub-lexical structures of variable size that convey probabilistic information about pronunciation. The results suggest that adult reading performance across very different writing systems may be explained as the result of applying the same learning mechanisms to the particular input statistics of writing systems shaped by both culture and the exigencies of communicating spoken language in a visual medium.

Figures

Figure 1
A. Two orthographic “neighborhoods” in Chinese. Examples on the left are from a perfectly consistent neighborhood: Every word containing the phonetic component is pronounced the same way. On the right, an inconsistent neighborhood is shown. The top two items are “regular,” in that they share the pronunciation of the phonetic component when it appears as a simple character. The bottom three items have different pronunciations from the component as a simple character and are therefore irregular. B. The two radicals in the phonetic component of Panel A (left) can also serve as phonetic components that form their own families.
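The definitions of consistency and regularity in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_item`, the pronunciation strings, and the choice to compare whole syllables including tone are all hypothetical and are not taken from the paper. The "I-C" label that the function can produce is not one of the item types reported in the figures.

```python
from typing import List


def classify_item(char_pron: str, component_pron: str, family_prons: List[str]) -> str:
    """Label one character as R-C, R-I, I-I (or I-C).

    char_pron      -- pronunciation of the character being classified
    component_pron -- pronunciation of its phonetic component as a simple character
    family_prons   -- pronunciations of every character sharing that component
                      (including the character itself)
    """
    # Consistent neighborhood: every family member is pronounced the same way.
    consistent = len(set(family_prons)) == 1
    # Regular item: pronounced like the phonetic component on its own.
    regular = char_pron == component_pron
    return ("R" if regular else "I") + "-" + ("C" if consistent else "I")


# Hypothetical pronunciations (made-up syllables with tone digits), for illustration only.
consistent_family = ["pa1", "pa1", "pa1"]
inconsistent_family = ["ta1", "ta1", "ku4", "ku4", "mo2"]

labels = [
    classify_item("pa1", "pa1", consistent_family),    # regular, consistent
    classify_item("ta1", "ta1", inconsistent_family),  # regular, inconsistent
    classify_item("ku4", "ta1", inconsistent_family),  # irregular, inconsistent
]
```

Under these assumptions, the top two items of the inconsistent neighborhood would be labeled like the second example and the bottom three like the third.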
Figure 2
Architecture and representational scheme. A. The architecture of the model; arrows indicate connections between every unit in each layer. B. An example representation of a Chinese syllable on the phonological output layer. C. The representation of a Chinese character, including 21 units for overall structure and 249 units to represent 7 radical slots.
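As a rough illustration of the input layout in Panel C, the sketch below builds a 270-unit orthographic input (21 structure units plus 249 radical-slot units) and passes it through fully connected layers. Only the 21 and 249 unit counts come from the caption. The hidden and output layer sizes, the sigmoid activation, the random weights, and the plain feed-forward wiring are assumptions made for the example and should not be read as the published architecture.

```python
import numpy as np

N_STRUCTURE = 21     # units for overall character structure (from the caption)
N_RADICAL = 249      # units covering the 7 radical slots (from the caption)
N_INPUT = N_STRUCTURE + N_RADICAL
N_HIDDEN = 100       # assumption: not stated in the caption
N_OUTPUT = 50        # assumption: not stated in the caption


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(rng):
    """Small random weights for two fully connected layers."""
    return {
        "w_ih": rng.normal(0.0, 0.1, size=(N_INPUT, N_HIDDEN)),
        "b_h": np.zeros(N_HIDDEN),
        "w_ho": rng.normal(0.0, 0.1, size=(N_HIDDEN, N_OUTPUT)),
        "b_o": np.zeros(N_OUTPUT),
    }


def encode_character(structure_units, radical_units):
    """Concatenate the structure block and the radical block into one input vector."""
    structure_units = np.asarray(structure_units, dtype=float)
    radical_units = np.asarray(radical_units, dtype=float)
    assert structure_units.shape == (N_STRUCTURE,)
    assert radical_units.shape == (N_RADICAL,)
    return np.concatenate([structure_units, radical_units])


def forward(params, x):
    """Return (hidden activations, phonological output activations)."""
    hidden = sigmoid(x @ params["w_ih"] + params["b_h"])
    output = sigmoid(hidden @ params["w_ho"] + params["b_o"])
    return hidden, output


rng = np.random.default_rng(0)
params = init_params(rng)
# Hypothetical character: one active structure unit and three active radical units.
structure = np.zeros(N_STRUCTURE)
structure[2] = 1.0
radicals = np.zeros(N_RADICAL)
radicals[[5, 40, 120]] = 1.0
x = encode_character(structure, radicals)
hidden, output = forward(params, x)
```

The hidden vector returned here is the kind of internal representation that a similarity analysis would compare across characters.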
Figure 3
Predictions from the model for all three item types, Regular-consistent (R-C), Regular-inconsistent (R-I) and Irregular-inconsistent (I-I), at two levels of frequency (high and low) are shown in the graph on the left. On the right are response latency data from Chinese adult readers naming the same items. The model correctly predicts that both consistency and regularity interact with frequency, and that there is a regularity effect for inconsistent items.
Figure 4
Predictions from the model for the interaction between consistency (consistent and inconsistent) and frequency (middle and low) are shown in the graph on the left. On the right are response latency data from Chinese adult readers naming the same characters. A strong consistency effect was found for low-frequency items and a weak consistency effect for high-frequency items, in both the participants and the model.
Figure 5
Panel A is a histogram showing how many phonetic components are composed of different numbers of radicals; most are composed of more than two. Panels B, C, and D are histograms showing the number of characters (B), syllables (C), and syllables not counting tone (D) in which radicals and phonetic components occur. Phonetic components have a much more tightly constrained distribution than radicals in all cases.
Figure 6
Analyses of the internal representations that develop as the model learns to read. Panel A depicts the change in mean Euclidean distance for comparisons among item types over training. Between P. Family = comparisons between all items that share the critical phonetic component (formula image) and control phonograms; Within P. Family = comparisons among all items that share the critical phonetic component; Orth. Control = comparisons between all items with the critical phonetic component and control items selected to share the same amount of orthographic information; Phon. Control = comparisons between all items with the critical phonetic component and their homophones that do not overlap orthographically. Panel B depicts the similarity space based on orthographic inputs for the test items. Grey patches indicate clusters of items that share orthographic structure; black circles indicate items that share a phonetic component. In Panel C, the similarity space based on hidden unit activations before training is shown. Grey patches and black circles as in Panel B. Panel D shows the similarity space based on hidden unit activations after 3 million trials of training on spelling-to-sound translation. The grey patch indicates items that share the same phonetic component and that cluster together only after training on spelling-to-sound correspondences. Black circles encompass items that share the same pronunciation in addition to sharing a phonetic component.
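The mean Euclidean distance measure plotted in Panel A can be sketched as follows. This is a minimal illustration, assuming each item is represented by one vector of hidden unit activations. The function names `mean_within_distance` and `mean_between_distance` and the toy vectors are hypothetical; the paper's own analysis code is not shown in the source.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distance between every row of a and every row of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diffs = a[:, None, :] - b[None, :, :]
    return np.linalg.norm(diffs, axis=-1)


def mean_within_distance(items):
    """Mean distance over all distinct pairs inside one set (e.g. one phonetic family)."""
    d = pairwise_distances(items, items)
    upper = np.triu_indices(len(d), k=1)  # each unordered pair once, no self-pairs
    return float(d[upper].mean())


def mean_between_distance(items, controls):
    """Mean distance over all cross pairs (e.g. family items vs. control items)."""
    return float(pairwise_distances(items, controls).mean())


# Toy 2-dimensional "hidden activations", for illustration only.
family = np.array([[0.0, 0.0], [3.0, 4.0]])
controls = np.array([[0.0, 0.0]])

within = mean_within_distance(family)
between = mean_between_distance(family, controls)
```

Computing these two quantities on hidden activations saved at successive points in training would give curves of the kind the caption describes, with the within-family distance expected to shrink relative to the control comparisons.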
Figure 7
Hidden unit activations for seven of the characters from the similarity analyses. Items from the same phonetic family as the sample item (formula image) have overlapping representations, whereas control items, matched for the degree of orthographic similarity, do not.