Review
Lang Speech. 1992 Oct-Dec;35(Pt 4):351-89.
doi: 10.1177/002383099203500401.

Comprehension of synthetic speech produced by rule: a review and theoretical interpretation

S A Duffy et al. Lang Speech. 1992 Oct-Dec.

Abstract

In this paper, we review research on the perception and comprehension of synthetic speech produced by rule. We discuss the difficulties that synthetic speech causes for the listener and the evidence that the immediate result of those difficulties is a delay in the point at which words are recognized. We then argue that this delay in processing affects not only lexical access but also comprehension processes. We consider the mechanisms by which the comprehension system adjusts to this delay, the resulting costs to higher level comprehension processes, and the changes that occur in the language processing system as its familiarity with synthetic speech increases. Based on the framework we have developed, we suggest several directions for future research on the comprehension of synthetic speech.

Figures

Fig. 1
Error rates (in percent) for various synthesis systems tested in both the closed- and open-response format MRT (from Logan, Greene, and Pisoni, 1989).
Fig. 2
Response times (in msec) and percent correct for natural and synthetic words and non-words in a lexical decision task (from Pisoni, 1981).
Fig. 3
Mean number of natural and synthetic words recalled as a function of memory preload (from Luce, Feustel, and Pisoni, 1983).
Fig. 4
Number of subjects correctly recalling all of the digits as a function of memory preload (from Luce et al., 1983).
Fig. 5
Probability of recall at each serial position for natural and synthetic word lists (from Luce et al., 1983).
Fig. 6
Percent correct for different categories of information presented in natural and synthetic speech (from Luce, 1981).
Fig. 7
Probability of a correct response for two kinds of information presented in natural and synthetic speech (from Ralston, Pisoni, Lively, Greene, and Mullennix, 1991).
Fig. 8
Sentence verification response times (in msec) for True and False responses to three- and six-word sentences presented in seven voices (from Manous, Pisoni, Dedina, and Nusbaum, 1985). Means are based on only those trials on which the subject verified and transcribed the sentence correctly.
Fig. 9
Mean sentence verification times (in msec) for True and False responses to three- and six-word sentences presented in natural and synthetic speech (from Pisoni, Manous, and Dedina, 1987). High-predictability sentences are displayed with open bars; low-predictability sentences are displayed with striped bars. Means are based on only those trials on which the subject verified and transcribed the sentence correctly.
Fig. 10
Sentence-by-sentence listening times as a function of voice and text (from Ralston et al., 1991). Open bars represent natural speech; striped bars represent synthetic speech. Error bars represent one standard error of the sample mean.
