Curr Biol. 2011 Jul 26;21(14):1210-4.
doi: 10.1016/j.cub.2011.06.007. Epub 2011 Jun 30.

A chimpanzee recognizes synthetic speech with significantly reduced acoustic cues to phonetic content

Lisa A Heimbauer et al. Curr Biol. 2011.

Abstract

A long-standing debate concerns whether humans are specialized for speech perception, which some researchers argue is demonstrated by the ability to understand synthetic speech with significantly reduced acoustic cues to phonetic content. We tested a chimpanzee (Pan troglodytes) that recognizes 128 spoken words, asking whether she could understand such speech. Three experiments presented 48 individual words, with the animal selecting a corresponding visuographic symbol from among four alternatives. Experiment 1 tested spectrally reduced, noise-vocoded (NV) synthesis, originally developed to simulate input received by human cochlear-implant users. Experiment 2 tested "impossibly unspeechlike" sine-wave (SW) synthesis, which reduces speech to just three moving tones. Although receiving only intermittent and noncontingent reward, the chimpanzee performed well above chance level, including when hearing synthetic versions for the first time. Recognition of SW words was least accurate but improved in experiment 3 when natural words in the same session were rewarded. The chimpanzee was more accurate with NV than SW versions, as were 32 human participants hearing these items. The chimpanzee's ability to spontaneously recognize acoustically reduced synthetic words suggests that experience rather than specialization is critical for speech-perception capabilities that some have suggested are uniquely human.

Figures

Figure 1. Two Stimulus and Lexigram Examples
Waveforms and narrowband spectrograms of the words “apricot” and “sparkler” in natural, noise-vocoded (NV), and sine-wave (SW) forms, along with corresponding lexigrams. Waveforms show pressure variation over time, with Fourier-transform-based spectrograms revealing corresponding spectral features in the frequency domain (created using a sampling rate of 22.05 kHz and 0.03-sec Gaussian analysis window). Both synthetic forms are comprehensible to human listeners, but are acoustically significantly reduced relative to natural versions. NV speech retains primarily temporal cues and only rudimentary spectral information—acoustic features such as harmonic structure, formants (vocal-tract resonances), and formant transitions are removed or fundamentally altered [10]. SW speech has just three pure tones that track the lowest formants of natural speech [11], becoming so different that some characterize this form as bearing only an abstract resemblance to the original [3,22].
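
The description of SW speech as a small set of pure tones that track the lowest formants can be illustrated with a minimal sketch. It is not the synthesis procedure used in the study: the formant tracks, gains, and frame duration below are hypothetical values chosen only for illustration, and the function name `sine_wave_synthesis` is an assumption. Only the 22.05 kHz sampling rate is taken from the caption.

```python
import math

def sine_wave_synthesis(formant_tracks, amplitudes, sample_rate=22050, frame_dur=0.02):
    """Sum one sinusoid per formant track (a toy sine-wave replica).

    formant_tracks: list of tracks; each track is a list of frequencies in Hz,
                    one value per analysis frame (all tracks the same length).
    amplitudes:     one linear gain per track.
    """
    n_frames = len(formant_tracks[0])
    samples_per_frame = int(round(sample_rate * frame_dur))
    phases = [0.0] * len(formant_tracks)  # running phase keeps each tone continuous
    out = []
    for frame in range(n_frames):
        for _ in range(samples_per_frame):
            value = 0.0
            for i, track in enumerate(formant_tracks):
                # Advance the phase by the instantaneous frequency of this frame.
                phases[i] += 2.0 * math.pi * track[frame] / sample_rate
                value += amplitudes[i] * math.sin(phases[i])
            out.append(value)
    return out

# Hypothetical formant tracks (Hz), NOT measured from the study's stimuli.
f1 = [300.0 + 40.0 * t for t in range(10)]   # rising first formant
f2 = [2200.0 - 80.0 * t for t in range(10)]  # falling second formant
f3 = [3000.0] * 10                           # flat third formant
signal = sine_wave_synthesis([f1, f2, f3], [0.5, 0.3, 0.2])
```

Accumulating phase per sample, rather than recomputing `sin(2*pi*f*t)` from scratch, avoids discontinuities when the tracked frequency changes between frames.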
Figure 2. The Chimpanzee Subject
The chimpanzee subject “Panzee,” shown at about 33 months of age (left), and as an adult performing the experimental task (right). She routinely uses a lexigram board for two-way communication with human caregivers, as well as being able to select corresponding lexigrams when hearing 128 different, spoken words.
Figure 3. Performance by the Chimpanzee and Human Listeners
Means and standard errors of percentage-correct performance for 48 words heard in natural, NV, and SW forms. Experiments with the chimpanzee Panzee included testing each of 48 words 16 times in natural and 4 times in synthetic form. “First Trials” represent the 48 first instances of the chimpanzee hearing a word in a given synthetic form. The first set of SW results shows performance with non-contingent, intermittent reward delivery and no response feedback. The second set shows performance with contingent reward received on natural trials, but with no reward or response feedback on SW trials. The dashed line indicates the chance-performance rate of 25% correct. Humans heard and identified all 48 words once each in natural form, followed by either NV (16 listeners) or SW (16 listeners) versions. All comparisons to chance performance were statistically significant at P ≤ .008 and are marked by a pair of asterisks.
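
One way to compare accuracy against the 25% chance rate of a four-alternative task is an exact one-sided binomial test. The sketch below shows that calculation. It is an assumption that a binomial test is a suitable stand-in here, since the record does not state which test the authors used, and the count of correct trials is a hypothetical placeholder rather than a reported result.

```python
import math

def binomial_upper_tail(k, n, p):
    """Probability of k or more successes in n independent trials with success rate p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

CHANCE = 0.25    # four response alternatives, so 1/4 by guessing
n_trials = 48    # e.g. one first trial per word
n_correct = 24   # HYPOTHETICAL count, not a value reported in the study

p_value = binomial_upper_tail(n_correct, n_trials, CHANCE)
print(f"P(X >= {n_correct} | n={n_trials}, p={CHANCE}) = {p_value:.2e}")
```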

References

    1. Liberman AM, Cooper FS, Shankweiler DP, Studdert-Kennedy M. Perception of the speech code. Psychol. Rev. 1967;74:431–461.
    2. Liberman AM, Mattingly IG. The motor theory of speech perception revised. Cognition. 1985;21:1–36.
    3. Remez RE, Rubin PE, Berns SM, Pardo JS, Lang JM. On the perceptual organization of speech. Psychol. Rev. 1994;101:129–156.
    4. Trout JD. The biological basis of speech: What to infer from talking to the animals. Psychol. Rev. 2001;108:523–549.
    5. Diehl RL, Lotto AJ, Holt LL. Speech perception. Annu. Rev. Psychol. 2004;55:149–179.
