Front Psychol. 2019 Apr 2;10:729.
doi: 10.3389/fpsyg.2019.00729. eCollection 2019.

Language Origins Viewed in Spontaneous and Interactive Vocal Rates of Human and Bonobo Infants

D Kimbrough Oller et al. Front Psychol. 2019.

Abstract

From the first months of life, human infants produce "protophones," speech-like, non-cry sounds, presumed absent or only minimally present in other apes. But there have been no direct quantitative comparisons to support this presumption. In addition, by 2 months, human infants show sustained face-to-face interaction using protophones, a pattern also thought absent or very limited in other apes, but again, without quantitative comparison. Such comparison should provide evidence relevant to determining foundations of language, since substantially flexible vocalization, the inclination to explore vocalization, and the ability to interact socially by means of vocalization are foundations for language. Here we quantitatively compare data on vocalization rates in three captive bonobo (Pan paniscus) mother-infant pairs with various sources of data from our laboratories on human infant vocalization. Both humans and bonobos produced distress sounds (cries/screams) and laughter. The bonobo infants also produced sounds that were neither screams nor laughs and that showed acoustic similarities to the human protophones. These protophone-like sounds confirm that bonobo infants share with humans the capacity to produce vocalizations that appear foundational for language. Still, there were dramatic differences between the species in both quantity and function of the protophone and protophone-like sounds. The bonobo protophone-like sounds were far less frequent than the human protophones, and the human protophones were far less likely to be interpreted as complaints and more likely as vocal play. Moreover, we found extensive vocal interaction between human infants and mothers, but no vocal interaction in the bonobo mother-infant pairs: while bonobo mothers were physically responsive to their infants, we observed no case of a bonobo mother vocalization directed to her infant.
Our cross-species comparison focuses on low- and moderate-arousal circumstances because we reason the roots of language entail vocalization not triggered by excitement, for example, during fighting or intense play. Language appears to be founded in flexible vocalization, used to regulate comfortable social interaction, to share variable affective states at various levels of arousal, and to explore vocalization itself.

Keywords: babbling; bonobo; comparative psychology; evolution of language; human evolution; infant directed speech; origin of language; parent–infant interaction.

Figures

FIGURE 1
Human protophones. The spectrograms (range 4–5 kHz, 30 Hz bandwidth) and waveforms illustrate human protophones, which come in extremely variable form, as illustrated. Even these examples vastly underplay the acoustic variability of protophones. (A–D) are categorized as vocants, the most prototypical human infant vocal type, with consistent harmonic spacing and little or no dysphonation (indicating modal voice, the overwhelmingly typical human phonatory pattern in speech). The sounds were produced by 1–3 month-old typically developing human infants. Vocants are often as short as 0.1 s but can be as long as 3 s. Their intonation is not always smooth, but may involve notable variations as in D, where the rise and fall of the harmonics across time signals intonational variation. (E–H) are growls, also from 1–3 month-olds. In growls, phonation is harsh (i.e., it is chaotic, and harmonics are absent or less prominent than in vocants) as in E and F or creaky (consisting of a pulse regime, including prominent spikes in the waveform) as in G and H. As with vocants, growls can be very short or very long. (I–K) are squeals, from 0–3 month-olds. Squeals always show very high pitch (f0) as seen in widely spaced harmonics during at least a significant portion of the utterance. As with vocants and growls, squeals can be very short or very long, and as with vocants, they can involve considerable intonational variation, as seen in all three presented examples. (L,M) are reduplicated canonical babbles from 11 month-old infants. This is a vocal type that has never been documented to occur in any non-human primate even with human training. Canonical babbling involves rhythmic modulation of the acoustic waveform by movements of the jaw, lips, and/or tongue during modal phonation. From a phonatory standpoint, canonical babbles are vocants, but their supraglottal articulations result in a special pattern of well-formed syllables, adaptable for speech.
FIGURE 2
Human and bonobo sounds of the speech-like grouping. (A) offers a spectrogram and waveform to illustrate an additional human (3 months) vocant, selected as a particularly prototypical human infant protophone, with consistent harmonic spacing and a smooth intonational pattern involving little or no dysphonation. (B–F) are similarly composed displays showing bonobo infant sounds deemed auditorily similar to (that is, pertaining to the acoustic range encompassed by) human protophones, all including laryngeal phonation and clear harmonic energies. These bonobo sounds appear to be acoustically similar enough to the most common human protophones (vocants, squeals, and growls) that we treat them as candidates for speech-like material.
FIGURE 3
Cries/screams and laughs in human and bonobo infants. In the first row, a spectrographic and waveform display of an infant bonobo scream in three bursts at high pitch (f0), each burst about 600 ms, is contrasted with a prototypical but acoustically quite distinct pattern of human infant cry. Prototypical human infant cry often occurs as a continuous phonatory event, including at least one period of distinct dysphonation, as seen in the spectrogram beginning at about 500 ms. These bonobo and human negative vocalizations are thus similar in typically showing notable dysphonation, but very different in the timing of its occurrence. They are also different in that human cry, while it can occur at high f0 (roughly pitch), is typically produced at much lower f0 than the bonobo screams we observed. Based on the functional similarity of infant bonobo scream and human infant cry, we treat them as analogous in spite of acoustic differences. The second row displays a multi-segment infant bonobo laugh, followed by a single bonobo infant laugh segment, compared with a human infant laugh segment. The laughs differ from scream/cry in that their bursts and nuclei tend to be much shorter in both species. Laughs differ across the species in that bonobo laugh often consisted of an ingressive-egressive pattern rather than a sequence of egressive bursts (as in the figure), while human infant laughs are overwhelmingly egressive, consisting of a glottal burst (as in the figure) followed by a brief voiced nucleus. Again, the functional similarity of the sounds called laughter in the two species (both occurring as playful, joyful expressions) leads us to treat them as analogous in spite of their acoustic differences.
FIGURE 4
Three vocal types of human and bonobo infants in seconds vocalized per minute. The figure displays vocal seconds/minute in human and bonobo infants in the first year by vocal type, human data derived from audio–video recordings of the Memphis1 study. Individual bonobo and human infant laugh rates overlapped. Distributions also overlapped for cry/scream. Protophones in humans, the sounds regarded as precursors to speech, occurred far more frequently than any other vocal type from either species, and individual human protophone rates did not overlap with rates for any other vocal type for either species; 95% confidence intervals are displayed.
FIGURE 5
Protophone rates for human infants compared with rates of candidate speech-like sounds produced by bonobo infants. The most speech-like infant bonobo vocalizations in seconds/minute from audio–video recordings of early and late in the first year of life occurred far less than protophones from human infants similarly recorded in the Athens study. The Athens data are broken down for human caregivers (1) present but silent (NAS, no adult speech), (2) present and speaking to infants (IDS, infant-directed speech), or (3) present but speaking to another adult (ADS, adult-directed speech). In all three circumstances, the human protophone rates were dramatically higher than those of the bonobo infants; 95% confidence intervals are displayed.
FIGURE 6
Laboratory and all-day at-home human infant vocalization rates along with infant-directed speech/vocalization (IDS/IDV) rates in humans and bonobos. The data provide comparisons from all the human studies (Memphis1, Athens, Memphis2) displayed in utterances/minute. As indicated on the right of the display, while human parents produced considerable IDS (>12 utterances/min in the laboratory and almost 2 utterances/min in randomly sampled segments from all-day recordings), bonobo mothers produced no IDV at all in the recordings. The human parents produced far more IDS in the laboratory than at home, and in the laboratory, they produced about twice as many IDS utterances/minute as human infants produced protophones, a pattern that appears to correspond to a parental “teaching” mode or perhaps a style adopted for the camera. At home, the patterns were very different, with parents producing far less IDS. In fact, when the human infants were awake in randomly sampled segments from all-day recordings, the laboratory pattern was reversed, and the infants produced more than twice as many protophones as their mothers produced IDS utterances. Further, the rate of infant vocalizations in randomly selected samples at home was about the same as the rate occurring during adult-directed speech (ADS) in the laboratory.
