Review
2022 May 13;25(6):104393.
doi: 10.1016/j.isci.2022.104393. eCollection 2022 Jun 17.

Toward understanding the communication in sperm whales

Jacob Andreas et al. iScience. 2022.

Abstract

Machine learning has advanced dramatically over the past decade. Most of these strides have come in human-centered applications, owing to the availability of large-scale datasets; however, opportunities are ripe to apply this technology to understand non-human communication more deeply. We detail a scientific roadmap for advancing the understanding of whale communication that can be built upon as a template for deciphering other forms of animal and non-human communication. Sperm whales, with their highly developed neuroanatomical features, cognitive abilities, social structures, and discrete click-based encoding, make an excellent model for developing advanced tools that can be applied to other animals in the future. We outline the key elements required for collecting and processing massive datasets, detecting basic communication units and language-like higher-level structures, and validating models through interactive playback experiments. The technological capabilities developed by such an undertaking hold potential for cross-application in broader communities investigating non-human communication and behavioral research.

Keywords: Artificial intelligence; Ethology; Linguistics; Natural language processing.

Conflict of interest statement

The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
An approach to sperm whale communication that integrates expertise in biology, robotics, machine learning, and linguistics and comprises the following key steps. Record: collect a large-scale, longitudinal, multimodal dataset of whale communication and behavioral data from a variety of sensors. Process: reconcile and process the multi-sensor data. Decode: using machine learning techniques, create a model of whale communication, characterize its structure, and link it to behavior. Encode & Playback: conduct interactive playback experiments and refine the whale language model. Illustration © 2021 Alex Boersma.
Figure 2
Sperm whale bioacoustic system. (A) The sperm whale head contains the spermaceti organ (c), a cavity filled with almost 2,000 L of wax-like liquid, and the junk compartment (f), comprising a series of wafer-like bodies believed to act as acoustic lenses. The spermaceti organ and junk act as two connected tubes, forming a bent, conical horn about 10 m long with a 0.8 m aperture in large mature males. The sound emitted by the phonic lips (i) at the front of the head is focused as it travels through the bent horn, producing a flat wavefront at the exit surface. (B) Typical temporal structure of sperm whale echolocation and coda clicks. Echolocation clicks are produced with consistent inter-click intervals (ICIs) of approximately 0.4 s, while coda clicks are arranged in stereotypical sequences called “codas” lasting less than 2 s. Codas are characterized by the number of constituent clicks and the ICIs between them. Codas are typically produced in multi-party exchanges that can last from about 10 s to over half an hour. Each click, in turn, consists of a sequence of equally spaced pulses, with an inter-pulse interval (IPI) on the order of 3–4 ms in an adult female, which results from the sound reflecting within the spermaceti organ. Illustration © 2021 Alex Boersma.
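The coda description in panel (B) can be illustrated with a minimal sketch. It assumes that click onset times (in seconds) have already been detected; the function names and the example timestamps are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def inter_click_intervals(click_times: Sequence[float]) -> List[float]:
    """Return the gaps (in seconds) between consecutive click onsets."""
    ordered = sorted(click_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def coda_duration(click_times: Sequence[float]) -> float:
    """Return the time from the first to the last click of one coda."""
    if not click_times:
        return 0.0
    return max(click_times) - min(click_times)


# Hypothetical five-click coda (onset times in seconds), invented for illustration.
example_coda = [0.0, 0.25, 0.5, 0.75, 1.5]

icis = inter_click_intervals(example_coda)   # four ICIs for five clicks
duration = coda_duration(example_coda)       # stays under the 2 s noted in the caption
```

Under this reading, a coda with n clicks is summarized by its click count and its n - 1 ICIs, which are the two features the caption names.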
Figure 3
Comparative size of datasets used for training NLP models (represented by the circle area). GPT-3 is only partially visible, while the DSWP dataset is a tiny dot on this plot (located at the center of the dashed circle). Shown in red is the estimated size of a new dataset that Project CETI, an interdisciplinary initiative for cetacean communication interpretation, plans to collect in Dominica. The estimate assumes nearly continuous monitoring of 50–400 whales, with echolocation clicks constituting 75%–80% of their vocalizations and coda clicks 20%–25%. A typical Caribbean whale coda has five clicks and lasts 4 s (including the silence between two subsequent codas), yielding a rate of 1.25 clicks/s. Overall, we estimate it would be possible to collect between 400M and 4B clicks per year as a longitudinal and continuous recording of bioacoustic signals, together with detailed behavioral and environmental data.
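The arithmetic behind this estimate can be sketched as follows. The rate of 1.25 clicks/s comes directly from the caption (five clicks per 4 s). Treating the 20%–25% coda share as a scaling factor on a year of continuous recording is an assumption of this sketch, not something the caption states; it is used here only because it lands close to the quoted range. All function and variable names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s, ignoring leap years


def coda_click_rate(clicks_per_coda: int, coda_period_s: float) -> float:
    """Clicks per second when codas repeat back to back."""
    return clicks_per_coda / coda_period_s


def clicks_per_year(n_whales: int, rate: float, coda_fraction: float) -> float:
    """Yearly click count for n_whales under continuous monitoring.

    coda_fraction scales the continuous coda rate; applying the caption's
    20%-25% figure this way is an assumption of this sketch.
    """
    return n_whales * rate * SECONDS_PER_YEAR * coda_fraction


rate = coda_click_rate(5, 4.0)            # 1.25 clicks/s, as in the caption
low = clicks_per_year(50, rate, 0.20)     # lower bound: 50 whales, 20%
high = clicks_per_year(400, rate, 0.25)   # upper bound: 400 whales, 25%
```

Under these assumptions the bounds come out at roughly 0.39 billion and 3.9 billion clicks per year, which is consistent with the 400M to 4B range given in the caption.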
Figure 4
Schematic of whale bioacoustic data collection from multiple data sources using several classes of assets. These include tethered buoy arrays (b), which track the whales over a large area in real time by continuously transmitting their data to shore (g), floaters (e), and robotic fish (d). Tags (c) attached to whales can potentially provide the most detailed bioacoustic and behavioral data. Aerial drones (a) can be used to assist tag deployment (a1) and recovery (a2) and to provide visual observation of the whales (a3). The collected multimodal data (1) have to be processed to reconstruct a social network of sperm whales. The raw acoustic data (2) have to be analyzed by ML algorithms to detect (3) and classify (4) clicks. Source separation and identification (5) algorithms would allow multi-party conversations to be reconstructed by attributing clicks to the whales that produced them. Illustration © 2021 Alex Boersma.
