PeerJ. 2024 May 14:12:e17320.
doi: 10.7717/peerj.17320. eCollection 2024.

Vocal complexity in the long calls of Bornean orangutans

Wendy M Erb et al. PeerJ. 2024.

Abstract

Vocal complexity is central to many evolutionary hypotheses about animal communication. Yet, quantifying and comparing complexity remains a challenge, particularly when vocal types are highly graded. Male Bornean orangutans (Pongo pygmaeus wurmbii) produce complex and variable "long call" vocalizations comprising multiple sound types that vary within and among individuals. Previous studies described six distinct call (or pulse) types within these complex vocalizations, but none quantified their discreteness or the ability of human observers to reliably classify them. We studied the long calls of 13 individuals to: (1) evaluate and quantify the reliability of audio-visual classification by three well-trained observers, (2) distinguish among call types using supervised classification and unsupervised clustering, and (3) compare the performance of different feature sets. Using 46 acoustic features, we applied machine learning (i.e., support vector machines, affinity propagation, and fuzzy c-means) to identify call types and assess their discreteness. We additionally used Uniform Manifold Approximation and Projection (UMAP) to visualize the separation of pulses using both extracted features and spectrogram representations. Supervised approaches showed low inter-observer reliability and poor classification accuracy, indicating that pulse types were not discrete. We propose an updated pulse classification approach that is highly reproducible across observers and exhibits strong classification accuracy using support vector machines. Although the low number of call types suggests long calls are fairly simple, the continuous gradation of sounds seems to greatly boost the complexity of this system. This work responds to calls for more quantitative research to define call types and quantify gradedness in animal vocal systems and highlights the need for a more comprehensive framework for studying vocal complexity vis-à-vis graded repertoires.

Keywords: Acoustic communication; Affinity propagation; Fuzzy clustering; Graded signals; Machine learning; Supervised classification; Support vector machines; Uniform manifold approximation and projection (UMAP); Unsupervised clustering; Vocal repertoire.
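
The abstract mentions fuzzy c-means as one way to assess how discrete the pulse types are. The following is a minimal sketch of that idea, not the authors' code. It assumes the standard fuzzy c-means membership formula and takes the typicality coefficient to be the difference between a call's two highest cluster memberships, which is our reading of Wadewitz et al. (2015). The cutoffs 0.976 and 0.855 are the ones reported in the Figure 5 caption. The two-dimensional cluster centers, the sample points, the function names, and the "neither" label are hypothetical and chosen only for illustration.

```python
from math import dist


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (fuzzifier m > 1)."""
    distances = [dist(point, center) for center in centers]
    # A point lying exactly on a center belongs fully to that cluster.
    if 0.0 in distances:
        return [1.0 if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


def typicality(memberships):
    """Assumed definition: highest membership minus second-highest membership."""
    top, second = sorted(memberships, reverse=True)[:2]
    return top - second


def label_call(coefficient, typical_above=0.976, atypical_below=0.855):
    """Apply the cutoffs from the Figure 5 caption; 'neither' is our own label."""
    if coefficient > typical_above:
        return "typical"
    if coefficient < atypical_below:
        return "atypical"
    return "neither"


# Hypothetical cluster centers and calls in a toy 2-D feature space.
centers = [(0.0, 0.0), (10.0, 0.0)]
calls = {"near_center": (0.5, 0.0), "in_between": (5.0, 0.0), "offset": (1.5, 0.0)}

for name, features in calls.items():
    coefficient = typicality(fuzzy_memberships(features, centers))
    print(name, round(coefficient, 3), label_call(coefficient))
```

A call close to one center gets a coefficient near 1, while a call equidistant from both centers gets a coefficient of 0. In a graded repertoire many calls would be expected to fall toward the low end.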

Conflict of interest statement

The authors declare there are no competing interests.

Figures

Figure 1. Spectrogram depicting long call pulse types.
Pulses include HU = huitus, VO = volcano, HR = (high) roar, LR = low roar, IN = intermediary, SI = sigh. Spectrograms produced in Raven Pro 1.6.
Figure 2. Audio-visual classification agreement across observers.
Stacked barplots indicating (top) classification agreement by pulse type between observers 1 and 2 and between observers 1 and 3, and (bottom) the number of observers who agreed on the pulse types assigned by observer 1. The average agreement index is indicated below each pulse type and demonstrates high agreement for HU and SI (≥2.77) but low agreement for VO and LR (2.08).
Figure 3. Barplot of classification accuracy for Spillmann et al. (2010) pulse scheme.
Comparison of the classification accuracy of audio-visual classification (AV), calculated as the average agreement between three observer pairs, with that of supervised machine learning classification (SVM).
Figure 4. Stacked barplots of affinity propagation clusters.
The barplots show the number of calls in each cluster classified by pulse type.
Figure 5. Typicality coefficients for each pulse type.
(A) Histogram showing the distribution of coefficients and (B) boxplot showing typicality values for each pulse type. Typicality thresholds were calculated following Wadewitz et al. (2015). Typical calls were those whose typicality coefficients exceeded 0.976, and atypical calls were those whose coefficients fell below 0.855.
Figure 6. Stacked barplots of typical calls.
(A) The proportion of each pulse type that was typical for each cluster and (B) the number of typical calls in each cluster classified by pulse type.
Figure 7. UMAP projection of 46-feature dataset and power density spectrograms.
Colors indicate four clusters identified using unsupervised affinity propagation (upper left), two clusters and typical calls identified by fuzzy clustering (upper right), six pulse types labeled by a human observer using the extracted feature set (lower left), and raw power density spectrograms (lower right).
Figure 8. Barplot of classification accuracy for revised pulse scheme.
Comparison of the classification accuracy of audio-visual classification (AV), calculated as the average agreement between three observer pairs, with that of supervised machine learning classification (SVM).
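
The captions for Figures 3 and 8 describe audio-visual accuracy as the average agreement between three observer pairs. A minimal sketch of one way to compute such a figure is shown here, assuming agreement means the simple proportion of pulses given the same label by both observers in a pair. The exact index used in the paper may differ, and the observer names and labels below are invented for illustration.

```python
from itertools import combinations


def pairwise_agreement(labels_a, labels_b):
    """Proportion of pulses that two observers labeled identically."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def mean_pairwise_agreement(observers):
    """Average the agreement over every pair of observers."""
    pairs = list(combinations(observers, 2))
    total = sum(pairwise_agreement(observers[x], observers[y]) for x, y in pairs)
    return total / len(pairs)


# Hypothetical labels for six pulses, using the pulse codes from Figure 1.
observers = {
    "observer_1": ["HU", "VO", "HR", "LR", "IN", "SI"],
    "observer_2": ["HU", "HR", "HR", "LR", "IN", "SI"],
    "observer_3": ["HU", "VO", "HR", "IN", "IN", "SI"],
}

print(round(mean_pairwise_agreement(observers), 3))
```

Raw proportion agreement does not correct for agreement expected by chance. A chance-corrected statistic would be a reasonable alternative when pulse types are unevenly represented.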
