PeerJ. 2024 May 14:12:e17320.

doi: 10.7717/peerj.17320. eCollection 2024.

Vocal complexity in the long calls of Bornean orangutans

Wendy M Erb¹,², Whitney Ross¹, Haley Kazanecki¹, Tatang Mitra Setia³,⁴, Shyam Madhusudhana¹,⁵, Dena J Clink¹

Affiliations

¹ K. Lisa Yang Center for Conservation Bioacoustics, Cornell Lab of Ornithology, Cornell University, Ithaca, NY, United States of America.
² Department of Anthropology, Rutgers, The State University of New Jersey, New Brunswick, United States of America.
³ Primate Research Center, Universitas Nasional Jakarta, Jakarta, Indonesia.
⁴ Department of Biology, Faculty of Biology and Agriculture, Universitas Nasional Jakarta, Jakarta, Indonesia.
⁵ Centre for Marine Science and Technology, Curtin University, Perth, Australia.

PMID: 38766489
PMCID: PMC11100477
DOI: 10.7717/peerj.17320

Wendy M Erb et al. PeerJ. 2024.

Abstract

Vocal complexity is central to many evolutionary hypotheses about animal communication. Yet, quantifying and comparing complexity remains a challenge, particularly when vocal types are highly graded. Male Bornean orangutans (Pongo pygmaeus wurmbii) produce complex and variable "long call" vocalizations comprising multiple sound types that vary within and among individuals. Previous studies described six distinct call (or pulse) types within these complex vocalizations, but none quantified their discreteness or the ability of human observers to reliably classify them. We studied the long calls of 13 individuals to: (1) evaluate and quantify the reliability of audio-visual classification by three well-trained observers, (2) distinguish among call types using supervised classification and unsupervised clustering, and (3) compare the performance of different feature sets. Using 46 acoustic features, we applied machine learning (i.e., support vector machines, affinity propagation, and fuzzy c-means) to identify call types and assess their discreteness. We additionally used Uniform Manifold Approximation and Projection (UMAP) to visualize the separation of pulses using both extracted features and spectrogram representations. Supervised approaches showed low inter-observer reliability and poor classification accuracy, indicating that pulse types were not discrete. We propose an updated pulse classification approach that is highly reproducible across observers and exhibits strong classification accuracy using support vector machines. Although the low number of call types suggests long calls are fairly simple, the continuous gradation of sounds seems to greatly boost the complexity of this system. This work responds to calls for more quantitative research to define call types and quantify gradedness in animal vocal systems and highlights the need for a more comprehensive framework for studying vocal complexity vis-à-vis graded repertoires.

Keywords: Acoustic communication; Affinity propagation; Fuzzy clustering; Graded signals; Machine learning; Supervised classification; Support vector machines; Uniform manifold approximation and projection (UMAP); Unsupervised clustering; Vocal repertoire.

Conflict of interest statement

The authors declare there are no competing interests.

Figures

**Figure 1. Spectrogram depicting long call pulse types.**
Pulses include HU = huitus, VO = volcano, HR = (high) roar, LR = low roar, IN = intermediary, SI = sigh. Spectrograms produced in Raven Pro 1.6.

**Figure 2. Audio-visual classification agreement across observers.**
Stacked barplots indicating (top) classification agreement by pulse type between observer 1–2 and observer 1–3 and (bottom) the number of observers who agreed on the pulse types assigned by observer 1; the average agreement index is indicated below each pulse type and demonstrates high agreement for HU and SI (≥2.77), but low agreement for VO and LR (2.08).
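
The agreement index in Figure 2 can be illustrated with a minimal sketch. It assumes, based only on the reported values falling between 1 and 3, that the index for a pulse counts how many of the three observers (observer 1 included) assigned the type chosen by observer 1, averaged within each pulse type. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def agreement_index(obs1, obs2, obs3):
    """Average number of observers (1 to 3) agreeing with observer 1, per pulse type.

    Assumption: observer 1 counts toward the total, so the index ranges from 1 to 3.
    """
    totals = defaultdict(int)   # summed agreement counts per pulse type
    counts = defaultdict(int)   # number of pulses per pulse type
    for a, b, c in zip(obs1, obs2, obs3):
        n_agree = 1 + int(b == a) + int(c == a)
        totals[a] += n_agree
        counts[a] += 1
    return {pulse: totals[pulse] / counts[pulse] for pulse in totals}


# Toy labels (hypothetical), using the pulse codes from Figure 1.
obs1 = ["HU", "HU", "VO", "VO"]
obs2 = ["HU", "HU", "LR", "VO"]
obs3 = ["HU", "HR", "LR", "VO"]
index = agreement_index(obs1, obs2, obs3)
```

With these toy labels, HU averages 2.5 and VO averages 2.0, so a type that observers confuse more often receives a lower index.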

**Figure 3. Barplot of classification accuracy for Spillmann et al. (2010) pulse scheme.**
Comparison of classification accuracy of audio-visual classification (AV), calculated as the average agreement between three observer pairs, compared to supervised machine learning classification (SVM).
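
The AV accuracy described in this caption can be sketched as follows, assuming that "average agreement between three observer pairs" means the proportion of pulses on which two observers gave the same label, averaged over the three possible pairs. The helper name and example labels are hypothetical; the paper may compute agreement differently.

```python
from itertools import combinations


def mean_pairwise_agreement(labels_by_observer):
    """Mean proportion of matching labels over all observer pairs.

    labels_by_observer: list of equal-length label lists, one per observer.
    """
    rates = []
    for a, b in combinations(labels_by_observer, 2):
        matches = sum(x == y for x, y in zip(a, b))
        rates.append(matches / len(a))
    return sum(rates) / len(rates)


# Three observers labeling the same four pulses (hypothetical data).
observers = [
    ["HU", "VO", "HR", "SI"],
    ["HU", "LR", "HR", "SI"],
    ["HU", "VO", "LR", "SI"],
]
av_accuracy = mean_pairwise_agreement(observers)
```

Here the three pairs agree on 3/4, 3/4, and 2/4 of the pulses, giving a mean agreement of 2/3.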

**Figure 4. Stacked barplots of affinity propagation clusters.**
The barplots show the number of calls in each cluster classified by pulse type.

**Figure 5. Typicality coefficients for each pulse type.**
(A) Histogram showing the distribution of coefficients and (B) boxplot showing typicality values for each pulse type. Typicality thresholds were calculated following Wadewitz et al. (2015). Typical calls were those whose typicality coefficients exceeded 0.976 and atypical calls were those below 0.855.
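
As a rough illustration of how such coefficients arise, the sketch below assumes that the typicality coefficient is the difference between a call's two largest fuzzy c-means membership values, which is how the measure is commonly described for Wadewitz et al. (2015). Memberships use the standard fuzzy c-means formula with fuzzifier `m`. The thresholds 0.976 and 0.855 come from the caption; the cluster centers, the points, and the "intermediate" label for calls between the two thresholds are hypothetical.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of one point in each cluster.

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), with d_j the Euclidean
    distance from the point to center j.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]


def typicality(memberships):
    """Assumed definition: largest membership minus second-largest membership."""
    top, second = sorted(memberships, reverse=True)[:2]
    return top - second


def label_call(coefficient, typical_above=0.976, atypical_below=0.855):
    """Apply the caption's thresholds; 'intermediate' is a label added here."""
    if coefficient > typical_above:
        return "typical"
    if coefficient < atypical_below:
        return "atypical"
    return "intermediate"


centers = [(0.0, 0.0), (10.0, 0.0)]   # two hypothetical cluster centers
near = typicality(fcm_memberships((0.05, 0.0), centers))
between = typicality(fcm_memberships((2.0, 0.0), centers))
midpoint = typicality(fcm_memberships((5.0, 0.0), centers))
labels = [label_call(t) for t in (near, between, midpoint)]
```

A call close to one center has nearly all of its membership in that cluster and is labeled typical, whereas a call halfway between centers has a coefficient of zero and is labeled atypical.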

**Figure 6. Stacked barplots of typical calls.**
(A) The proportion of each pulse type that was typical for each cluster and (B) the number of typical calls in each cluster classified by pulse type.

**Figure 7. UMAP projection of 46-feature dataset and power density spectrograms.**
Colors indicate four clusters identified using unsupervised affinity propagation (upper left), two clusters and typical calls identified by fuzzy clustering (upper right), six pulse types labeled by human observer using the extracted feature set (lower left), and raw power density spectrograms (lower right).

**Figure 8. Barplot of classification accuracy for revised pulse scheme.**
Comparison of classification accuracy of audio-visual classification (AV), calculated as the average agreement between three observer pairs, compared to supervised machine learning classification (SVM).
