[Preprint]. 2024 Sep 4:2023.03.11.532197.
doi: 10.1101/2023.03.11.532197.

Unsupervised discovery of family specific vocal usage in the Mongolian gerbil

Ralph E Peterson et al. bioRxiv.

Abstract

In nature, animal vocalizations can provide crucial information about identity, including kinship and hierarchy. However, lab-based vocal behavior is typically studied during brief interactions between animals with no prior social relationship, and under environmental conditions with limited ethological relevance. Here, we address this gap by establishing long-term acoustic recordings from Mongolian gerbil families, a core social group that uses an array of sonic and ultrasonic vocalizations. Three separate gerbil families were transferred to an enlarged environment and continuous 20-day audio recordings were obtained. Using a variational autoencoder (VAE) to quantify 583,237 vocalizations, we show that gerbils exhibit a more elaborate vocal repertoire than has been previously reported and that vocal repertoire usage differs significantly by family. By performing Gaussian mixture model clustering on the VAE latent space, we show that families preferentially use characteristic sets of vocal clusters and that these usage preferences remain stable over weeks. Furthermore, gerbils displayed family-specific transitions between vocal clusters. Since gerbils live naturally as extended families in complex underground burrows that are adjacent to other families, these results suggest the presence of a vocal dialect which could be exploited by animals to represent kinship. These findings position the Mongolian gerbil as a compelling animal model to study the neural basis of vocal communication and demonstrate the potential for using unsupervised machine learning with uninterrupted acoustic recordings to gain insights into naturalistic animal behavior.

Keywords: bioacoustics; ethology; family behavior; longitudinal monitoring; social behavior; unsupervised machine learning; vocal communication.

Conflict of interest statement

Competing Interest Statement: The authors declare no competing financial interests.

Figures

Figure 1. Longitudinal familial audio recording.
(A) Recording apparatus. Four ultrasonic microphones sampled at 125 kHz continuously recorded a family in an enlarged environment. (B) Experiment timeline. Three gerbil families with the same family composition (2 adults, 4 pups) were recorded continuously for 20 days. (C) Extraction of sound events from raw audio using sound amplitude thresholding (gray threshold = “th_2”; black thresholds = “th_1” and “th_3”; see Methods). Vocalizations (n=583,237) are separated from non-vocal sounds (n=9,684,735) using a threshold on spectral flatness (Figure S1; see Methods). (D) Summary of total sound event emission and average emission per hour. (E) Proportion of all sound events that are vocal or non-vocal sounds. (F) Summary of total vocalization emission and average emission per hour.
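
The spectral flatness criterion in panel C can be illustrated with a minimal sketch. Spectral flatness is the ratio of the geometric mean to the arithmetic mean of the power spectrum: tonal sounds score near 0 and noise-like sounds score closer to 1. The threshold value, the function names, and the synthetic test signals below are assumptions made for illustration; the values actually used are given in the paper's Methods and Figure S1, which are not reproduced here.

```python
import cmath
import math
import random

def power_spectrum(samples):
    """Naive DFT power spectrum over the non-negative frequency bins."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def spectral_flatness(samples, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum."""
    power = [p + eps for p in power_spectrum(samples)]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))

# Hypothetical threshold, chosen only for this example.
FLATNESS_THRESHOLD = 0.3

def is_vocal(samples, threshold=FLATNESS_THRESHOLD):
    """Classify a sound event as vocal when its spectrum is sufficiently tonal."""
    return spectral_flatness(samples) < threshold

sample_rate = 125_000  # Hz, matching the microphones in panel A
n_samples = 512
# Synthetic stand-ins: a 30 kHz tone (vocal-like) and white noise (non-vocal-like).
tone = [math.sin(2 * math.pi * 30_000 * i / sample_rate) for i in range(n_samples)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

print("tone flatness:", spectral_flatness(tone))
print("noise flatness:", spectral_flatness(noise))
```

The naive DFT keeps the sketch free of dependencies; on real recordings an FFT over short windows would be the practical choice.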
Figure 2. Unsupervised discovery of the Mongolian gerbil vocal repertoire.
Variational autoencoder and clustering. (A) Vocalization spectrograms (top) are input to a variational autoencoder (VAE) which encodes the spectrogram as a 32-D set of latent features (middle). The VAE learns latent features by minimizing the difference between original spectrograms and spectrograms reconstructed from the latent features by the VAE decoder (bottom). A Gaussian mixture model (GMM) was trained on the latent features to cluster vocalizations into discrete categories. (B) Representative vocalizations from 12 distinct GMM clusters featuring monosyllabic vocalizations are shown surrounding a UMAP embedding of the latent features. Asterisk denotes a vocal type not previously characterized. (C) Examples of multisyllabic vocalizations. White vertical lines indicate boundaries of monosyllabic elements. Asterisks denote multisyllabic vocal types not previously characterized.
Figure 3. Family specific vocal usage.
(A) UMAP probability density plots (axes same as Figure 2B) show significant differences between family repertoires (p < 0.01, MMD permutation test on latent space; see Methods). (B) Vocal type usage by family. Clusters sorted by cumulative usage across all families. Families show distinct usage patterns of different vocal clusters. (C) Clusters are resorted by the usage difference between families. (D) Spectrogram examples from top differentially used clusters (left) and location of clusters in embedding space (right).
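
The MMD permutation test named in panel A can be sketched as follows. Maximum mean discrepancy (MMD) compares two samples through a kernel; the permutation test asks how often a random relabeling of the pooled points yields an MMD at least as large as the observed one. The RBF kernel, its bandwidth, the number of permutations, and the synthetic 4-D "latent" data are all assumptions for illustration; the paper's Methods define the actual test on the 32-D VAE latent space.

```python
import math
import random

def rbf_gram(points, bandwidth):
    """Pairwise RBF kernel values for a list of equal-length vectors."""
    n = len(points)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            gram[i][j] = gram[j][i] = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return gram

def mmd2(gram, idx_x, idx_y):
    """Biased estimate of squared MMD from a precomputed Gram matrix."""
    def block_mean(rows, cols):
        return sum(gram[i][j] for i in rows for j in cols) / (len(rows) * len(cols))
    return block_mean(idx_x, idx_x) + block_mean(idx_y, idx_y) - 2.0 * block_mean(idx_x, idx_y)

def mmd_permutation_test(x, y, bandwidth=2.0, n_permutations=200, seed=0):
    """Return (observed squared MMD, permutation p-value)."""
    rng = random.Random(seed)
    gram = rbf_gram(list(x) + list(y), bandwidth)
    indices = list(range(len(x) + len(y)))
    observed = mmd2(gram, indices[:len(x)], indices[len(x):])
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(indices)  # random relabeling of the pooled samples
        if mmd2(gram, indices[:len(x)], indices[len(x):]) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (exceed + 1) / (n_permutations + 1)

# Synthetic stand-ins for two families' latent vectors (hypothetical data).
data_rng = random.Random(1)
family_a = [[data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(40)]
family_b = [[data_rng.gauss(1.0, 1.0) for _ in range(4)] for _ in range(40)]

observed, p_value = mmd_permutation_test(family_a, family_b)
print("squared MMD:", observed, "p-value:", p_value)
```

Precomputing the Gram matrix once and permuting only the indices avoids recomputing kernel values in every permutation.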
Figure 4. Vocal usage differences remain stable across days of development.
(A) UMAP probability density plots for each day of the recording, across families. Purple box indicates recording days that are shared across families. These days are used for subsequent analyses in C-E. (B) GMM vocal cluster usage per day. Usages are normalized on a per-day basis. A unique color is used for each cluster type. (C) PCA projection of daily usages within the purple (shared recording days) period showing that families use a unique subset of clusters stably across days. (D) Maximum mean discrepancy (MMD) distance between VAE latent distributions of vocalizations between days and across families. (E) Multidimensional scaling projection of MMD matrix from (D). Family vocal repertoires are distinct and remain so across days.
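
The per-day normalization described for panel B amounts to converting each day's cluster counts into proportions that sum to 1. A minimal sketch, assuming each vocalization is tagged with a recording day and an integer cluster label (the function name and the toy data are hypothetical):

```python
from collections import Counter

def daily_cluster_usage(days, clusters, n_clusters):
    """Map each recording day to its cluster usage proportions."""
    counts = {}
    for day, cluster in zip(days, clusters):
        counts.setdefault(day, Counter())[cluster] += 1
    usage = {}
    for day, counter in sorted(counts.items()):
        total = sum(counter.values())
        # Clusters never used on a day get a proportion of 0.
        usage[day] = [counter[c] / total for c in range(n_clusters)]
    return usage

# Toy data: five vocalizations over two days, three possible clusters.
days = [1, 1, 1, 2, 2]
clusters = [0, 0, 2, 1, 1]
usage = daily_cluster_usage(days, clusters, n_clusters=3)
print(usage)
```

Each day's usage vector can then serve as one point in an analysis such as the PCA projection in panel C.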
Figure 5. Transition structure, but not emission structure, shows family specific differences.
(A) Vocalizations are emitted in a diurnal cycle. (B) Vocalizations consistently occur in seconds-long bouts across families. (C) Vocalization intervals (onset-to-onset) are consistent across families. (D) Vocalization durations are consistent across families. (E) Raw data examples of bouts. (F) Bouts typically occupy a similar area of vocal space. (G) Vocal cluster transition matrix. Vocalizations strongly favor self-transition. (H) Bigram probability graph. Self and other vocalization transition tendencies show family-specific transitions (edges > 0.001 usage shown).
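
The transition matrix in panel G and the bigram probabilities in panel H can both be derived from counts of consecutive cluster labels. The sketch below assumes a single ordered sequence of integer cluster labels; how the paper segments sequences (for example, by bout) is not stated in the captions, and the function name and toy sequence are hypothetical. The 0.001 edge threshold is taken from the panel H caption.

```python
from collections import Counter

def transition_statistics(labels, n_clusters):
    """Return (row-normalized transition matrix, joint bigram probabilities)."""
    pair_counts = Counter(zip(labels[:-1], labels[1:]))
    total = sum(pair_counts.values())
    # Joint probability of each ordered pair (previous, next).
    bigram = [[pair_counts[(i, j)] / total for j in range(n_clusters)]
              for i in range(n_clusters)]
    # Conditional probability of the next cluster given the previous one.
    transition = []
    for i in range(n_clusters):
        row_total = sum(pair_counts[(i, j)] for j in range(n_clusters))
        transition.append([pair_counts[(i, j)] / row_total if row_total else 0.0
                           for j in range(n_clusters)])
    return transition, bigram

# Toy label sequence dominated by self-transitions.
labels = [0, 0, 0, 1, 1, 0, 2, 2, 2, 2]
transition, bigram = transition_statistics(labels, n_clusters=3)

# Keep only bigram edges above the usage threshold quoted in panel H.
edges = [(i, j, p) for i, row in enumerate(bigram)
         for j, p in enumerate(row) if p > 0.001]
print(transition)
print(edges)
```

Comparing these matrices across families, rather than within one sequence, is what would reveal family-specific transition structure.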
