Elife. 2024 Dec 16;12:RP89892.

doi: 10.7554/eLife.89892.

Unsupervised discovery of family specific vocal usage in the Mongolian gerbil

Ralph E Peterson¹ ², Aman Choudhri³, Catalin Mitelut¹, Aramis Tanelus¹ ², Athena Capo-Battaglia¹, Alex H Williams¹ ², David M Schneider¹, Dan H Sanes¹ ⁴ ⁵ ⁶

Affiliations

¹ Center for Neural Science, New York University, New York, United States.
² Center for Computational Neuroscience, Flatiron Institute, New York, United States.
³ Columbia University, New York, New York, United States.
⁴ Department of Psychology, New York University, New York, United States.
⁵ Neuroscience Institute, New York University School of Medicine, New York, United States.
⁶ Department of Biology, New York University, New York, United States.

PMID: 39680425
PMCID: PMC11649239
DOI: 10.7554/eLife.89892

Ralph E Peterson et al. Elife. 2024.

Abstract

In nature, animal vocalizations can provide crucial information about identity, including kinship and hierarchy. However, lab-based vocal behavior is typically studied during brief interactions between animals with no prior social relationship, and under environmental conditions with limited ethological relevance. Here, we address this gap by establishing long-term acoustic recordings from Mongolian gerbil families, a core social group that uses an array of sonic and ultrasonic vocalizations. Three separate gerbil families were transferred to an enlarged environment and continuous 20-day audio recordings were obtained. Using a variational autoencoder (VAE) to quantify 583,237 vocalizations, we show that gerbils exhibit a more elaborate vocal repertoire than has been previously reported and that vocal repertoire usage differs significantly by family. By performing Gaussian mixture model clustering on the VAE latent space, we show that families preferentially use characteristic sets of vocal clusters and that these usage preferences remain stable over weeks. Furthermore, gerbils displayed family-specific transitions between vocal clusters. Since gerbils live naturally as extended families in complex underground burrows that are adjacent to other families, these results suggest the presence of a vocal dialect which could be exploited by animals to represent kinship. These findings position the Mongolian gerbil as a compelling animal model to study the neural basis of vocal communication and demonstrate the potential for using unsupervised machine learning with uninterrupted acoustic recordings to gain insights into naturalistic animal behavior.

Keywords: auditory neuroscience; bioacoustics; ecology; ethology; Mongolian gerbil (Meriones unguiculatus); neuroscience; social behavior; vocal communication.

Plain language summary

Every time you speak, the sounds coming out of your mouth may carry more meaning than you intended; they may reveal, for example, which country, city or even neighborhood you come from. Indeed, the vocal patterns that humans use to communicate differ from one population to the next, creating an array of languages, dialects and accents. Such diversity has also been identified in various social species across the animal kingdom. Naked mole rats, for instance, which live underground in complex societies, exhibit different ‘dialects’ depending on their group of origin. Yet studying the vocal patterns of animals has remained difficult, especially for species inhabiting burrows or other environments that are difficult to access. Aiming to bypass these limitations, Peterson et al. adopted a ‘naturalistic’ approach that allowed them to capture the vocal calls of three families of Mongolian gerbils living undisturbed in enclosures that mimic features of their natural environment. These animals spend their lives underground in tight-knit families, with multiple groups often being in close proximity. Researchers have speculated that individuals may rely on vocal cues to identify whether they are part of the same colony, as they are often too far from each other to rely on sight or smell. Over half a million vocalizations obtained continuously over the course of 20 days were analyzed using an artificial intelligence technique known as unsupervised machine learning. The analyses added new types of calls to the gerbil vocal repertoire and also highlighted its complexity. In particular, they revealed that the animals could combine individual vocal elements into complex sequences. More importantly, this approach showed that gerbil families have vocal dialects that are stable across weeks, with each group displaying a preference for certain call types (i.e. words) and certain sequential patterns (i.e. phrases).
These findings demonstrate the benefits of the approach developed by Peterson et al. for the study of animal vocalizations. Going forward, they also suggest that the Mongolian gerbil could be used as an animal model to study the neural basis of vocal communication.

PubMed Disclaimer

Conflict of interest statement

RP, AC, CM, AT, AC, AW, DS, DS: No competing interests declared.

Figures

**Figure 1. Longitudinal familial audio recording.**
(A) Recording apparatus. Four ultrasonic microphones sampled at 125 kHz continuously recorded a family in an enlarged environment. (B) Experiment timeline. Three gerbil families with the same family composition (2 adults, 4 pups) were recorded continuously for 20 days. (C) Extraction of sound events from raw audio using sound amplitude thresholding (gray threshold = ‘th_2’, black thresholds = ‘th_1’ and ‘th_3’; see Methods). Vocalizations (n=583,237) are separated from non-vocal sounds (n=9,684,735) using a threshold on spectral flatness (Figure 1—figure supplement 1; see Methods). (D) Summary of total sound event emission and average emission per hour. (E) Proportion of all sound events that are vocal or non-vocal sounds. (F) Summary of total vocalization emission and average emission per hour.
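
The vocal versus non-vocal split described in (C) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: spectral flatness is taken here as the ratio of the geometric mean to the arithmetic mean of the power spectrum, and the function names, the naive DFT, and the cutoff of 0.3 are all assumptions chosen for readability. It also assumes that tonal vocalizations fall below the cutoff while broadband noises fall above it.

```python
import cmath
import math


def power_spectrum(signal):
    """Naive DFT power spectrum over positive-frequency bins (O(n^2), illustration only)."""
    n = len(signal)
    spectrum = []
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_flatness(power, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum (0 = tonal, 1 = flat)."""
    logs = [math.log(p + eps) for p in power]
    geometric = math.exp(sum(logs) / len(logs))
    arithmetic = sum(power) / len(power) + eps
    return geometric / arithmetic


def is_vocal(signal, threshold=0.3):
    """Hypothetical rule: a sound event counts as vocal when its flatness is below the cutoff."""
    return spectral_flatness(power_spectrum(signal)) < threshold


# A pure tone concentrates power in one bin; a single-sample click spreads it evenly.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
click = [1.0] + [0.0] * 63
print(is_vocal(tone), is_vocal(click))
```

In practice a fast FFT routine and an empirically chosen cutoff would replace the naive DFT and the placeholder value used here.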

**Figure 2. Unsupervised discovery of the Mongolian gerbil vocal repertoire.**
Variational autoencoder and clustering. (A) Vocalization spectrograms (top) are input to a variational autoencoder (VAE) which encodes the spectrogram as a 32-D set of latent features (middle). The VAE learns latent features by minimizing the difference between original spectrograms and spectrograms reconstructed from the latent features by the VAE decoder (bottom). A Gaussian mixture model (GMM) was trained on the latent features to cluster vocalizations into discrete categories. (B) Representative vocalizations from 12 distinct GMM clusters featuring monosyllabic vocalizations are shown surrounding a UMAP embedding of the latent features. Asterisk denotes vocal type not previously characterized. (C) Examples of multisyllabic vocalizations. White vertical lines indicate boundaries of monosyllabic elements. Asterisks denote multisyllabic vocal types not previously characterized.
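
The clustering step in (A) can be illustrated with a deliberately reduced sketch. The study fit a GMM to 32-D VAE latent features; the code below instead fits a one-dimensional mixture with expectation-maximization, purely to show the mechanics. The function names, the initial means, and the toy data are assumptions and do not come from the paper.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, init_means, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM, starting from the given component means."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)
    return weights, means, variances


def assign_cluster(x, weights, means, variances):
    """Return the index of the most probable component for x."""
    dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    return max(range(len(dens)), key=dens.__getitem__)


# Toy stand-in for one latent feature drawn from two vocal types.
latent = [0.0, 0.1, -0.1, 0.2, -0.2, 5.0, 5.1, 4.9, 5.2, 4.8]
weights, means, variances = fit_gmm_1d(latent, init_means=[0.0, 5.0])
print(assign_cluster(0.05, weights, means, variances))
```

A full-covariance, multi-dimensional mixture from an established library would be the natural choice for real latent features; this version only makes the E-step and M-step explicit.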

**Figure 2—figure supplement 1. VAE training and GMM clustering.**
(A) VAE reconstruction examples for different vocalization types. (B) VAE test and training loss show a plateau in performance after a few epochs (the model used in this study is from epoch 50). (C) GMM held-out log likelihood as a function of the number of clusters used during model training. Seventy clusters were used in this study. (D) MMD² permutation comparisons. All family comparisons differ more than expected by chance (p<0.01, independent t-test). (E) Number of latent features used by VAE.

**Figure 3. Family specific vocal usage.**
(A) UMAP probability density plots (axes same as Figure 2B) show significant differences between family repertoires (p<0.01, MMD permutation test on latent space; see Methods). (B) GMM vocal cluster usage by family. Clusters sorted by cumulative usage across all families. Families show distinct usage patterns of different vocal clusters. (C) Clusters are resorted by the usage difference between families. (D) Spectrogram examples from top differentially used clusters (left) and location of clusters in embedding space (right).
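
The MMD permutation test mentioned in (A) can be sketched as below. This is an assumption-laden illustration rather than the procedure in the Methods: it uses a biased squared-MMD estimate with an RBF kernel of fixed bandwidth, and the function names, bandwidth, permutation count, and toy 2-D points are all hypothetical.

```python
import math
import random


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))


def mmd2(xs, ys, bandwidth=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy between two samples."""
    def mean_kernel(p, q):
        return sum(rbf_kernel(a, b, bandwidth) for a in p for b in q) / (len(p) * len(q))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)


def mmd_permutation_test(xs, ys, n_perm=200, seed=0, bandwidth=1.0):
    """Return (observed MMD^2, p-value) by shuffling sample membership."""
    rng = random.Random(seed)
    observed = mmd2(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = mmd2(pooled[:len(xs)], pooled[len(xs):], bandwidth)
        if permuted >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


# Toy latent points standing in for two families.
family_a = [(0.1 * i, 0.0) for i in range(8)]
family_b = [(5.0 + 0.1 * i, 5.0) for i in range(8)]
observed, p_value = mmd_permutation_test(family_a, family_b)
print(observed, p_value)
```

The quadratic cost of the kernel sums means that, for hundreds of thousands of vocalizations, subsampling or an approximate estimator would likely be needed.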

**Figure 3—figure supplement 1. Pup removal biases vocal repertoire usage.**
(A) Pup weaning causes a consistent reduction in vocal emission across families. (B) UMAP probability densities of the vocal repertoire before and after pup weaning. Example vocalization from high-density post-weaning regions. (C) Difference in probability densities and total percent-change in repertoire from pre- to post-weaning. (D) Quantification of day-to-day percent-change throughout the experiment shows that the percent-change magnitude observed in C is rare.

**Figure 3—figure supplement 2. Acoustic features for GMM clusters.**
Acoustic features computed on the top 100 most probable vocalizations from each GMM cluster. Mean values ± standard deviation shown. Details on acoustic feature calculation are described in the Methods section.

**Figure 4. Vocal usage differences remain stable across days of development.**
(A) UMAP probability density plots for each day of the recording, across families. Purple box indicates recording days that are shared across families. These days are used for subsequent analyses in **C-E**. (B) GMM vocal cluster usage per day. Usages are normalized on a per-day basis. A unique color is used for each cluster type. (C) PCA projection of daily usages within the purple (shared recording days) period showing that families use a unique subset of clusters stably across days. (D) Maximum Mean Discrepancy (MMD) distance between VAE latent distributions of vocalizations between days and across families. (E) Multidimensional scaling projection of MMD matrix from (D). Family vocal repertoires are distinct and remain so across days.
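
The per-day normalization in (B) amounts to converting each day's cluster labels into a usage vector that sums to one. A minimal sketch, assuming cluster labels are integers from 0 to `n_clusters - 1` and using hypothetical function names, is given below; the distance helper is an added convenience for comparing days and is not a measure taken from the paper.

```python
import math
from collections import Counter


def daily_usage(labels_by_day, n_clusters):
    """Map each day to a vector of cluster usage fractions that sums to 1."""
    usage = {}
    for day, labels in labels_by_day.items():
        counts = Counter(labels)
        total = len(labels)
        if total == 0:
            usage[day] = [0.0] * n_clusters
        else:
            usage[day] = [counts.get(c, 0) / total for c in range(n_clusters)]
    return usage


def usage_distance(u, v):
    """Euclidean distance between two usage vectors (illustrative similarity measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy labels: day 1 mixes three clusters, day 2 uses only cluster 2.
labels_by_day = {1: [0, 0, 1, 2], 2: [2, 2, 2, 2]}
usage = daily_usage(labels_by_day, n_clusters=3)
print(usage[1], usage_distance(usage[1], usage[2]))
```

Stacking these daily vectors into a matrix gives the kind of input that a PCA projection such as the one in (C) would operate on.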

**Figure 4—figure supplement 1. Family specific cluster usages do not depend on GMM cluster size.**
(A) GMM cluster usages for each family over a range of GMM cluster sizes. (B) Quantification of pairwise cluster usage differences showing stability of family differences over all cluster sizes.

**Figure 5. Transition structure, but not emission structure, shows family specific differences.**
(A) Vocalizations are emitted in a diurnal cycle. (B) Vocalizations consistently occur in seconds-long bouts across families. (C) Vocalization intervals (onset-to-onset) are consistent across families. (D) Vocalization durations are consistent across families. (E) Raw data examples of bouts. (F) Bouts typically occupy a similar area of vocal space. (G) Vocal cluster transition matrix. Vocalizations strongly favor self-transition. (H) Bigram probability graph. Self and other vocalization transition tendencies show family specific transitions (edges > 0.001 usage shown).
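
The transition matrix in (G) and the bigram graph in (H) can both be derived from a sequence of cluster labels. The sketch below is one plausible reading, with hypothetical function names: it treats the whole label sequence as a single stream (the paper may restrict transitions to within bouts), and it reuses the 0.001 usage cutoff from the caption as a default.

```python
from collections import Counter, defaultdict


def transition_matrix(labels):
    """Row-normalized first-order transition probabilities P(next | current)."""
    counts = defaultdict(Counter)
    for current, following in zip(labels, labels[1:]):
        counts[current][following] += 1
    matrix = {}
    for current, row in counts.items():
        row_total = sum(row.values())
        matrix[current] = {nxt: c / row_total for nxt, c in row.items()}
    return matrix


def bigram_usage(labels, min_usage=0.001):
    """Fraction of all transitions taken by each (current, next) pair, above a cutoff."""
    pairs = Counter(zip(labels, labels[1:]))
    total = sum(pairs.values())
    return {pair: c / total for pair, c in pairs.items() if c / total > min_usage}


# Toy sequence dominated by self-transitions of cluster 0.
sequence = [0, 0, 1, 0, 0]
print(transition_matrix(sequence))
print(bigram_usage(sequence))
```

The nested-dictionary form keeps the sketch sparse, which suits a 70-cluster repertoire where most transitions never occur.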

**Figure 5—figure supplement 1. Vocalization transitions are non-random and family specific.**
(A) Vocal cluster transition matrix (same as Figure 5G). (B) Random transition matrix, computed after shuffling vocal cluster label sequence. (C) Transitions that occur greater than expected by chance (1000-iteration random shuffle with one-sample t-test and post hoc Benjamini-Hochberg multiple comparisons correction; see Methods). (D) Most common transitions (>0.04% usage) from cluster 12 (roughly equally used across all families) to other clusters. Red lines indicate transitions that are shared across families, black lines indicate unique family specific transitions.
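
The Benjamini-Hochberg correction named in (C) can be sketched on its own. The function name and the 0.05 level are assumptions, and the p-values are taken as given: in the study they would come from comparing each observed transition against its shuffled distribution, a step that is not reproduced here.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank whose p-value is under its threshold.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# Hypothetical p-values for four candidate transitions.
print(benjamini_hochberg([0.01, 0.5, 0.03, 0.9]))
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger-ranked p-value passes, which is why the cutoff rank is tracked across the full sorted list.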

Update of

Unsupervised discovery of family specific vocal usage in the Mongolian gerbil.
Peterson RE, Choudhri A, Mitelut C, Tanelus A, Capo-Battaglia A, Williams AH, Schneider DM, Sanes DH. bioRxiv [Preprint]. 2024 Sep 4:2023.03.11.532197. doi: 10.1101/2023.03.11.532197. Update in: Elife. 2024 Dec 16;12:RP89892. doi: 10.7554/eLife.89892. PMID: 39282260.
