Sci Rep. 2023 Jul 11;13(1):11238.
doi: 10.1038/s41598-023-38186-7.

Extended performance analysis of deep-learning algorithms for mice vocalization segmentation

Daniele Baggi et al. Sci Rep. 2023.

Abstract

Ultrasonic vocalization (USV) analysis is a fundamental tool for studying animal communication. It can be used to investigate mouse behavior in ethological studies and in neuroscience and neuropharmacology. USVs are usually recorded with a microphone sensitive to ultrasound frequencies and then processed by dedicated software, which helps the operator identify and characterize different families of calls. Recently, many automated systems have been proposed to perform both the detection and the classification of USVs. USV segmentation is the crucial step of the overall framework, since the quality of call processing depends strictly on how accurately the call itself has been detected. In this paper, we investigate the performance of three supervised deep learning methods for automated USV segmentation: an Auto-Encoder Neural Network (AE), a U-NET Neural Network (UNET) and a Recurrent Neural Network (RNN). The proposed models take as input the spectrogram of the recorded audio track and return the regions in which USV calls have been detected. To evaluate the models, we built a dataset by recording several audio tracks and manually segmenting the corresponding USV spectrograms generated with the Avisoft software, thereby producing the ground truth (GT) used for training. All three proposed architectures achieved precision and recall scores exceeding [Formula: see text], with UNET and AE achieving values above [Formula: see text], surpassing the other state-of-the-art methods considered for comparison in this study. The evaluation was also extended to an external dataset, where UNET again exhibited the highest performance. We suggest that our experimental results may serve as a valuable benchmark for future work.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
An example of USV segmentation.
Figure 2
Two examples of problems that arise when the segmentation is not post-processed. In (a), some spikes are incorrectly detected before and after the USV, while in (b) the main prediction is split into two regions by an extremely short break. Both cases can be fixed by applying a post-processing algorithm based on mathematical morphology to the output of the models.
Figure 3
General framework for USV segmentation. The spectrogram $S_i$ associated with an entire audio track is divided into shorter fragments $s_{ij}$. A classification model takes the fragments separately as input and returns for each of them a binary vector $y_{ij}^{(p)}$, whose elements indicate whether a USV is present at the corresponding time frame (white square) or not (black square). The vectors are then concatenated, generating the vector $y_i^{(p)}$ associated with $S_i$. Finally, a post-processing algorithm removes isolated spikes and fills undesired small interruptions, generating $z_i^{(p)}$. For visibility, the squares associated with the single time predictions are drawn larger than the time resolution of the depicted spectrogram.
Figure 4
Flowchart of the three proposed models. The meaning of the blocks for all three architectures is given in the legend in (a). k, n and d specify the number of filters, the number of neurons and the dropout rate for the convolutional, dense and dropout layers, respectively.
Figure 5
Performance comparison between the models in terms of time frame classification, with and without the post-processing algorithm. (a) and (b) show the accuracy and precision-recall curves, respectively. (c-e) depict the confusion matrices of the models without post-processing. (f-h) depict the confusion matrices of the models with post-processing.
Figure 6
Vocalization segmentation performance of the models on the training data as the kernel size used in post-processing varies.
Figure 7
Precision and recall curves of the models as the IoU threshold varies, with and without post-processing.
Figure 8
Histogram of detected USV calls (positives) with respect to the corresponding IoU value with post-processing.
Figure 9
Visual comparison of the results of the proposed models with the post-processing algorithm. The pictures show the spectrograms of two audio segments (the first in (a-c), the second in (d-f)) with the associated GT (blue signal) and prediction (orange signal) for the three models.
Figure 10
Boxplots of the distributions of the relative length errors.
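
The post-processing described in the captions of Figures 2 and 3 and the IoU-thresholded evaluation of Figure 7 can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that, on a 1-D binary vector, morphological closing with a flat window of k frames amounts to filling interior gaps shorter than k, and opening amounts to removing detections shorter than k, and it implements those two effects directly on run lengths. The order (fill gaps, then remove spikes), the greedy one-to-one matching, the values k = 3 and threshold 0.6, and all function names are hypothetical choices made for illustration.

```python
def runs(mask, value):
    """Return (start, end) pairs, end exclusive, of maximal runs equal to `value`."""
    out, start = [], None
    for i, v in enumerate(mask):
        if v == value and start is None:
            start = i
        elif v != value and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(mask)))
    return out


def postprocess(mask, k):
    """Fill interior gaps shorter than k frames, then drop detections shorter than k.

    Assumption: this mimics 1-D closing followed by opening with a flat window
    of k frames; the order of the two steps is an illustrative choice.
    """
    z = list(mask)
    for s, e in runs(z, 0):
        if s > 0 and e < len(z) and e - s < k:  # interior gaps only
            z[s:e] = [1] * (e - s)
    for s, e in runs(z, 1):
        if e - s < k:  # isolated spike
            z[s:e] = [0] * (e - s)
    return z


def iou(a, b):
    """Intersection over union of two (start, end) intervals, end exclusive."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0


def precision_recall(pred, gt, thr):
    """Greedy one-to-one matching (an assumption): a predicted call is a true
    positive if its best unmatched GT call has IoU >= thr."""
    unmatched = list(gt)
    tp = 0
    for p in pred:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= thr:
            tp += 1
            unmatched.remove(best)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    return precision, recall


# Toy per-frame prediction: two spikes around one call split by a 1-frame break.
raw = [0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0]
gt = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

clean = postprocess(raw, k=3)
print(runs(raw, 1))    # four raw detections
print(runs(clean, 1))  # [(5, 12)]
print(precision_recall(runs(raw, 1), runs(gt, 1), thr=0.6))
print(precision_recall(runs(clean, 1), runs(gt, 1), thr=0.6))
```

On this toy input the raw prediction yields four fragments, none of which reaches an IoU of 0.6 with the single ground-truth call, whereas the post-processed prediction yields one call that does. This mirrors, on a very small scale, why the curves with and without post-processing can differ.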
