Sci Rep. 2015 May 28;5:10237.
doi: 10.1038/srep10237.

VoICE: A semi-automated pipeline for standardizing vocal analysis across models

Zachary D Burkett et al. Sci Rep. 2015.

Abstract

The study of vocal communication in animal models provides key insight into the neurogenetic basis of speech and communication disorders. Current methods for vocal analysis suffer from a lack of standardization, creating ambiguity in cross-laboratory and cross-species comparisons. Here, we present VoICE (Vocal Inventory Clustering Engine), an approach to grouping vocal elements that creates a high-dimensional dataset by scoring spectral similarity between all vocalizations within a recording session. This dataset is then subjected to hierarchical clustering, generating a dendrogram that an automated algorithm prunes into meaningful vocalization "types". When applied to birdsong, a key model for vocal learning, VoICE captures the known deterioration in acoustic properties that follows deafening, including altered sequencing. In a mammalian neurodevelopmental model, we uncover a reduced vocal repertoire in mice lacking the autism susceptibility gene, Cntnap2. VoICE will be useful to the scientific community as it can standardize vocalization analyses across species and laboratories.
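
The grouping step described above can be illustrated with a minimal sketch. It is not the VoICE implementation: it assumes similarity scores already scaled to [0, 1], uses 1 - similarity as the distance, merges clusters by average linkage, and replaces the automated dendrogram pruning with a simple fixed-height cut. The function name `cluster_by_similarity`, the parameter `max_distance`, and the toy scores are all hypothetical.

```python
from itertools import combinations


def cluster_by_similarity(similarity, max_distance):
    """Average-linkage agglomerative clustering with a fixed-height cut.

    similarity: square, symmetric list of lists with scores in [0, 1].
    max_distance: merging stops once the closest pair of clusters is
        farther apart than this (hypothetical stand-in for tree pruning).
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    n = len(similarity)
    # Convert similarity to distance so that identical items sit at 0.
    dist = [[1.0 - similarity[i][j] for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) > max_distance:
            break  # remaining clusters are too dissimilar to merge
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Toy similarity scores for five vocalizations (made-up numbers).
toy = [
    [1.00, 0.90, 0.85, 0.10, 0.15],
    [0.90, 1.00, 0.80, 0.20, 0.10],
    [0.85, 0.80, 1.00, 0.15, 0.20],
    [0.10, 0.20, 0.15, 1.00, 0.95],
    [0.15, 0.10, 0.20, 0.95, 1.00],
]
types = cluster_by_similarity(toy, max_distance=0.5)
print(types)  # [[0, 1, 2], [3, 4]]
```

With these toy scores, vocalizations 0 to 2 form one "type" and 3 to 4 another. A fixed cut height is the simplest possible pruning rule; an adaptive pruning algorithm, as the abstract describes, would choose cut points per branch instead.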

Figures

Figure 1. Assignment and quantification of clustered birdsong syllables.
(a) Mature zebra finches (>120d) sing stereotyped song composed of repeated syllables that form motifs, which in turn form bouts. Shown are two song bouts sung by the same adult bird during two recording epochs (‘Session A’ and ‘Session B’). (Scale bar = 250 msec.) (b) The dendrogram plots global similarity distance between leaves (syllables) and was generated following spectral similarity scoring. Beneath the branches, clusters before (Unmerged) and after merging (Merged) are denoted by color bands. Representative syllables from merged clusters are illustrated at descending percentiles following correlation of each cluster member to the cluster eigensyllable. The Pearson’s rho for the correlation between each syllable and its eigensyllable is displayed in white. (c) During assignment, one of three possible outcomes occurs for each syllable: automatic assignment to a cluster (ASSIGNMENT), manual assignment in a tiebreaking procedure when statistically similar to two clusters (TIE), or categorization as novel (NOVEL). Artificially introduced syllables from a Bengalese finch did not pass a global similarity floor and are accurately deemed ‘novel’. Bars indicate the mean percentage global similarity between the syllable and each cluster. (d) The two artificially introduced syllables from a Bengalese finch are, upon merging (Merged), appropriately assigned to two novel clusters. (e) Syntaxes are highly similar between recording sessions, regardless of the metric used for scoring (left, ‘unmodified’), but the artificial introduction of novel syllables to the second recording session reduces similarity when using a metric that penalizes for novel syllables (right, ‘modified’). (f) Pitch (top) and entropy (bottom) are largely unchanged between recording sessions. (* = p < 0.05, resampling independent mean differences. Cluster colors are consistent throughout. Scale bars = 50 msec.)
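
The three assignment outcomes in panel (c) can be sketched as a small decision rule. This is an illustration only: the caption describes a tie as a syllable being statistically similar to two clusters, whereas the sketch substitutes a fixed score margin for that statistical test. The function name `assign_syllable` and the parameters `floor` and `tie_margin` are hypothetical, as are the example scores.

```python
def assign_syllable(mean_similarity, floor, tie_margin):
    """Classify one syllable as ASSIGNMENT, TIE, or NOVEL.

    mean_similarity: dict mapping cluster label -> mean percentage global
        similarity between the syllable and that cluster.
    floor: minimum best score required to belong to any existing cluster
        (assumed stand-in for the global similarity floor).
    tie_margin: if the top two scores differ by less than this, defer to a
        manual tiebreak (assumed stand-in for the statistical comparison).
    """
    if not mean_similarity:
        return ("NOVEL", None)
    # Rank clusters from most to least similar.
    ranked = sorted(mean_similarity.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_score = ranked[0]
    if best_score < floor:
        return ("NOVEL", None)
    if len(ranked) > 1 and best_score - ranked[1][1] < tie_margin:
        return ("TIE", (best_label, ranked[1][0]))
    return ("ASSIGNMENT", best_label)


# Made-up scores for three syllables against clusters "A" and "B".
clear = assign_syllable({"A": 80.0, "B": 40.0}, floor=50.0, tie_margin=5.0)
tied = assign_syllable({"A": 80.0, "B": 78.0}, floor=50.0, tie_margin=5.0)
novel = assign_syllable({"A": 30.0, "B": 20.0}, floor=50.0, tie_margin=5.0)
```

Checking the floor before the tie keeps a syllable that is weakly similar to every cluster from being sent to manual tiebreaking.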
Figure 2. VoICE detects deafening-induced alterations in song phonology and syntax.
(a) Spectrograms reveal song deterioration in deafened, but not sham-deafened, birds. (b) Syllables are assigned in a temporally reversed serial manner to account for ongoing changes in syllable structure. (c) Syllable entropy, a measure of spectral ‘noise’, increases in a majority of syllables after deafening. Asterisks denote statistically significant changes from before surgery (left). Bar plots represent Pre (Day 0) vs. Post* (the first day statistically significantly different from ‘Pre’) vs. Post (the last analyzed day) recordings. Each symbol and line (left) and its corresponding set of bars (right) represent a syllable cluster. (One-way resampling ANOVA, multiple comparisons post-hoc Bonferroni corrected p-value < 0.05.) (d) Syntax similarity to pre-surgery decreases following deafening. (Black = sham; blue, red = deaf; * = p < 0.05, resampling independent mean differences. Scale bars = 250 msec in a and b.)
Figure 3. Validation of USV technique and comparison to manual classification standard.
(a) Exemplar USVs from a mouse on the C57BL/6J background at P7. (Scale bar = 200 msec.) (b) A dendrogram generated following spectral similarity scoring of USVs, where calls are represented as leaves and branch points indicate the difference in weighted correlation between leaves. Beneath the branches, clusters automatically determined by the tree-trimming algorithm are denoted by unique color bands and illustrated by representatives at descending percentiles following correlation of each cluster member to the cluster eigencall. The Pearson’s rho for the correlation between each call and the eigencall is displayed in white. (c) Bar plots indicate the count of each call type when the classification is performed manually (white) or using VoICE (black). Pie charts (right) illustrate the percentage distribution of each call type for the same animal’s repertoire as determined by manual sorting or using VoICE. (Scale bar = 10 msec.)
Figure 4. Deletion of Cntnap2 results in altered vocal phenotype.
(a) Mouse pups lacking Cntnap2 (n = 15) do not call as much as WT littermates (n = 13). (* = p < 0.05, resampling independent mean differences.) (b) Expected counts of each call type (bars) generated from resampled WT data and 95% confidence intervals (red cross-hatch) reveal significant differences when compared to actual KO call counts, represented by overlaid points. Average counts of actual KO calls are represented as asterisks where p < 0.05. Error bars denote ±s.e.m. (c) Pie charts display the distribution of each call type in WT and KO animals. (Color scheme denoted beneath bars in b.) (d) Heatmaps denote the correlation of repertoire within each genotype. KO animals show an intragenotype correlation greater than that of WT. Rows and columns represent animals, and indices are repertoire correlations between them. (e) Repertoire correlation is significantly greater within the KO genotype. (f) Heatmaps of the within- and across-genotype weighted unpenalized syntactical similarity scores show no within-genotype difference in syntax similarity. Rows and columns represent animals, and indices are syntax similarity scores between them. (g) Syntax entropy scores (a measure of call transition variability) within each genotype are similar.
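
To make the idea of "call transition variability" in panel (g) concrete, one common way to quantify it is the Shannon entropy of the distribution of adjacent call-type pairs. The sketch below uses that definition as an assumption; it is not necessarily the syntax entropy score computed in the paper, and the function name `transition_entropy` and the example sequences are hypothetical.

```python
from collections import Counter
from math import log2


def transition_entropy(sequence):
    """Shannon entropy (bits) of adjacent call-type pairs in a sequence.

    Low values mean a few transitions dominate (stereotyped sequencing);
    high values mean transitions are spread across many pairs.
    """
    # Count each ordered pair of consecutive call types.
    pairs = Counter(zip(sequence, sequence[1:]))
    total = sum(pairs.values())
    if total == 0:
        return 0.0  # fewer than two calls: no transitions to measure
    return -sum((c / total) * log2(c / total) for c in pairs.values())


# Made-up call-type sequences.
repetitive = transition_entropy(["A", "A", "A", "A"])       # one transition type
alternating = transition_entropy(["A", "B", "A", "B", "A"])  # two equally likely
```

In the alternating example the two transitions (A to B and B to A) occur equally often, which gives exactly 1 bit; the repetitive example contains a single transition type and gives 0 bits.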
Figure 5. Summary of procedures.
Flow charts describe the analytical pipeline for (a) zebra finch and (b) mouse USV analyses. Steps at which user input occurs are shaded in gray. (Animal photographs by NFD.)
