J Neurosci. 2023 Aug 30;43(35):6141-6163.
doi: 10.1523/JNEUROSCI.2353-22.2023. Epub 2023 Aug 4.

Predictive Mouse Ultrasonic Vocalization Sequences: Uncovering Behavioral Significance, Auditory Cortex Neuronal Preferences, and Social-Experience-Driven Plasticity


Swapna Agarwalla et al. J Neurosci. 2023.

Abstract

Mouse ultrasonic vocalizations (USVs) contain predictable sequential structures, like bird songs and speech. The neural representation of USVs in the mouse primary auditory cortex (Au1) and its plasticity with experience have been studied largely with single syllables or dyads, without using the predictability in USV sequences. Studies using playback of USV sequences have used sequences randomly selected from numerous possibilities. The current study uses mutual information to obtain context-specific natural sequences (NSeqs) of USV syllables that capture the predictability observed in male USVs in different contexts of social interaction with females. The behavioral and physiological significance of NSeqs over random sequences (RSeqs) lacking predictability was examined. Female mice that had never had the social experience of exposure to males showed higher selectivity for NSeqs, both behaviorally and at the cellular level, as probed by expression of the immediate early gene c-fos in Au1. Au1 supragranular single units also showed higher selectivity to NSeqs over RSeqs. Social-experience-driven plasticity in the encoding of NSeqs and RSeqs in adult females was probed by examining neural selectivity to the same sequences before and after the above social experience. Single units showed enhanced selectivity for NSeqs over RSeqs after the social experience. Further, using two-photon Ca2+ imaging, we observed social-experience-dependent changes in the sequence selectivity of excitatory and somatostatin-positive inhibitory neurons, but not of parvalbumin-positive inhibitory neurons, in Au1. Using optogenetics, somatostatin-positive neurons were identified as a possible mediator of the observed social-experience-driven plasticity. Our study uncovers the importance of predictive sequences and introduces mouse USVs as a promising model to study context-dependent, speech-like communication.

SIGNIFICANCE STATEMENT Humans need to detect patterns in the sensory world. For instance, speech consists of meaningful sequences of acoustic tokens that are easily differentiated from randomly ordered tokens. This structure derives from the predictability of the tokens. Similarly, mouse vocalization sequences have predictability and undergo context-dependent modulation. Our work investigated whether mice differentiate such informative, predictable sequences (NSeqs) of communicative significance from RSeqs at the behavioral, molecular, and neuronal levels. Following a social experience in which NSeqs occur as a crucial component, mouse auditory cortical neurons become more sensitive to differences between NSeqs and RSeqs, although their preference for individual tokens is unchanged. Thus, speech-like communication and its dysfunction may be studied at the circuit, cellular, and molecular levels in mice.
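The abstract says mutual information (MI) was used to capture the predictability of syllable order. As a rough illustration only, the Python sketch below computes a plug-in estimate of I(S1; Si), the MI between the first syllable of a bout and the syllable at position i, together with a shuffle-based null. The function names, the plug-in estimator, and the pool-and-refill shuffle are assumptions, not the authors' code; bias correction and the confidence intervals used in the paper are omitted.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Plug-in (maximum-likelihood) Shannon entropy, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """Plug-in MI in bits, using I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def mi_first_vs_position(bouts, i):
    """I(S1; Si) over all bouts with at least i syllables (i is 1-indexed)."""
    pairs = [(bout[0], bout[i - 1]) for bout in bouts if len(bout) >= i]
    if not pairs:
        raise ValueError(f"no bout has {i} or more syllables")
    s1, si = zip(*pairs)
    return mutual_information(s1, si)


def shuffled_mi(bouts, i, n_shuffles=200, seed=0):
    """Null MI values (assumed scheme): pool all syllables, shuffle them,
    and refill bouts of the original lengths, destroying sequential order."""
    rng = random.Random(seed)
    pooled = [s for bout in bouts for s in bout]
    lengths = [len(bout) for bout in bouts]
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for n in lengths:
            shuffled.append(pooled[start:start + n])
            start += n
        null.append(mi_first_vs_position(shuffled, i))
    return null
```

With i = 1 the estimate reduces to H(S1), the entropy of the first syllable. An observed MI well above most of the shuffled values suggests order-dependent structure; this percentile-style check is a simplification of the CI-overlap criterion described for Fig. 1G.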

Keywords: auditory cortex; c-fos; experience-dependent plasticity; mouse USV; sequence; somatostatin.


Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1.
Naive female mice prefer NSeqs emitted by male mice during social exposure. A, Schematic depicting the three social contexts (Alone, Separated, and Together) in which mouse vocalizations were recorded. B, Example spectrograms of representative syllable types. Different syllable types are depicted with different color bars. C, Probability distribution of the syllable types in the three contexts, shown as bars giving the percentage of syllables. Right, The diagonal matrix quantifies the KLD among the distributions at 95% CIs (in bits). D, Joint probability distributions of syllable-to-syllable transitions, considering the first two syllables in bouts, shown in the first three matrices of the row for the three contexts of the adult male (Alone, Separated, Together). Right, The diagonal matrix quantifies the KLD among the joint distributions. E, The three distributions show the ISSs observed in each context of the adult male. The vertical dashed line (at 250 ms) marks the mean + 2 SD of the overall data. F, Percentage of bouts of each length in each of the contexts. G, MI, I(S1; Si) with i = 1, 2, …, 10, in the three contexts (blue, with 95% CIs). Red plots, with 95% CIs, show the MI estimated from the same data after scrambling the order of syllables, that is, the estimate expected when the true MI is 0. Lack of overlap between the red and blue CIs indicates significant MI. H, The three matrices show the MI calculated as in D, with each row giving the MI of the nth syllable with the first (row 1), second (row 2), third (row 3), and so on. The diagonal elements show the entropy of the syllable at the corresponding position from the bout start. Asterisks indicate significance at 95% confidence.
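Panels C–E describe splitting the syllable stream into bouts by inter-syllable spacing and comparing syllable distributions with the KLD. The Python sketch below illustrates both steps under stated assumptions: the 250 ms line (mean + 2 SD of spacings) is assumed to act as the bout-separation threshold, spacing is taken as next onset minus previous offset, and the `eps` floor for empty bins is a choice made here, not one described in the source. All function names are hypothetical.

```python
import math
import statistics
from collections import Counter


def split_into_bouts(onsets, offsets, labels, threshold=None):
    """Split a syllable stream into bouts wherever the inter-syllable spacing
    (next onset minus previous offset, in seconds) exceeds `threshold`.
    If no threshold is given, use mean + 2 SD of all spacings (assumed rule)."""
    gaps = [on - off for on, off in zip(onsets[1:], offsets[:-1])]
    if threshold is None:
        threshold = statistics.mean(gaps) + 2 * statistics.stdev(gaps)
    bouts, current = [], [labels[0]]
    for gap, label in zip(gaps, labels[1:]):
        if gap > threshold:
            bouts.append(current)
            current = []
        current.append(label)
    bouts.append(current)
    return bouts, threshold


def syllable_distribution(bouts, alphabet):
    """Probability of each syllable type in `alphabet`, pooled over bouts."""
    counts = Counter(s for bout in bouts for s in bout)
    total = sum(counts.values())
    return [counts[a] / total for a in alphabet]


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) in bits; `eps` floors empty bins of Q (an assumption)."""
    return sum(pi * math.log2(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)
```

For example, with spacings of 50, 50, 750, and 50 ms and a 250 ms threshold, five syllables fall into one bout of three and one of two. The KLD is asymmetric, so which context plays P and which plays Q matters.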
Figure 2.
Naive adult female mice prefer NSeqs over RSeqs. A, Spectrograms of the syllables used for stimulus design. B, The set of sequences created for RSeqs and extracted for NSeqs, depicted with the color bars shown in A. Sequences numbered 01–12 are RSeqs. The light blue background on a subset of the RSeqs marks the sequences chosen for the equal-length case. The NSeqs from the three contexts (numbered 13–20) are labeled above each NSeq as Alone, Separated (SEP), or Together (TOG). C, Top row, S1–S4 are the four recorded sessions; no sounds were played in S1 and S3, and sounds were played from the speakers in S2 and S4 as indicated. Bottom row, The two bar plots to the left of the dashed line show the time spent on the NSeq side and on the RSeq side in S2 and S4 in the equal-length case. The same is shown to the right of the dashed line for the RSeq with seven syllables. Each animal is assigned the same color in S2 and S4. The black line represents the population mean ± SEM; *p < 0.05, **p < 0.01, ***p < 0.001; NS, Not significant.
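Panel C compares time spent on the NSeq side with time spent on the RSeq side for the same animals. The caption does not name the statistical test, so the Python sketch below is only an illustration: a hypothetical preference index and a paired t statistic on per-animal differences. Both are assumptions, not the authors' analysis.

```python
import math
import statistics


def preference_index(time_nseq, time_rseq):
    """Hypothetical index in [-1, 1]; +1 means all time on the NSeq side."""
    total = time_nseq + time_rseq
    return (time_nseq - time_rseq) / total if total > 0 else 0.0


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for per-animal differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1
```

For a p-value, `scipy.stats.ttest_rel(x, y)` performs the same paired comparison; the standard library has no t distribution.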
Figure 3.
Naive adult female mice show higher activation of c-fos+ cells for NSeqs than for RSeqs. A, Schematic depicting the three stages followed to investigate activation of c-fos+ cells. The mouse remained in a sound-attenuating chamber throughout the experiment until it was killed; Stage 1, 60 min of habituation; Stage 2, 15 min of exposure to the auditory stimulus (RSeqs, NSeqs, or silence for Ctrl); Stage 3, 60 min of consolidation in silence. B, Representative coronal brain sections showing c-fos expression in the different groups, Ctrl (top), Ran (middle), and Nat (bottom), with the ACX subregions Au1, AuV, and AuD and the visual cortex areas V1 and V2L (as controls) demarcated with dotted lines. Right, Enlarged images of sampled Au1 regions show c-fos+ cells (top, white arrows, red), the corresponding DAPI-stained nuclei (middle, white arrows, blue), and the overlay of the two (bottom). C, Bar plots show the average count of c-fos+ cells/mm² in the three conditions, Ctrl, Ran, and Nat, for V1, V2L, total ACX, and the ACX subregions (Au1, AuD, AuV). Differences in c-fos+ cell counts for each cortical subregion were tested at the 5% significance level with a one-way ANOVA; *p < 0.05, **p < 0.01, ***p < 0.001; NS, Not significant.
Figure 4.
Differential coding of single syllables in Au1 for NSeqs and RSeqs. A, Representative dot raster plot of single-unit spiking responses to NSeqs and RSeqs presented in pseudorandom order (right, spike shape). Smoothed PSTHs (10 ms bins) of the same unit for each stimulus sequence are shown, with the stimulus start (tall vertical line) followed by lines marking the start and end of each subsequent syllable. B, Schematic for calculating responses to common syllables within NSeqs and RSeqs, excluding syllables in the starting position. Different color bars represent different syllable types, and the width of each color bar indicates syllable duration. The black dots are the spikes on each repetition (REP1, REP2…REPN). The mean responses to syllables of the same color (indicated by colored spaces on the schematic of spikes over n repetitions) were calculated for RSeqs (Baseline, left) and NSeqs (Baseline, right) for each of the common syllable types. C, Scatter plots comparing mean response rates for all common first syllables in NSeqs and RSeqs in three groups of mice, Awk_F, Anes_F, and Anes_M. D, Scatter plots comparing mean response rates of all common first-syllable types (identified by solid symbols: S, large circle; H, square; O, triangle) in NSeqs and RSeqs in the three groups of mice, Awk_F, Anes_F, and Anes_M. E, Scatter plots comparing mean response rates of common syllables (identified by solid symbols: S, large circle; J, small circle; H, square; O, triangle) in NSeqs and RSeqs, excluding occurrences in the first position, for the same groups as in C. F, Profile of changes in the normalized response strength at each position of the sequences, NSeq (blue) and RSeq (red), over time, normalized by the response to the syllable in the starting position. None of the profiles showed a significant difference at any position (one-way ANOVA). Each dot in the scatter plots (C, D, E) represents an individual neuron; *p < 0.05, **p < 0.01, ***p < 0.001; NS, Not significant.
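Panels A and F rely on a PSTH with 10 ms bins and on a per-position response normalized by the response to the first syllable. The Python sketch below shows one way to compute both from spike times in seconds. The caption does not specify the smoothing kernel, so the Gaussian kernel and its width are assumptions, as are all function names.

```python
import math


def psth(trials, t_start, t_stop, bin_width=0.010):
    """Trial-averaged firing rate (spikes/s) in fixed bins; 10 ms bins by default."""
    n_bins = int(round((t_stop - t_start) / bin_width))
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            if t_start <= t < t_stop:
                counts[min(int((t - t_start) / bin_width), n_bins - 1)] += 1
    return [c / (len(trials) * bin_width) for c in counts]


def gaussian_smooth(rates, sigma_bins=1.0):
    """Gaussian smoothing; kernel shape and width are assumptions, not from the source."""
    half = int(3 * sigma_bins)
    offsets = range(-half, half + 1)
    weights = [math.exp(-0.5 * (k / sigma_bins) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(rates)):
        # Renormalize near the edges so a constant input stays constant.
        pairs = [(w, rates[i + k]) for k, w in zip(offsets, weights) if 0 <= i + k < len(rates)]
        smoothed.append(sum(w * r for w, r in pairs) / sum(w for w, _ in pairs))
    return smoothed


def position_profile(trials, syllable_windows):
    """Mean rate in each syllable's (onset, offset) window, divided by the
    rate for the syllable in the starting position."""
    rates = []
    for on, off in syllable_windows:
        n = sum(on <= t < off for spikes in trials for t in spikes)
        rates.append(n / (len(trials) * (off - on)))
    if rates[0] == 0:
        raise ValueError("no spikes to the first syllable; profile is undefined")
    return [r / rates[0] for r in rates]
```

A profile value of 0.5 at position 2 would mean the unit fired at half its first-syllable rate during the second syllable.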
Figure 5.
Differential coding of disyllables in Au1 of mice for NSeqs and RSeqs. A, Schematic for calculating mean rate responses to transitions (disyllables) from the response to the second component, excluding the first transition. Each color bar represents a different syllable type; its width represents syllable duration. The black dots are the spikes on each repetition (REP1, REP2…REPN). Right, The common transitions. Left, The colored lines used to show the transitions (green, blue, and cyan) are also used to mark the spikes over repetitions for those transitions. Mean spike rates were computed for each of the same-colored areas in the schematic of spikes over N repetitions for RSeqs (Baseline, left) and NSeqs (Baseline, right) for each of the common transitions. B–D, Representative example PSTHs (vertical gray bar indicates the start of a sequence; red line along the x-axis represents the stimulus duration) and scatter plots comparing mean response rates of common disyllables in NSeqs and RSeqs for the three groups, Awk_F (B), Anes_F (C), and Anes_M (D). E–G, Scatter plots comparing mean rate responses to the first transition in NSeqs (S–J, S–H, S–O, and O–H) with responses to the same transition at any position in RSeqs, based on the response to the second component and excluding the first transition, for the groups Awk_F (E), Anes_F (F), and Anes_M (G); *p < 0.05, **p < 0.01, ***p < 0.001; NS, Not significant.
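Panel A measures the response to a transition (a, b) as the firing rate during the second component b, skipping the first transition of each sequence, and then compares transitions common to NSeqs and RSeqs. A minimal Python sketch of that bookkeeping follows. The input layout (labels, syllable windows in seconds, per-repetition spike times) and the function names are assumptions for illustration.

```python
from collections import defaultdict


def transition_rates(stimuli, skip_first=True):
    """Mean rate (spikes/s) for each transition (a, b), measured in the window of b.

    stimuli: iterable of (labels, windows, trials) for one sequence each, where
      labels  = syllable types in order, e.g. ["S", "J", "H"],
      windows = (onset, offset) in seconds for each syllable,
      trials  = per-repetition lists of spike times in seconds.
    """
    start = 2 if skip_first else 1  # k = 1 is the second component of the first transition
    per_pair = defaultdict(list)
    for labels, windows, trials in stimuli:
        for k in range(start, len(labels)):
            on, off = windows[k]
            n = sum(on <= t < off for spikes in trials for t in spikes)
            per_pair[(labels[k - 1], labels[k])].append(n / (len(trials) * (off - on)))
    return {pair: sum(v) / len(v) for pair, v in per_pair.items()}


def common_transitions(nseq_rates, rseq_rates):
    """(transition, NSeq rate, RSeq rate) for transitions present in both sets."""
    return [(p, nseq_rates[p], rseq_rates[p]) for p in sorted(set(nseq_rates) & set(rseq_rates))]
```

Each tuple returned by `common_transitions` corresponds to one point in a scatter plot like those in B–D. Setting `skip_first=False` keeps the first transition, which a comparison like E–G would need on the NSeq side.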
Figure 6.
Coding of syllables in Au1 remains unaltered after social experience. A, Schematic of the protocol for social exposure of female mice to male mice over days. B, Scatter plot comparing mean response rates to the common first syllables in NSeqs and RSeqs after exposure in anesthetized females (Anes_F-Aft_Expo). C, Scatter plots comparing mean response rates of all common first-syllable types (solid symbols: S, large circle; H, square; O, triangle) in NSeqs and RSeqs for Anes_F-Aft_Expo. D, Representative PSTH for an experienced female (vertical gray bar represents the stimulus onset; red line along the x-axis represents the duration of the sequence). E, F, Scatter plots comparing mean rate responses to common syllables (E) and disyllables (F) for anesthetized females (Anes_F-Aft_Expo), as in Figure 4E and Figure 5B–D, respectively. G, Scatter plot of mean rate responses to the first transition in NSeqs (S–J, S–H, S–O, and O–H), based on the response to the second component and excluding the first transition. H, Profile of changes in the normalized response strength at each position of the sequences, NSeq (blue) and RSeq (red), over time, relative to the syllable in the starting position; not significant across positions (one-way ANOVA); *p < 0.05, **p < 0.01, ***p < 0.001; NS, Not significant.
Figure 7.
Plasticity observed in single-unit selectivity to the entire NSeq and not to its components. A–C, Comparison of mean overall selectivity to NSeqs with that to RSeqs, and comparisons across groups before exposure (Awk_F-Bef-Expo, Anes_F-Bef-Expo, and Anes_M-Bef-Expo) and after exposure (Anes_F-Aft_Expo), for monosyllables (A), disyllables (B), and overall sequences (C). D, The mean selectivity to NSeqs and RSeqs in anesthetized females before exposure (Anes_F-Bef_Expo) compared with selectivity to NSeqs and RSeqs over days of exposure and within days of exposure (Anes_F-Aft_Expo); *p < 0.05, **p < 0.01, ***p < 0.001; NS, Not significant.
Figure 8.
Responses to sequences obtained with two-photon Ca2+ imaging of Thy1, SOM, and PV neurons. A, Representative examples of tonotopy in Au1 and other auditory areas of the mouse auditory cortex in all three groups of mice (Thy-1-GCamp, SOM-GCamp, and PV-GCamp, in three columns separated by dashed lines). The white box marks the area shown in B. B, Sample two-photon image of an ROI in Au1 in each of the three groups of mice. C, Average ΔF/F traces obtained with two-photon imaging in response to each of the 20 stimuli for two cells in each ROI (Cells A1, B1, C1 and Cells A2, B2, C2, respectively). D, Left, Bar graphs show the percentage of single units (blue) with significant rate responses to each stimulus (1–20; Fig. 2B) in Awk_F-Bef-Expo and the percentage of single Thy-1-positive EXNs (brown) with significant Ca2+ responses in EXN-Bef-Expo. Right, Bar graphs (same colors as left) show the percentage of neurons responding to 1, 2, …, or all 20 stimuli, irrespective of stimulus identity.
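Panel C plots ΔF/F, conventionally (F − F0)/F0 with F0 a baseline fluorescence. The Python sketch below computes it with F0 taken as the mean of the pre-stimulus frames and averages repetitions. The caption does not state how a "significant" Ca2+ response was defined, so `is_responsive` (baseline mean + 2 SD) is a hypothetical criterion, and all names are assumptions.

```python
import statistics


def delta_f_over_f(trace, n_baseline):
    """ΔF/F = (F - F0) / F0, with F0 the mean of the first n_baseline (pre-stimulus) frames."""
    f0 = statistics.mean(trace[:n_baseline])
    return [(f - f0) / f0 for f in trace]


def trial_average(traces):
    """Frame-by-frame mean across repetitions of equal length."""
    return [statistics.mean(frame) for frame in zip(*traces)]


def is_responsive(dff, n_baseline, response_frames, n_sd=2.0):
    """Hypothetical criterion: mean ΔF/F in the response window (a slice)
    exceeds the baseline mean + n_sd * baseline SD."""
    baseline = dff[:n_baseline]
    threshold = statistics.mean(baseline) + n_sd * statistics.stdev(baseline)
    return statistics.mean(dff[response_frames]) > threshold
```

For a trace of 100, 100, 100, 100, 150, 150 with four baseline frames, ΔF/F rises to 0.5 in the last two frames.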
Figure 9.
Differential effects of social-experience-driven plasticity in EXNs and SOM neurons but not in PV neurons. A–C, Population data from two-photon imaging in Thy1-GCamp (A), SOM-GCamp (B), and PV-GCamp (C) mice, with two matrix plots for before (left) and after (right) exposure. Each matrix gives the percentage of neurons in the group and condition that respond to none (0), one, two, …, or all (8) of the NSeqs (x-axis) and to none (0), one, two, …, or all (12) of the RSeqs (y-axis). Right, Marginal distributions show the number of neurons responding to different numbers of RSeqs. The distribution at the bottom of each matrix plot shows the average of the rows. D–F, Comparison of average selectivity to NSeqs and RSeqs within each condition (Bef_Expo and Aft_Expo) and between conditions for Thy-1 (D), SOM (E), and PV (F) neurons; *p < 0.05, **p < 0.01, ***p < 0.001; NS, Not significant.
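The matrices in A–C tabulate, for each neuron, how many of the 8 NSeqs (stimuli 13–20 in Fig. 2B) and how many of the 12 RSeqs (stimuli 01–12) evoke a significant response. The Python sketch below builds that joint percentage matrix and its marginals, assuming each neuron's significant stimuli are already available as a set of ids; the significance test itself and the function name are assumptions.

```python
def response_count_matrix(responsive, nseq_ids, rseq_ids):
    """Percentage of neurons by (number of RSeqs, number of NSeqs) with a significant response.

    responsive: one set of stimulus ids per neuron (stimuli with a significant response).
    Rows index the RSeq count (0..len(rseq_ids)); columns the NSeq count (0..len(nseq_ids)).
    """
    nseq, rseq = set(nseq_ids), set(rseq_ids)
    counts = [[0] * (len(nseq) + 1) for _ in range(len(rseq) + 1)]
    for stim in responsive:
        counts[len(stim & rseq)][len(stim & nseq)] += 1
    total = len(responsive)
    pct = [[100.0 * c / total for c in row] for row in counts]
    rseq_marginal = [sum(row) for row in pct]        # percentage of neurons per RSeq count
    nseq_marginal = [sum(col) for col in zip(*pct)]  # percentage of neurons per NSeq count
    return pct, rseq_marginal, nseq_marginal
```

For example, `response_count_matrix(sets, range(13, 21), range(1, 13))` returns a 13 × 9 matrix. Read literally, the caption's bottom distribution (the average of the rows) is `nseq_marginal` divided by the number of rows.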
Figure 10.
Reversible silencing of SOM neurons paired with sequence presentation mimics plasticity in sequence selectivity without altering syllable selectivity. A, Schematic for optogenetic silencing of SOM neurons using a 589 nm laser (middle). Recordings were made in Au1 using a multielectrode array with an optical fiber in a mouse model expressing ArchT-EGFP only in SOM neurons. Right, Representative image of a brain section collected after recording, with some SOM-ArchT-EGFP-positive neurons marked with white arrows. B, Determining laser power. In all such experiments, the laser was turned on 100 ms before stimulus onset and turned off 100 ms after stimulus offset (orange shading). Spontaneous (non-auditory-driven) activity was measured in three 100 ms periods, OFF1, ON, and OFF2, as depicted. The PSTH of a neuron is shown below the schematic. For sequences, stimulus onset and offset were the onset of the first syllable and the offset of the last syllable of the sequence. C, Histograms of the modulation of spontaneous activity across all cases show significant modulation of spontaneous spiking by light (left and middle; mean, red arrow); the histogram on the right compares spontaneous activity before and after light onset. D, Representative example of a neuron recorded with optogenetics during presentation of the sequences. E, Scatter plot comparing mean single-syllable responses in NSeqs and RSeqs during SOM silencing paired with sequences (Anes_F-Opto) and after the pairing period (Anes_F-Aft_Opto), as in Figures 4E and 6E. F, G, Scatter plots comparing mean response rates of common disyllables, excluding occurrences in the first position, for the Anes_F-Opto and Anes_F-Aft_Opto groups, similar to Figure 5C, with Anes_F-Bef_Expo and Anes_F-Aft_Expo (from Fig. 6F). H, Comparison of mean overall selectivity to NSeqs with that to RSeqs, and comparisons across groups, similar to Figure 7, C and D; *p < 0.05, **p < 0.01, ***p < 0.001; NS, Not significant.
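Panel B states that the laser came on 100 ms before stimulus onset and went off 100 ms after stimulus offset, and that spontaneous activity was read from three 100 ms periods. The Python sketch below derives such windows and a light-modulation index. Only the 100 ms lead and window width come from the caption; the placement of OFF1 (just before light onset) and OFF2 (just after light offset) and the (ON − OFF1)/(ON + OFF1) index are assumptions, and all names are hypothetical.

```python
def window_rate(spikes, start, stop):
    """Firing rate (spikes/s) of one trial within [start, stop)."""
    return sum(start <= t < stop for t in spikes) / (stop - start)


def spontaneous_windows(stim_onset, stim_offset, lead=0.100, width=0.100):
    """OFF1/ON/OFF2 windows (s). Laser: on `lead` s before stimulus onset,
    off `lead` s after stimulus offset. Placement of OFF1 and OFF2 is assumed."""
    laser_on, laser_off = stim_onset - lead, stim_offset + lead
    return {
        "OFF1": (laser_on - width, laser_on),    # assumed: just before light onset
        "ON": (laser_on, laser_on + width),      # light on, no sound yet when width <= lead
        "OFF2": (laser_off, laser_off + width),  # assumed: just after light offset
    }


def light_modulation_index(trials, windows):
    """Hypothetical index (ON - OFF1) / (ON + OFF1) of trial-averaged spontaneous rates."""
    on = sum(window_rate(s, *windows["ON"]) for s in trials) / len(trials)
    off = sum(window_rate(s, *windows["OFF1"]) for s in trials) / len(trials)
    return (on - off) / (on + off) if on + off > 0 else 0.0
```

Because the ON window ends at stimulus onset, it captures light-driven changes before any sound arrives. The sign of the index gives the direction of modulation.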
Figure 11.
Differential c-fos+ cell activation in MGV, MGD, IC, and TEA. A, Representative coronal section showing c-fos+ cells in the auditory thalamus (MGD and MGV) and TEA (dotted lines). B, Representative images of the sampled locations in MGV showing c-fos+ cells in the three conditions, Control, RSeq, and NSeq. White arrows mark some c-fos+ cells (red), cell nuclei (DAPI stained, blue), and overlaid cells (purple) in the sampled images. C, Sample images from TEA in the same three conditions show c-fos+ cells in red, cell nuclei in blue, and overlaid cells in purple. White arrows indicate c-fos+ cells. D, Example section from the IC region with cortical and subcortical areas marked with dotted lines; DCIC, dorsal cortex of the IC; ECIC, external cortex of the IC; 2Cb, second cerebellar lobule. c-fos+ cells were quantified bilaterally in the CIC of animals exposed in the same three conditions. Example c-fos+ cells are marked with arrows (c-fos+ in red, DAPI in blue, and overlay in purple). E, Bar plots of c-fos expression in the three conditions. F, Comparative quantification of c-fos+ cells in the Ran and Nat conditions for different regions after subtracting the activation of each region in silence; *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001; NS, Not significant.
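Panel F compares Ran and Nat after subtracting each region's activation in silence. A minimal Python sketch follows, assuming the silence baseline is the mean Ctrl density of that region and that densities are stored per animal in cells/mm²; the data layout and function name are hypothetical.

```python
import statistics


def silence_subtracted(density):
    """Subtract the mean silent-control (Ctrl) density of each region from the
    per-animal Ran and Nat densities (cells/mm²).

    density: {region: {"Ctrl": [...], "Ran": [...], "Nat": [...]}}
    """
    result = {}
    for region, by_condition in density.items():
        baseline = statistics.mean(by_condition["Ctrl"])
        result[region] = {
            cond: [v - baseline for v in values]
            for cond, values in by_condition.items()
            if cond != "Ctrl"
        }
    return result
```

For example, with MGV Ctrl densities of 10 and 20 cells/mm², a Ran animal at 30 cells/mm² contributes 15 cells/mm² above silence.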
