2024 Mar 8;7(1):291.
doi: 10.1038/s42003-024-05945-9.

Speech-induced suppression during natural dialogues


Joaquin E Gonzalez et al. Commun Biol.

Abstract

When engaged in a conversation, one receives auditory information not only from the other speaker's voice but also from one's own speech. These two sources are processed differently, an effect known as Speech-Induced Suppression (SIS). Here, we studied the brain's representation of the acoustic properties of speech in natural, unscripted dialogues, using electroencephalography (EEG) and high-quality speech recordings from both participants. Using encoding techniques, we reproduced a broad range of previous findings on listening to another's speech, achieving even better performance when predicting the EEG signal in this complex scenario. Furthermore, we found no response when participants listened to their own speech, across different acoustic features (spectrogram, envelope, etc.) and frequency bands, evidencing a strong SIS effect. The present work shows that this mechanism is present, and even stronger, during natural dialogues. Moreover, the methodology presented here opens the possibility of a deeper understanding of the related mechanisms in a wider range of contexts.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Experimental design and analysis pipeline.
a The participants sat facing each other, separated by an opaque curtain that prevented visual communication. They were presented with a screen displaying distributed objects and a task that required verbal communication. The speech uttered by each participant was recorded with a microphone, and brain activity was recorded with 128 EEG channels. b Audio signals from which the audio features (e.g., the envelope, in orange) were extracted and used to train an encoding model: Ridge regression fitted to the EEG of each channel. The features were then used as input to predict the EEG of those same channels, and the Pearson correlation between the predicted and recorded EEG signals serves as the measure of the model's performance.
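The encoding approach in panel b — time-lagged audio features mapped to each EEG channel by Ridge regression, scored by Pearson correlation on held-out data — can be sketched as follows. This is a minimal toy illustration on synthetic data, not the authors' code; the function names, the five-lag kernel, and the single train/test split are all hypothetical simplifications of the 5-fold procedure described in the caption.

```python
import numpy as np
from numpy.linalg import solve

def lagged_design(feature, n_lags):
    """Stack time-lagged copies of a 1-D feature into a design matrix."""
    n = len(feature)
    X = np.zeros((n, n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = feature[:n - lag]
    return X

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X'X + alpha*I)^-1 X'y."""
    d = X.shape[1]
    return solve(X.T @ X + alpha * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
envelope = rng.standard_normal(2000)              # toy audio feature
true_trf = np.array([0.0, 0.5, 1.0, 0.5, 0.0])    # toy response kernel
eeg = lagged_design(envelope, 5) @ true_trf + 0.1 * rng.standard_normal(2000)

X = lagged_design(envelope, 5)
w = ridge_fit(X[:1500], eeg[:1500])               # fit on the training portion
pred = X[1500:] @ w                               # predict held-out EEG
r = np.corrcoef(pred, eeg[1500:])[0, 1]           # Pearson correlation score
```

The fitted weights `w` across lags are, in this framing, the temporal response function for that channel, which is the quantity the mTRF figures below visualize.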
Fig. 2
Fig. 2. Model performance while listening to external speech.
a Correlation coefficients for every frequency band and their spatial distribution, obtained from the spectrogram model. The correlation values were obtained for each electrode, averaged across the 5 folds within each participant, and then averaged across participants. The top panel shows the spatial distribution of the averaged correlation values, highlighting the regions where the highest correlations are achieved. The lower panel shows the same values as violin plots, for easier comparison across frequency bands. b Correlation distributions for the left and right electrodes indicated in the topographic figure, for the models using the spectrogram and the envelope as input features. For each hemisphere, the 12 electrodes with the highest correlation values in the frontal region were selected, and a Wilcoxon signed-rank test was performed to compare the values between hemispheres (N = 12 independent samples). The correlation values for the spectrogram show a significant lateralization toward the left hemisphere (p ≈ 0.0005), whereas the envelope shows no significant difference (p ≈ 0.38). Significance: n.s., p > 0.05; *p < 0.001.
Fig. 3
Fig. 3. Theta band mTRFs to the audio spectrogram while listening to external speech.
Panel a shows the mTRF for each electrode, averaged across participants and mel-bands. The position of each electrode is indicated by the scalp plot on the right. Panel b shows the mTRF for each spectrogram feature (each mel-frequency band), averaged over electrodes. Panel c shows the p-values, on a negative logarithmic scale, from a TFCE test applied to the mel-frequency-band mTRFs of all subjects separately (N = 18 independent samples, d.f. = 17). The mTRFs represent the response in the EEG signal to each time lag of the audio features. The time axis represents the time elapsed between the audio feature being pronounced and the instant in the EEG signal being predicted. For representation purposes, pre-stimulus time lags are included in the figure, but the predictions were made only from positive times, to avoid providing the model with information from future time points. See Supplementary Note 4 and Supplementary Fig. 6 for a detailed explanation of the time axis in these figures.
Fig. 4
Fig. 4. Speech-Induced Suppression: correlation values for all frequency bands, and Theta-band mTRFs to the spectrogram, for every dialogue condition.
a Listening to external speech (E); b Listening to self-produced speech (S); c Listening to external speech while both participants are speaking (E∣B); d Listening to self-produced speech while both participants are speaking (S∣B); e Silence. Mean number of samples per participant: NA/NB = 49,034 (range [17,825–78,259]), NC/ND = 2692 ([1207–4617]), NE = 31,586 ([15,207–73,464]). Again, pre-stimulus time lags are included in the figure for representation purposes, but the predictions were made only from positive times, to avoid providing the model with information from future time points.
Fig. 5
Fig. 5. Comparisons between the average correlation values of each electrode in the Theta band from different listening conditions: Results from Wilcoxon signed-rank test, Cohen’s d-prime, and Bayes Factors in favor of the hypothesis H1 (BF10), and in favor of the hypothesis H0 (BF01) (N = 18, d.f.: 17).
a Comparison between isolated External speech, Self-produced speech, and Silence. b Comparison between External speech and Self-produced speech when both participants are speaking, and Silence. c Comparison between the isolated and both-participants-speaking conditions. The conditions are abbreviated as follows: Listening to external speech (E), Listening to self-produced speech (S), Listening to external speech while both are speaking (E∣B), Listening to self-produced speech while both are speaking (S∣B). Uncorrected p-values should be compared against a threshold of 0.05/128 ≈ 3.9 × 10⁻⁴ (Bonferroni-corrected threshold); see Supplementary Note 8 and Supplementary Fig. 11 for False-Discovery-Rate (FDR) corrected p-values.
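The per-electrode comparison can be illustrated with a small sketch: a paired Wilcoxon signed-rank test at each of the 128 electrodes across 18 subjects, with uncorrected p-values compared against the Bonferroni threshold of 0.05/128. The data below are synthetic (two toy conditions, one carrying signal), and the variable names are hypothetical.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
n_subjects, n_channels = 18, 128

# Toy per-subject correlation values: condition E carries signal, S does not.
corr_E = 0.1 + 0.02 * rng.standard_normal((n_subjects, n_channels))
corr_S = 0.0 + 0.02 * rng.standard_normal((n_subjects, n_channels))

# Paired Wilcoxon signed-rank test per electrode, uncorrected p-values.
pvals = np.array([wilcoxon(corr_E[:, ch], corr_S[:, ch]).pvalue
                  for ch in range(n_channels)])

bonferroni = 0.05 / n_channels          # 0.05 / 128, about 3.9e-4
n_sig = int(np.sum(pvals < bonferroni))
```

With a clear per-subject effect, as in this toy example, essentially all electrodes survive the Bonferroni threshold; with no effect, close to none do.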
Fig. 6
Fig. 6. Phase-locking value (PLV) between the EEG signal of each electrode and the envelope signal, averaged across participants, for all dialogue conditions.
a Listening to external speech (E); b Listening to self-produced speech (S); c Listening to external speech while both are speaking (E∣B); d Listening to self-produced speech while both are speaking (S∣B); e Silence. Each panel shows the phase synchronization between each EEG channel and the envelope feature (top left), and the mean and standard deviation across channels (bottom left), for all time lags between −200 ms and 400 ms. A time lag of 0 corresponds to the EEG and envelope at matching instants; negative latencies indicate that the EEG signal precedes the auditory signal (making a causal effect impossible), while positive lags indicate that the brain activity follows the auditory signal. On the right side of each panel, the topographic distribution of phase-locking values is shown for the time lag of maximum average synchronization.
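A standard way to compute the phase-locking value is from instantaneous phases obtained via the Hilbert transform: the PLV is the magnitude of the mean phase-difference vector, 1 for perfectly locked signals and near 0 for unrelated ones. The sketch below applies that common recipe to synthetic signals; it is an assumption-laden illustration (the paper's exact filtering, lag handling, and averaging may differ, and all names here are hypothetical).

```python
import numpy as np
from scipy.signal import hilbert

def plv(x, y):
    """Phase-locking value: consistency of the phase difference between
    two signals. 1 = perfectly locked, near 0 = unrelated phases."""
    phase_x = np.angle(hilbert(x))
    phase_y = np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))

fs = 128                                   # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
envelope = np.sin(2 * np.pi * 5 * t)       # toy 5 Hz "envelope" (theta range)

rng = np.random.default_rng(2)
eeg_locked = np.sin(2 * np.pi * 5 * t + 0.8) + 0.2 * rng.standard_normal(len(t))
eeg_noise = rng.standard_normal(len(t))

plv_locked = plv(eeg_locked, envelope)     # high: constant phase offset
plv_noise = plv(eeg_noise, envelope)       # low: no consistent phase relation
```

Note that a constant phase offset (0.8 rad here) does not reduce the PLV; only the consistency of the phase difference matters, which is why the figure scans PLV across time lags rather than at lag 0 alone.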
