Formant-invariant voice and pitch representations are pre-attentively formed from constantly varying speech and non-speech stimuli

Giuseppe Di Dona et al. Eur J Neurosci. 2022 Aug;56(3):4086-4106. doi: 10.1111/ejn.15730. Epub 2022 Jun 23.

Abstract

The present study investigated whether listeners can form abstract voice representations while ignoring constantly changing phonological information, and whether they can use the resulting information to facilitate voice change detection. Further, the study aimed to establish whether such abstraction is restricted to the speech domain or can also be deployed in non-speech contexts. We ran an electroencephalogram (EEG) experiment comprising one passive and one active oddball task, each featuring a speech and a rotated speech condition. In the speech condition, participants heard constantly changing vowels uttered by a male speaker (standard stimuli), which were infrequently replaced by vowels uttered by a female speaker with a higher pitch (deviant stimuli). In the rotated speech condition, participants heard rotated vowels, in which the natural formant structure of speech was disrupted. In the passive task, the mismatch negativity was elicited after the presentation of the deviant voice in both conditions, indicating that listeners could successfully group different stimuli into a formant-invariant voice representation. In the active task, participants showed shorter reaction times (RTs), higher accuracy and a larger P3b in the speech condition than in the rotated speech condition. The results show that whereas at a pre-attentive level the cognitive system can track pitch regularities while presumably ignoring constantly changing formant information both in speech and in rotated speech, at an attentive level the use of such information is facilitated for speech. This facilitation was also evidenced by stronger synchronisation in the theta band (4-7 Hz), potentially pointing towards differences in encoding/retrieval processes.
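The rotated-speech manipulation mentioned above mirrors the spectrum around a centre frequency, destroying the natural formant pattern while preserving overall spectral complexity. A minimal sketch of spectral rotation on a synthetic vowel-like signal (the sampling rate, centre frequency and harmonic amplitudes are illustrative assumptions, not the study's actual stimulus parameters):

```python
import numpy as np

fs = 16000                          # sampling rate in Hz (illustrative)
t = np.arange(0, 0.5, 1 / fs)

# Vowel-like signal: a 120 Hz fundamental plus formant-like harmonics.
# All frequencies and amplitudes are illustrative, not the study's stimuli.
signal = sum(a * np.sin(2 * np.pi * f * t)
             for f, a in [(120, 1.0), (600, 0.8), (1200, 0.5), (2400, 0.2)])

def spectrally_rotate(x, fs, center=2000.0):
    """Flip the spectrum below 2*center around `center`, disrupting the
    formant structure while keeping the band's spectral energy."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, 1 / fs)
    band = freqs <= 2 * center
    rotated = spec.copy()
    rotated[band] = spec[band][::-1]    # e.g. 120 Hz maps to 3880 Hz
    return np.fft.irfft(rotated, n=x.size)

rotated = spectrally_rotate(signal, fs)
```

Because the flip is a permutation of frequency bins, the band's total spectral energy is unchanged; only its distribution across frequencies is mirrored.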

Keywords: MMN; P3b; Theta; speech perception; voice representation.


Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

FIGURE 1
Behavioural results of the active oddball task. (a) Proportion of correct responses broken down by condition (first column) and by probability (second column). (b) Reaction times of correct responses to deviant events only. Error bars represent the SE, and grey points represent individual observations. For illustrative purposes, only the relevant portion of the y‐axis is shown in both plots (dashed lines indicate the discontinuity of the axis).
FIGURE 2
Event-related potential (ERP) results. (a) Passive oddball task. The first column displays the ERPs for control (dotted lines), deviant (dashed lines) and differential waveforms (continuous lines) at a representative channel (Fz) for the speech (blue lines) and the rotated speech condition (red lines). The grey rectangles indicate the time window used in the analyses (mismatch negativity [MMN], first row; late discriminative negativity [LDN], second row). In the subsequent columns, topographies show the spatial distribution of the MMN (first row) and LDN (second row) in the time windows where significant differences emerged. The last column represents the voltage difference between conditions, calculated by subtracting the differential waveforms in the rotated speech condition from those in the speech condition. Electrodes that were included in the clusters for more than 50% of the samples within the cluster time windows (reported below the topographies) are marked by black asterisks superimposed on the maps. (b) Active oddball task. The first column represents the ERPs for standard (dotted lines), deviant (dashed lines) and differential waveforms (continuous lines) at a representative channel (CPz) for the speech (blue lines) and the rotated speech condition (red lines). In the subsequent columns, topographies show the spatial distribution of the differential P300 waveforms, calculated by subtracting the standard ERP from the deviant ERP, in the time windows where significant differences emerged for each condition. The last column represents the voltage difference between conditions, calculated by subtracting the differential waveforms in the rotated speech condition from those in the speech condition. Electrodes are marked as in (a).
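The differential waveforms in this caption are deviant-minus-standard averages. A minimal sketch on synthetic single-channel data (trial counts, noise level and component timing are illustrative assumptions, not the study's recording parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 500                             # sampling rate in Hz (illustrative)
t = np.arange(-0.1, 0.5, 1 / fs)     # epoch from -100 ms to +500 ms

def simulate_trials(n_trials, mmn_amp):
    """Synthetic single-channel EEG: white noise plus a negative
    deflection around 200 ms whose amplitude is set per condition."""
    noise = rng.normal(0, 5, size=(n_trials, t.size))
    component = mmn_amp * np.exp(-((t - 0.2) ** 2) / (2 * 0.03 ** 2))
    return noise + component

standards = simulate_trials(400, mmn_amp=0.0)   # frequent stimuli
deviants = simulate_trials(60, mmn_amp=-3.0)    # rare voice changes

# Differential waveform: average deviant ERP minus average standard ERP.
diff_wave = deviants.mean(axis=0) - standards.mean(axis=0)

# Mean amplitude in a typical MMN window is a common dependent measure;
# a reliably negative value indicates change detection.
window = (t >= 0.1) & (t <= 0.25)
mmn_amplitude = diff_wave[window].mean()
```

Averaging across many trials cancels the noise, so the subtraction isolates the response component that differs between deviants and standards.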
FIGURE 3
Time-frequency results for the passive (first row) and the active (second row) oddball tasks. The time-frequency power spectra show the power modulations (% change) characterising the differential event-related spectral perturbations (ERSPs) for each condition (first and second columns) as well as the difference between them, corresponding to the interaction effect (third column). Spectra were obtained by averaging activity for the electrodes F5, F3, F1, Fz, F2, F4, F6, FC5, FC3, FC1, FCz, FC2, FC4, FC6, C5, C3, C1, Cz, C2, C4, C6, CP5, CP3, CP1, CPz, CP2, CP4, CP6, P5, P3, P1, Pz, P2, P4, P6, PO5, PO3, PO1, POz, PO2, PO4, PO6. In the power spectra, black squares represent the temporal distribution of the significant clusters within the theta (4–7 Hz) and beta (13–30 Hz) bands. The mean number of channels included in each cluster was calculated across all time samples, and only the time bins including at least half of that mean number of channels are enclosed in black squares. Topographies in the lower and upper rows show the spatial distribution of the theta and beta event-related desynchronisations (ERDs)/event-related synchronisations (ERSs) characterising the differential ERSPs for each condition (first and second columns) as well as the difference between them, corresponding to the interaction effect (third column). Electrodes that were included in the clusters for more than 50% of the samples within the cluster time windows (reported below each topography) are marked by black asterisks superimposed on the maps. Black squares on the topographies represent the channels that were included in the averaged spectral plots.
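The "% change" ERSP measure in this caption expresses post-stimulus band power relative to a pre-stimulus baseline. A minimal sketch of a theta-band (4–7 Hz) ERSP on synthetic data, using a simple sliding-window FFT rather than the wavelet/multitaper decompositions typically used in practice (all parameters are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 250
t = np.arange(-0.5, 1.0, 1 / fs)        # epoch; stimulus at t = 0 s

def epochs_with_theta_burst(n_trials, burst_amp):
    """Synthetic trials: white noise plus a 5 Hz burst after the stimulus."""
    noise = rng.normal(0, 1, size=(n_trials, t.size))
    burst = burst_amp * np.sin(2 * np.pi * 5 * t) * ((t > 0.1) & (t < 0.5))
    return noise + burst

def theta_power(epochs, lo=4.0, hi=7.0, win=0.4, step=0.05):
    """Trial-averaged theta band power in sliding FFT windows."""
    n_win = int(win * fs)
    starts = np.arange(0, t.size - n_win, int(step * fs))
    freqs = np.fft.rfftfreq(n_win, 1 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    power = np.empty((epochs.shape[0], starts.size))
    for j, s in enumerate(starts):
        spec = np.fft.rfft(epochs[:, s:s + n_win], axis=1)
        power[:, j] = (np.abs(spec[:, band]) ** 2).mean(axis=1)
    return power.mean(axis=0), t[starts]  # power per window, window onsets

trials = epochs_with_theta_burst(80, burst_amp=1.5)
power, onsets = theta_power(trials)

# ERSP as percent change relative to fully pre-stimulus baseline windows.
baseline = power[onsets <= -0.4].mean()
ersp_pct = 100 * (power - baseline) / baseline
```

A positive percent change after stimulus onset corresponds to an event-related synchronisation (ERS) in the theta band; a negative one to a desynchronisation (ERD).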


