Sci Rep. 2021 Jan 12;11(1):489.

doi: 10.1038/s41598-020-79922-7.

FMRI-based identity classification accuracy in left temporal and frontal regions predicts speaker recognition performance

Virginia Aglieri¹, Bastien Cagna¹, Lionel Velly¹,², Sylvain Takerkart¹, Pascal Belin³,⁴

Affiliations

¹ Institut de Neurosciences de La Timone, UMR 7289, CNRS and Aix-Marseille Université, 13005, Marseille, France.
² Department of Anesthesiology and Intensive Care, CHU Timone, Assistance Publique Hôpitaux de Marseille, Aix Marseille Université, 13005, Marseille, France.
³ Institut de Neurosciences de La Timone, UMR 7289, CNRS and Aix-Marseille Université, 13005, Marseille, France. pascal.belin@univ-amu.fr.
⁴ Department of Psychology, Université de Montréal, Montreal, QC, H2V 2S9, Canada. pascal.belin@univ-amu.fr.

PMID: 33436825
PMCID: PMC7803954
DOI: 10.1038/s41598-020-79922-7

Virginia Aglieri et al. Sci Rep. 2021.

Abstract

Speaker recognition is characterized by considerable inter-individual variability with poorly understood neural bases. This study aimed (1) to clarify the cerebral correlates of speaker recognition in humans, in particular the involvement of prefrontal areas, using multi-voxel pattern analysis (MVPA) applied to fMRI data from a relatively large group of participants, and (2) to investigate the relationship across participants between fMRI-based classification and the group's variable behavioural performance on the speaker recognition task. A cohort of subjects (N = 40, 28 females), selected to present a wide distribution of voice recognition abilities, underwent an fMRI speaker identification task in which they were asked to recognize three previously learned speakers using finger button presses. The results showed that speaker identity could be significantly decoded from fMRI patterns in voice-sensitive regions, including the bilateral temporal voice areas (TVAs) along the superior temporal sulcus/gyrus, but also in bilateral parietal and left inferior frontal regions. Furthermore, fMRI-based classification accuracy correlated significantly with individual behavioural performance in the left anterior STG/STS and left inferior frontal gyrus. These results highlight the role of both temporal and extra-temporal regions in performing a speaker identity recognition task with motor responses.
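The decoding idea behind MVPA can be illustrated with a minimal sketch. The abstract does not name the classifier or the cross-validation scheme, so the nearest-centroid classifier and leave-one-run-out split below are stand-ins chosen for brevity, not the authors' pipeline. The data are synthetic; the four runs of 36 trials follow the figure captions, while the balanced 12 trials per speaker, the voxel count, and the noise level are assumptions.

```python
import random

def make_synthetic_runs(n_runs=4, trials_per_speaker=12, n_voxels=12,
                        noise_sd=0.3, seed=0):
    """Return a list of runs; each run is a list of (pattern, speaker) pairs."""
    rng = random.Random(seed)
    # Hypothetical speaker prototypes: speaker k drives voxels i with i % 3 == k.
    prototypes = [[1.0 if i % 3 == k else 0.0 for i in range(n_voxels)]
                  for k in range(3)]
    runs = []
    for _ in range(n_runs):
        run = []
        for speaker, proto in enumerate(prototypes):
            for _ in range(trials_per_speaker):
                pattern = [v + rng.gauss(0.0, noise_sd) for v in proto]
                run.append((pattern, speaker))
        runs.append(run)
    return runs

def centroid(patterns):
    """Voxel-wise mean of a list of patterns."""
    n = len(patterns)
    return [sum(col) / n for col in zip(*patterns)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def leave_one_run_out_accuracy(runs):
    """Train on all runs but one, test on the held-out run, pool over folds."""
    correct = total = 0
    for held_out in range(len(runs)):
        train = [t for r, run in enumerate(runs) if r != held_out for t in run]
        centroids = {}
        for speaker in {s for _, s in train}:
            centroids[speaker] = centroid([p for p, s in train if s == speaker])
        for pattern, speaker in runs[held_out]:
            predicted = min(
                centroids,
                key=lambda s: squared_distance(pattern, centroids[s]),
            )
            correct += predicted == speaker
            total += 1
    return correct / total

runs = make_synthetic_runs()
accuracy = leave_one_run_out_accuracy(runs)
chance = 1 / 3  # three speakers, balanced trials
print(f"decoding accuracy = {accuracy:.2f} (chance = {chance:.2f})")
```

With three equally frequent speakers, chance accuracy is 1/3; a region is said to carry identity information when cross-validated accuracy exceeds that level reliably across participants.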

Conflict of interest statement

The authors declare no competing interests.

Figures

**Figure 1**
Identification task (fMRI version). In each trial, a French word was delivered (e.g. “Jeudi”) and subjects had to recognize the speaker by pressing one of three keys on a five-key keyboard (“Response”) within a 5-s window; the following trial (e.g. “Pouvez”) was then delivered after an interstimulus interval (ISI) randomized between 3 and 5 s. Each run consisted of 36 trials.

**Figure 2**
Behavioural performance during the fMRI session. a. Distribution of the scores obtained in the scanner (N = 40). b. Mean performance across the four runs; error bars represent the 95% confidence interval. c. Confusion matrix of presented voices vs. answers. The colour scale indicates the proportion of trials corresponding to each presented voice-answer pair. The sum of the 9 cells is 100%; random performance would be 11.11% in each cell; perfect performance would be 33.33% in each cell of the diagonal and zero elsewhere.
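The reference values quoted for the confusion matrix follow from normalizing by the total number of trials rather than by row. A small sketch of that arithmetic, assuming 36 trials split evenly across the three speakers (the even split and the function name are illustrative assumptions):

```python
def confusion_proportions(presented, answered, n_classes=3):
    """Proportion of all trials in each (presented, answered) cell."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for p, a in zip(presented, answered):
        counts[p][a] += 1
    total = len(presented)
    return [[c / total for c in row] for row in counts]

# 36 trials, 12 per speaker (speakers coded 0, 1, 2).
presented = [0, 1, 2] * 12

# Perfect performance: every answer matches the presented voice.
perfect = confusion_proportions(presented, presented)

# Idealized random performance: each answer given equally often per voice.
uniform_answers = [(i // 3) % 3 for i in range(36)]
uniform = confusion_proportions(presented, uniform_answers)

print(f"perfect diagonal cell: {100 * perfect[0][0]:.2f}%")  # 33.33%
print(f"random cell:           {100 * uniform[0][1]:.2f}%")  # 11.11%
```

Because all nine cells share one denominator, the diagonal of a perfect observer holds 1/3 of the trials per cell, and a uniform guesser spreads 1/9 into every cell.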

**Figure 3**
Univariate random effects (RFX) analysis. First row: identification task, contrast voices > baseline; second row: localizer task, contrast sound > baseline (left) and voices > non-voices (right). All contrasts are shown at p < 0.05 FWE-corrected (voxel-level), extent threshold = 0 mm³. The group statistical map is here overlaid on an inflated render in SPM.

**Figure 4**
Maps of significant group-averaged above-chance speaker classification accuracy in sound-sensitive regions. Upper row: in green, region of interest (ROI) used as explicit mask (voxels showing significantly higher activity for sounds as compared to baseline in the voice localizer task at p < 0.001 uncorrected for multiple comparisons). Below: results of the group-level permutation-based random effects analysis showing the regions in which classification accuracy was significantly higher than chance (p-FWE < 0.05 voxel-level-corrected) in the region of interest. Green dashed contours indicate TVAs obtained in the sample (V > NV). The group statistical map is here overlaid on an inflated render in SPM and shown at uncorrected level (p < 0.00002) for illustration purposes. The histograms show the distribution of speaker classification accuracy scores across all voxels, for the clusters with more than 200 voxels. Green vertical lines represent median classification accuracy scores across voxels. Matrices represent averaged confusion matrices yielded by the classifiers at each cerebral location, with the same convention as the behavioural confusion matrix of Fig. 2c.

**Figure 5**
Correlation of speaker classification accuracy with individual speaker recognition scores. Co-variation of speaker classification accuracy with identification scores in voice-sensitive regions was computed in a permutation-based regression model with one covariate. R² = coefficient of determination of the Spearman correlation coefficient. Confusion matrices are shown with the same convention as in Fig. 2c. Green dashed lines indicate TVAs obtained in the sample (V > NV). The group statistical map at an uncorrected threshold (p < 0.001) is here overlaid on an inflated render in SPM for illustration purposes.
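The brain-behaviour relationship described in this caption can be sketched as a Spearman correlation with a permutation-based p-value. This is a simplified single-region illustration, not the voxel-wise permutation regression the caption refers to. The data are synthetic, and the function names, effect size, noise level, and number of permutations are all assumptions; only N = 40 comes from the source.

```python
import random

def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided p-value obtained by shuffling the ranks of y."""
    rng = random.Random(seed)
    rx, ry = average_ranks(x), average_ranks(y)
    observed = abs(pearson(rx, ry))
    shuffled = list(ry)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1)

# Synthetic data for 40 hypothetical participants.
rng = random.Random(1)
behaviour = [rng.uniform(0.4, 1.0) for _ in range(40)]
decoding = [0.30 + 0.2 * b + rng.gauss(0.0, 0.02) for b in behaviour]

rho = spearman(behaviour, decoding)
r_squared = rho ** 2
p_value = permutation_p_value(behaviour, decoding)
print(f"rho = {rho:.2f}, R^2 = {r_squared:.2f}, p = {p_value:.4f}")
```

Shuffling one variable breaks any pairing between participants' decoding accuracy and behaviour, so the fraction of shuffles reaching the observed |rho| estimates how often such a correlation would arise by chance.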

Cited by

Visual Deprivation Alters Functional Connectivity of Neural Networks for Voice Recognition: A Resting-State fMRI Study.
Pang W, Zhou W, Ruan Y, Zhang L, Shu H, Zhang Y, Zhang Y. Brain Sci. 2023 Apr 7;13(4):636. doi: 10.3390/brainsci13040636. PMID: 37190601. Free PMC article.
Aberrant functional hubs and related networks attributed to cognitive impairment in patients with anti‑N‑methyl‑D‑aspartate receptor encephalitis.
Fan B, Zhou X, Pang L, Long Q, Lv C, Zheng J. Biomed Rep. 2024 May 22;21(1):104. doi: 10.3892/br.2024.1792. eCollection 2024 Jul. PMID: 38827495. Free PMC article.
Cortical-striatal brain network distinguishes deepfake from real speaker identity.
Roswandowitz C, Kathiresan T, Pellegrino E, Dellwo V, Frühholz S. Commun Biol. 2024 Jun 11;7(1):711. doi: 10.1038/s42003-024-06372-6. PMID: 38862808. Free PMC article.
Perspective-taking is associated with increased discriminability of affective states in the ventromedial prefrontal cortex.
Vaccaro AG, Heydari P, Christov-Moore L, Damasio A, Kaplan JT. Soc Cogn Affect Neurosci. 2022 Dec 1;17(12):1082-1090. doi: 10.1093/scan/nsac035. PMID: 35579186. Free PMC article.
Unveiling the development of human voice perception: Neurobiological mechanisms and pathophysiology.
Harford EE, Holt LL, Abel TJ. Curr Res Neurobiol. 2024 Mar 8;6:100127. doi: 10.1016/j.crneur.2024.100127. eCollection 2024. PMID: 38511174. Free PMC article. Review.
