Sci Rep. 2021 Jan 12;11(1):489.
doi: 10.1038/s41598-020-79922-7.

FMRI-based identity classification accuracy in left temporal and frontal regions predicts speaker recognition performance

Virginia Aglieri et al. Sci Rep.

Abstract

Speaker recognition is characterized by considerable inter-individual variability with poorly understood neural bases. This study aimed (1) to clarify the cerebral correlates of speaker recognition in humans, in particular the involvement of prefrontal areas, using multi-voxel pattern analysis (MVPA) applied to fMRI data from a relatively large group of participants, and (2) to investigate the relationship across participants between fMRI-based classification and the group's variable behavioural performance on the speaker recognition task. A cohort of subjects (N = 40, 28 females), selected to present a wide distribution of voice recognition abilities, underwent an fMRI speaker identification task in which they were asked to recognize three previously learned speakers with finger button presses. The results showed that speaker identity could be significantly decoded from fMRI patterns in voice-sensitive regions, including bilateral temporal voice areas (TVAs) along the superior temporal sulcus/gyrus (STS/STG), but also in bilateral parietal and left inferior frontal regions. Furthermore, fMRI-based classification accuracy correlated significantly with individual behavioural performance in left anterior STG/STS and left inferior frontal gyrus. These results highlight the role of both temporal and extra-temporal regions in performing a speaker identity recognition task with motor responses.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Identification task (fMRI version). In each trial, a French word was delivered (e.g. “Jeudi”) and subjects had to recognize the speaker by pressing one of three keys on a five-key keyboard (“Response”) within a 5-s window; the following trial (e.g. “Pouvez”) was then delivered after an interstimulus interval (ISI) randomized between 3 and 5 s. Each run comprised 36 trials.
Figure 2
Behavioural performance during the fMRI session. (a) Distribution of the scores obtained in the scanner (N = 40). (b) Mean performance across the four runs; error bars represent 95% confidence intervals. (c) Confusion matrix of presented voices vs. answers. The color scale indicates the proportion of trials corresponding to each presented voice/answer pair. The 9 cells sum to 100%; random performance would give 11.11% in each cell; perfect performance would give 33.33% in each diagonal cell and zero elsewhere.
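The normalization convention described in this caption can be illustrated with a minimal sketch. The function name `confusion_proportions`, the speaker labels, and the trial lists are hypothetical and are not taken from the study; the sketch assumes one run of 36 trials with 12 trials per speaker, which the caption does not state.

```python
from collections import Counter

SPEAKERS = (0, 1, 2)  # arbitrary labels for the three learned speakers


def confusion_proportions(presented, answered):
    """Return a 3x3 matrix of percentages (rows: presented voice,
    columns: answer). The nine cells sum to 100."""
    if len(presented) != len(answered) or not presented:
        raise ValueError("need two non-empty lists of equal length")
    counts = Counter(zip(presented, answered))
    n_trials = len(presented)
    return [[100.0 * counts[(p, a)] / n_trials for a in SPEAKERS]
            for p in SPEAKERS]


# Hypothetical run: 36 trials, 12 per speaker (an assumption).
presented = [s for s in SPEAKERS for _ in range(12)]

# Perfect performance: every answer matches the presented voice.
perfect = confusion_proportions(presented, presented)

# Chance-like performance: each answer given equally often for each voice.
uniform_answers = [a for _ in SPEAKERS for a in SPEAKERS for _ in range(4)]
chance = confusion_proportions(presented, uniform_answers)

# Overall accuracy is the sum of the diagonal cells.
perfect_accuracy = sum(perfect[i][i] for i in range(3))
chance_accuracy = sum(chance[i][i] for i in range(3))

for name, matrix in (("perfect", perfect), ("chance", chance)):
    print(name)
    for row in matrix:
        print("  " + "  ".join(f"{cell:6.2f}" for cell in row))
```

With this convention, the diagonal sum gives overall percent correct, so chance-level responding with three speakers corresponds to a diagonal sum of about 33.33%.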
Figure 3
Univariate random effects (RFX) analysis. First row: identification task, contrast voices > baseline; second row: localizer task, contrasts sound > baseline (left) and voices > non-voices (right). All contrasts are shown at p < 0.05 FWE-corrected (voxel-level), extent threshold = 0 mm³. The group statistical map is overlaid on an inflated render in SPM.
Figure 4
Maps of significant group-averaged above-chance speaker classification accuracy in sound-sensitive regions. Upper row: in green, region of interest (ROI) used as explicit mask (voxels showing significantly higher activity for sounds than for baseline in the voice localizer task at p < 0.001, uncorrected for multiple comparisons). Below: results of the group-level permutation-based random effects analysis showing the regions in which classification accuracy was significantly higher than chance (p-FWE < 0.05, voxel-level corrected) within the region of interest. Green dashed contours indicate TVAs obtained in the sample (V > NV). The group statistical map is overlaid on an inflated render in SPM and shown at an uncorrected level (p < 0.00002) for illustration purposes. The histograms show the distribution of speaker classification accuracy scores across all voxels, for the clusters with more than 200 voxels. Green vertical lines represent median classification accuracy scores across voxels. Matrices represent averaged confusion matrices yielded by the classifiers at each cerebral location, with the same convention as the behavioural confusion matrix of Fig. 2c.
Figure 5
Correlation of speaker classification accuracy with individual speaker recognition scores. Co-variation of speaker classification accuracy with identification scores in voice-sensitive regions was computed in a permutation-based regression model with one covariate. R² = coefficient of determination of the Spearman correlation coefficient. Confusion matrices are shown with the same convention as in Fig. 2c. Green dashed lines indicate TVAs obtained in the sample (V > NV). The group statistical map at an uncorrected threshold (p < 0.001) is overlaid on an inflated render in SPM for illustration purposes.
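As a rough illustration of the statistic named in this caption, the sketch below computes a Spearman rank correlation and its square. It assumes that the reported coefficient of determination is the squared Spearman coefficient, which is one reading of the caption and is not confirmed here. The helper names and the five data points are hypothetical, and the sketch does not reproduce the permutation-based regression used in the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman_r2(x, y):
    """Return (rho, rho squared): Spearman's rho is Pearson on the ranks."""
    rho = pearson(average_ranks(x), average_ranks(y))
    return rho, rho ** 2


# Hypothetical values for five participants (not data from the study).
classification_accuracy = [0.34, 0.36, 0.41, 0.38, 0.45]
recognition_score = [0.55, 0.50, 0.80, 0.70, 0.90]

rho, r2 = spearman_r2(classification_accuracy, recognition_score)
print(f"rho = {rho:.2f}, R^2 = {r2:.2f}")
```

Because the coefficient is computed on ranks, it captures any monotonic relationship between classifier accuracy and behavioural score rather than a strictly linear one.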
