Sci Rep. 2025 May 15;15(1):16872.
doi: 10.1038/s41598-025-92702-5.

fNIRS experimental study on the impact of AI-synthesized familiar voices on brain neural responses


Weijia Zhang et al. Sci Rep. 2025.

Abstract

With the advancement of artificial intelligence (AI) speech synthesis technology, its application in personalized voice services and its potential role in emotional comfort have become focal points of research. This study explores the impact of AI-synthesized familiar and unfamiliar voices on neural responses in the brain. We used the GPT-SoVITS project to synthesize three types of voices, all reading the same text: a female voice, a sweet female voice, and a maternal voice. Using functional near-infrared spectroscopy (fNIRS), we monitored changes in blood oxygen levels in participants' prefrontal and temporal cortices during listening to assess brain activation. The results showed that the AI-synthesized maternal voice significantly activated participants' prefrontal and temporal cortices. Combined with participants' feedback, the activation of these areas may reflect multidimensional features of voice-familiarity processing, including emotion, memory, and cognitive function. These findings reveal potential applications of AI voice technology in enhancing mental health and user experience.
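
The activation assessment described above rests on averaging the hemodynamic response over repeated listening blocks. As a rough illustration (not the authors' code, which relies on the NirSpark software), the following Python sketch block-averages a preprocessed HbO time series around stimulus onsets; the sampling rate, window lengths, and simulated data are assumptions made for the example.

import numpy as np

def block_average(hbo, onsets, fs=10.0, pre_s=2.0, post_s=20.0):
    """Average HbO epochs around stimulus onsets for one channel.

    hbo    : 1-D array of preprocessed HbO concentration changes
    onsets : stimulus onset times in seconds
    fs     : sampling rate in Hz (assumed; depends on the device)
    pre_s  : seconds of baseline before each onset
    post_s : seconds of task period after each onset
    """
    pre, post = int(pre_s * fs), int(post_s * fs)
    epochs = []
    for t in onsets:
        i = int(t * fs)
        if i - pre < 0 or i + post > len(hbo):
            continue  # skip epochs that run off the recording
        seg = hbo[i - pre:i + post]
        seg = seg - seg[:pre].mean()   # baseline-correct to the pre-stimulus mean
        epochs.append(seg)
    return np.mean(epochs, axis=0)     # block-averaged response for this channel

# Example with simulated data: 22 channels, 10 min at 10 Hz, five listening blocks.
rng = np.random.default_rng(0)
data = rng.normal(size=(22, 6000))
onsets = [60, 150, 240, 330, 420]
avg_ch1 = block_average(data[0], onsets)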

Keywords: Artificial intelligence; Human-computer interaction; Social impact of synthetic speech; Voice synthesis; fNIRS.


Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Schematic diagram of the locations of the Prefrontal Cortex and Temporal Cortex. The Prefrontal Cortex is located at the front of the brain and is involved in cognitive control and emotional regulation, while the Temporal Cortex is situated below the Prefrontal Cortex and is responsible for processing language and emotional memory. The division of these regions is based on the classification of brain lobes. The cerebral cortex is a thin layer of neural tissue that covers the entire surface of the brain, approximately 2–4 mm thick, and contains the brain’s gray matter. The Prefrontal Cortex is part of the frontal lobe, as indicated in the diagram. The Temporal region corresponds to the Temporal Cortex.
Fig. 2
Overall framework diagram.
Fig. 3
Flowchart of audio processing and model training.
Fig. 4
Experimental voice synthesis workflow diagram.
Fig. 5
fNIRS Probe Array Diagram. The left brain diagram marks the positions of the 22 channels, while the right brain diagram indicates the locations of the 10 emitter probes and 8 receiver probes. The lower diagram is a 2D illustration of the channels. Channels CH3, CH5-CH18, and CH20 mainly cover the prefrontal cortex, while CH1, CH2, CH4, CH19, CH21, and CH22 primarily cover the temporal cortex. The brain regions covered by these channels were determined from data exported from the NirSpark software. See Sect. “Block average and channel average” for details; a minimal sketch of this channel-to-region grouping follows the figure list.
Fig. 6
Experimental Setup. Participants wore head covers equipped with fNIRS probes (red box, left) to measure brain activity. The signals were transmitted via optical fibers to the fNIRS processing unit (center), where they were processed and sent to a connected computer for visualization (red box, right). Audio stimuli were delivered through a speaker (red box, bottom center) positioned in front of the participant.
Fig. 7
Experimental Task Flowchart. Task 1 involves AI-generated unfamiliar voices, while Task 2 involves AI-generated familiar voices. The unfamiliar voices include the female voice from Experiment 1 and the sweet female voice from Experiment 2.
Fig. 8
Comparison of the fNIRS data from participants under two different AI-generated voices in Experiment 1 after averaging, along with their differences (*p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001).
Fig. 9
Comparison of the fNIRS data from participants under two different AI-generated voices in Experiment 2 after averaging, along with their differences (*p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001).
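
Figures 5, 8, and 9 together outline the analysis: channels are grouped into prefrontal and temporal regions, responses are averaged per participant, and the two voice conditions are compared channel by channel with the significance markers shown above. The Python sketch below illustrates such a comparison; the paired t-test, the data shapes, and the simulated values are assumptions for illustration, since the exact statistical procedure is not stated in this excerpt.

import numpy as np
from scipy import stats

# Channel groups from Fig. 5 (1-based channel numbers converted to 0-based indices).
PREFRONTAL = [ch - 1 for ch in [3, *range(5, 19), 20]]   # CH3, CH5-CH18, CH20
TEMPORAL   = [ch - 1 for ch in [1, 2, 4, 19, 21, 22]]    # CH1, CH2, CH4, CH19, CH21, CH22

def stars(p):
    """Map a p-value to the significance markers used in Figs. 8 and 9."""
    for threshold, mark in [(1e-4, "****"), (1e-3, "***"), (1e-2, "**"), (5e-2, "*")]:
        if p < threshold:
            return mark
    return "ns"

def compare_conditions(cond_a, cond_b):
    """Paired t-test per channel across participants.

    cond_a, cond_b : arrays of shape (n_participants, n_channels) holding
                     each participant's mean HbO response per channel.
    """
    results = {}
    for ch in range(cond_a.shape[1]):
        t, p = stats.ttest_rel(cond_a[:, ch], cond_b[:, ch])
        results[f"CH{ch + 1}"] = (t, p, stars(p))
    return results

# Example with simulated data for 20 participants and 22 channels.
rng = np.random.default_rng(1)
familiar   = rng.normal(0.05, 0.02, size=(20, 22))
unfamiliar = rng.normal(0.03, 0.02, size=(20, 22))
per_channel = compare_conditions(familiar, unfamiliar)
prefrontal_mean_familiar = familiar[:, PREFRONTAL].mean(axis=1)  # region average per participant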

