Alignment of auditory artificial networks with massive individual fMRI brain data leads to generalisable improvements in brain encoding and downstream tasks
- PMID: 40800971
- PMCID: PMC12319826
- DOI: 10.1162/imag_a_00525
Abstract
Artificial neural networks trained in the field of artificial intelligence (AI) have emerged as key tools to model brain processes, sparking the idea of aligning network representations with brain dynamics to enhance performance on AI tasks. While this concept has gained support in the visual domain, we investigate here the feasibility of creating auditory artificial neural models directly aligned with individual brain activity. This objective raises major computational challenges, as models have to be trained directly with brain data, which is typically collected at a much smaller scale than data used to train AI models. We aimed to answer two key questions: (1) Can brain alignment of auditory models lead to improved brain encoding for novel, previously unseen stimuli? (2) Can brain alignment lead to generalisable representations of auditory signals that are useful for solving a variety of complex auditory tasks? To answer these questions, we relied on two massive datasets: a deep phenotyping dataset from the Courtois neuronal modelling project, where six subjects watched four seasons (36 h) of theFriendsTV series in functional magnetic resonance imaging and the HEAR benchmark, a large battery of downstream auditory tasks. We fine-tuned SoundNet, a small pretrained convolutional neural network with ~2.5 M parameters. Aligning SoundNet with brain data from three seasons ofFriendsled to substantial improvement in brain encoding in the fourth season, extending beyond auditory and visual cortices. We also observed consistent performance gains on the HEAR benchmark, particularly for tasks with limited training data, where brain-aligned models performed comparably with the best-performing models regardless of size. We finally compared individual and group models, finding that individual models often matched or outperformed group models in both brain encoding and downstream task performance, highlighting the data efficiency of fine-tuning with individual brain data. 
Our results demonstrate the feasibility of aligning artificial neural network representations with individual brain activity during auditory processing, and suggest that this alignment is particularly beneficial for tasks with limited training data. Future research is needed to establish whether larger models can achieve even better performance and whether the observed gains extend to other tasks, particularly in the context of few-shot learning.
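The brain alignment described in the abstract amounts to training an encoding model: network activations for a stimulus are mapped to voxel-wise fMRI responses, and encoding quality is scored by how well held-out brain activity is predicted. A minimal sketch of that general idea using closed-form ridge regression on synthetic data (all shapes, names, and the regression approach are illustrative stand-ins, not the paper's actual fine-tuning pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: activations of an auditory network (e.g., SoundNet
# features) for 200 stimuli, and fMRI responses at 50 voxels.
n_stim, n_feat, n_vox = 200, 64, 50
features = rng.standard_normal((n_stim, n_feat))
true_w = rng.standard_normal((n_feat, n_vox))
bold = features @ true_w + 0.1 * rng.standard_normal((n_stim, n_vox))

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X'X + alpha*I)^-1 X'Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def encoding_score(X, Y, W):
    """Mean Pearson r between predicted and measured voxel responses."""
    pred = X @ W
    rs = [np.corrcoef(pred[:, v], Y[:, v])[0, 1] for v in range(Y.shape[1])]
    return float(np.mean(rs))

# Fit on the first 150 stimuli (cf. training on three seasons), then
# evaluate encoding on the held-out 50 (cf. testing on the fourth season).
W = ridge_fit(features[:150], bold[:150])
score = encoding_score(features[150:], bold[150:], W)
print(f"held-out encoding score (mean r): {score:.2f}")
```

In the study itself the alignment goes further: rather than fitting only a linear read-out, SoundNet's own weights are fine-tuned against individual brain data, which is what produces representations that transfer to the HEAR downstream tasks.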
Keywords: artificial neural networks; auditory neuroscience; deep phenotyping datasets; downstream generalisation; functional magnetic resonance imaging (fMRI); individual-specific computational models.
© 2025 The Authors. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Conflict of interest statement
The authors declare no competing interests.
Cited by
- Low-Rank Tensor Encoding Models Decompose Natural Speech Comprehension Processes. bioRxiv [Preprint]. 2025 Jun 3:2025.06.02.657514. doi: 10.1101/2025.06.02.657514. PMID: 40501791. Free PMC article.
References
- Allen, E. J., St-Yves, G., Wu, Y., Breedlove, J. L., Prince, J. S., Dowdle, L. T., Nau, M., Caron, B., Pestilli, F., Charest, I., Hutchinson, J. B., Naselaris, T., & Kay, K. (2022). A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience, 25(1), 116–126. doi: 10.1038/s41593-021-00962-x
- Arandjelovic, R., & Zisserman, A. (2017). Look, listen and learn. In 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy (pp. 609–617). IEEE. doi: 10.1109/ICCV.2017.73
- Aytar, Y., Vondrick, C., & Torralba, A. (2016). SoundNet: Learning sound representations from unlabeled video. Advances in Neural Information Processing Systems, 29, 892–900. doi: 10.48550/arXiv.1610.09001
- Baevski, A., Zhou, Y., Mohamed, A., & Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in Neural Information Processing Systems, 33, 12449–12460. doi: 10.48550/arXiv.2006.11477