CA-NeuroSpex: Context-Informed Autoregressive Neuro-Guided Speaker Extraction
- PMID: 41336837
- DOI: 10.1109/EMBC58623.2025.11251577
Abstract
Neuro-guided target speaker extraction (TSE) leverages neural responses to guide the extraction of attended speech from competing sources, mirroring the brain's ability to navigate multi-speaker environments. However, traditional neuro-guided methods overlook the importance of temporal context. To bridge this gap, we introduce CA-NeuroSpex, a novel context-informed end-to-end TSE framework. It harnesses autoregressive feedback to integrate previously extracted speech as a secondary reference cue via a specialized speech-context encoder. By dynamically fusing this contextual cue with the neural cue, CA-NeuroSpex bolsters extraction performance in a causal decoder setup. Our key contributions include a speech-context encoder for overlapping speech integration, a teacher-forced autoregressive training paradigm, and a gating mechanism for cue fusion. Our results demonstrate the effectiveness of combining dynamic contextual and neural information for robust speaker extraction.
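The abstract names two mechanisms without giving details: a gate that fuses the neural cue with the speech-context cue, and autoregressive feedback trained with teacher forcing. The sketch below illustrates one common way such components are built; it is not the authors' implementation. The sigmoid-gate formulation, the function names (`gated_fusion`, `run_autoregressive`, `extract_fn`), the tensor shapes, and the zero-initialised first context are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(neural_cue, context_cue, W, b):
    """Fuse two cue embeddings of shape (T, D) with a sigmoid gate.

    Assumed formulation: the gate is computed from the concatenated cues
    and forms a per-element convex combination of the two inputs.
    """
    joint = np.concatenate([neural_cue, context_cue], axis=-1)  # (T, 2D)
    gate = sigmoid(joint @ W + b)                                # (T, D), values in (0, 1)
    return gate * neural_cue + (1.0 - gate) * context_cue


def run_autoregressive(mixture_segments, neural_segments, extract_fn,
                       teacher_targets=None):
    """Process segments in order, feeding back the previous output as context.

    If teacher_targets is given (training), the ground-truth speech of the
    previous segment is used as context instead of the model's own estimate.
    extract_fn is a hypothetical stand-in for the extraction network.
    """
    prev = np.zeros_like(mixture_segments[0])  # no context for the first segment
    outputs = []
    for i, (mix, eeg) in enumerate(zip(mixture_segments, neural_segments)):
        est = extract_fn(mix, eeg, prev)
        outputs.append(est)
        prev = teacher_targets[i] if teacher_targets is not None else est
    return outputs


# Toy usage with random embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
T, D = 4, 8
neural_cue = rng.standard_normal((T, D))
context_cue = rng.standard_normal((T, D))
W = rng.standard_normal((2 * D, D)) * 0.1
b = np.zeros(D)
fused = gated_fusion(neural_cue, context_cue, W, b)
```

Because the gate lies strictly between 0 and 1, each fused element falls between the corresponding neural and context values, so the gate can lean on the neural cue when the fed-back speech is unreliable and on the context cue otherwise. At inference time `teacher_targets` is left as `None`, so the loop consumes its own previous estimates.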