A Step Towards Preserving Speakers' Identity While Detecting Depression Via Speaker Disentanglement
- PMID: 36341467
- PMCID: PMC9635494
- DOI: 10.21437/interspeech.2022-10798
Abstract
Preserving a patient's identity is a challenge for automatic, speech-based diagnosis of mental health disorders. In this paper, we address this issue by proposing adversarial disentanglement of depression characteristics and speaker identity. The depression classification model is trained to be invariant to speaker identity by minimizing the depression prediction loss while maximizing the speaker prediction loss during training. The effectiveness of the proposed method is demonstrated on two datasets, DAIC-WOZ (English) and CONVERGE (Mandarin), with three feature sets (Mel-spectrograms, raw audio signals, and the last hidden state of Wav2vec2.0), using a modified DepAudioNet model. With adversarial training, depression classification improves over the baseline for every feature set. Wav2vec2.0 features with adversarial learning yield the best performance (F1-scores of 69.2% on DAIC-WOZ and 91.5% on CONVERGE). Analysis of the class-separability measure (J-ratio) of the hidden states of the DepAudioNet model shows that, when adversarial learning is applied, the backend model loses some speaker discriminability while gaining depression discriminability. These results indicate that some components of speaker identity may not be useful for depression detection; minimizing their effects provides a more accurate diagnosis of the underlying disorder and can safeguard a speaker's identity.
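The abstract states the training objective only at a high level. One common way to realize "minimize the depression loss while maximizing the speaker loss" is a gradient reversal layer, as in domain-adversarial training: the speaker classifier keeps minimizing its own loss, while the shared encoder receives the depression gradient minus a scaled speaker gradient. The plain-Python sketch below shows that update rule as a single SGD step, together with a J-ratio computed under the common Fisher-style definition J = tr(S_w^-1 S_b). The parameter groups (`enc`, `dep`, `spk`), the weight `lam`, plain SGD, and this particular J definition are all assumptions for illustration, not details confirmed by the paper.

```python
"""Illustrative sketch only: parameter names, plain SGD, `lam`, and the
J-ratio definition trace(Sw^-1 Sb) are assumptions, not the paper's code."""
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]


def adversarial_sgd_step(params: Params, grad_dep: Params, grad_spk: Params,
                         lr: float = 0.01, lam: float = 0.5) -> Params:
    """One plain-SGD step with gradient reversal on the shared encoder.

    Hypothetical layout: 'enc' = shared encoder, 'dep' = depression head,
    'spk' = speaker head. grad_dep holds dL_dep/dtheta, grad_spk holds dL_spk/dtheta.
    """
    new: Params = {}
    # Depression head: gradient descent on the depression loss.
    new["dep"] = [p - lr * g for p, g in zip(params["dep"], grad_dep["dep"])]
    # Speaker head: still descends on the speaker loss, so it stays a strong adversary.
    new["spk"] = [p - lr * g for p, g in zip(params["spk"], grad_spk["spk"])]
    # Shared encoder: descend on L_dep but ascend on L_spk (reversed, scaled by lam).
    new["enc"] = [p - lr * (gd - lam * gs)
                  for p, gd, gs in zip(params["enc"], grad_dep["enc"], grad_spk["enc"])]
    return new


def _mean(vectors: Sequence[Sequence[float]]) -> List[float]:
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def _add_outer(acc: List[List[float]], u: List[float], scale: float = 1.0) -> None:
    # acc += scale * u u^T
    for i, ui in enumerate(u):
        for j, uj in enumerate(u):
            acc[i][j] += scale * ui * uj


def _solve(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Return a^-1 b via Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [ra[:] + rb[:] for ra, rb in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]  # an exact zero here means `a` is singular
        m[col] = [x / pv for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def j_ratio(features: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fisher-style class separability J = trace(Sw^-1 Sb) (assumed definition)."""
    dim = len(features[0])
    mu = _mean(features)
    sw = [[0.0] * dim for _ in range(dim)]  # within-class scatter
    sb = [[0.0] * dim for _ in range(dim)]  # between-class scatter
    for c in set(labels):
        xs = [x for x, y in zip(features, labels) if y == c]
        mc = _mean(xs)
        for x in xs:
            _add_outer(sw, [a - b for a, b in zip(x, mc)])
        _add_outer(sb, [a - b for a, b in zip(mc, mu)], scale=float(len(xs)))
    w = _solve(sw, sb)
    return sum(w[i][i] for i in range(dim))
```

Comparing `j_ratio(hidden, speaker_ids)` with `j_ratio(hidden, depression_labels)` for hidden states extracted with and without adversarial training would mirror the kind of analysis the abstract describes (`hidden`, `speaker_ids`, and `depression_labels` are hypothetical placeholders). As a quick check, `j_ratio([[0.0], [2.0], [4.0], [6.0]], [0, 0, 1, 1])` gives 4.0, since the within-class scatter is 4 and the between-class scatter is 16.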
Keywords: adversarial learning; depression detection; paralinguistics; privacy in healthcare; speaker disentanglement.
Similar articles
- Enhancing accuracy and privacy in speech-based depression detection through speaker disentanglement. Comput Speech Lang. 2024 Jun;86:101605. doi: 10.1016/j.csl.2023.101605. Epub 2023 Dec 26. PMID: 38313320. Free PMC article.
- A Privacy-Preserving Unsupervised Speaker Disentanglement Method for Depression Detection from Speech. CEUR Workshop Proc. 2024 Feb;3649:57-63. PMID: 38650610. Free PMC article.
- Non-uniform Speaker Disentanglement For Depression Detection From Raw Speech Signals. Interspeech. 2023 Aug;2023:2343-2347. doi: 10.21437/interspeech.2023-2101. PMID: 38045821. Free PMC article.
- Manifestation of depression in speech overlaps with characteristics used to represent and recognize speaker identity. Sci Rep. 2023 Jul 10;13(1):11155. doi: 10.1038/s41598-023-35184-7. PMID: 37429935. Free PMC article.
- Scoping Review on the Multimodal Classification of Depression and Experimental Study on Existing Multimodal Models. Diagnostics (Basel). 2022 Nov 3;12(11):2683. doi: 10.3390/diagnostics12112683. PMID: 36359525. Free PMC article.
Cited by
- Speechformer-CTC: Sequential Modeling of Depression Detection with Speech Temporal Classification. Speech Commun. 2024 Sep;163:103106. doi: 10.1016/j.specom.2024.103106. Epub 2024 Jul 18. PMID: 39364289. Free PMC article.
- Diagnostic accuracy of deep learning using speech samples in depression: a systematic review and meta-analysis. J Am Med Inform Assoc. 2024 Oct 1;31(10):2394-2404. doi: 10.1093/jamia/ocae189. PMID: 39013193. Free PMC article.
- Enhancing accuracy and privacy in speech-based depression detection through speaker disentanglement. Comput Speech Lang. 2024 Jun;86:101605. doi: 10.1016/j.csl.2023.101605. Epub 2023 Dec 26. PMID: 38313320. Free PMC article.
- A Privacy-Preserving Unsupervised Speaker Disentanglement Method for Depression Detection from Speech. CEUR Workshop Proc. 2024 Feb;3649:57-63. PMID: 38650610. Free PMC article.
- Non-uniform Speaker Disentanglement For Depression Detection From Raw Speech Signals. Interspeech. 2023 Aug;2023:2343-2347. doi: 10.21437/interspeech.2023-2101. PMID: 38045821. Free PMC article.