A Step Towards Preserving Speakers' Identity While Detecting Depression Via Speaker Disentanglement

Interspeech. 2022 Sep:2022:3338-3342. doi: 10.21437/interspeech.2022-10798.

Vijay Ravi¹, Jinhan Wang¹, Jonathan Flint², Abeer Alwan¹

Affiliations

¹ Dept. of Electrical and Computer Engineering, University of California, Los Angeles, USA.
² Dept. of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, USA.

PMID: 36341467
PMCID: PMC9635494
DOI: 10.21437/interspeech.2022-10798

Vijay Ravi et al. Interspeech. 2022 Sep.

Abstract

Preserving a patient's identity is a challenge for automatic, speech-based diagnosis of mental health disorders. In this paper, we address this issue by proposing adversarial disentanglement of depression characteristics and speaker identity. The model used for depression classification is trained in a speaker-identity-invariant manner by minimizing depression prediction loss and maximizing speaker prediction loss during training. The effectiveness of the proposed method is demonstrated on two datasets - DAIC-WOZ (English) and CONVERGE (Mandarin), with three feature sets (Mel-spectrograms, raw-audio signals, and the last-hidden-state of Wav2vec2.0), using a modified DepAudioNet model. With adversarial training, depression classification improves for every feature when compared to the baseline. Wav2vec2.0 features with adversarial learning resulted in the best performance (F1-score of 69.2% for DAIC-WOZ and 91.5% for CONVERGE). Analysis of the class-separability measure (J-ratio) of the hidden states of the DepAudioNet model shows that when adversarial learning is applied, the backend model loses some speaker-discriminability while it improves depression-discriminability. These results indicate that there are some components of speaker identity that may not be useful for depression detection and minimizing their effects provides a more accurate diagnosis of the underlying disorder and can safeguard a speaker's identity.
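The abstract does not define the J-ratio. A common formulation compares between-class scatter to within-class scatter, so the sketch below uses a simplified trace-ratio variant, trace(S_b) / trace(S_w), as an assumption rather than the authors' exact measure. The function name `j_ratio` and the toy data are hypothetical.

```python
from collections import defaultdict


def j_ratio(features, labels):
    """Simplified class-separability score: trace(S_b) / trace(S_w).

    Assumption: this trace-ratio form stands in for the J-ratio named in
    the abstract; the paper may use a different definition.
    """
    dim = len(features[0])
    n = len(features)

    # Group feature vectors by class label (speaker ID or depression label).
    groups = defaultdict(list)
    for vec, lab in zip(features, labels):
        groups[lab].append(vec)

    global_mean = [sum(v[d] for v in features) / n for d in range(dim)]

    between = 0.0  # scatter of class means around the global mean
    within = 0.0   # scatter of samples around their own class mean
    for vecs in groups.values():
        mean = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
        between += len(vecs) * sum(
            (mean[d] - global_mean[d]) ** 2 for d in range(dim)
        )
        within += sum((v[d] - mean[d]) ** 2 for v in vecs for d in range(dim))

    if within == 0.0:
        raise ValueError("within-class scatter is zero; ratio is undefined")
    return between / within


# Toy one-dimensional "hidden states" with two well-separated classes.
hidden = [[0.0], [1.0], [4.0], [5.0]]
print(j_ratio(hidden, [0, 0, 1, 1]))  # 16.0
```

Scoring the same hidden states once with speaker labels and once with depression labels, before and after adversarial training, gives the kind of comparison the abstract reports: a lower speaker score alongside a higher depression score.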

Keywords: adversarial learning; depression detection; paralinguistics; privacy in healthcare; speaker disentanglement.

Figures

**Figure 1:**
Block diagram representing adversarial disentanglement of speaker and depression characteristics.
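The abstract describes training that minimizes depression prediction loss while maximizing speaker prediction loss. One way to express that as a single objective is to subtract a weighted speaker loss from the depression loss. This is a minimal sketch under that assumption: the weight `lam`, the function names, and the use of cross-entropy are hypothetical, and the abstract does not say how the authors combine the two terms.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class given class probabilities."""
    return -math.log(probs[target])


def adversarial_loss(dep_probs, dep_label, spk_probs, spk_label, lam=0.5):
    """Combined objective: depression loss minus a weighted speaker loss.

    Minimizing this value pushes the depression loss down and the speaker
    loss up. `lam` is a hypothetical trade-off weight, not a value from
    the paper.
    """
    dep_loss = cross_entropy(dep_probs, dep_label)
    spk_loss = cross_entropy(spk_probs, spk_label)
    return dep_loss - lam * spk_loss


# Toy example: a confident depression prediction and a speaker prediction
# that is at chance over four speakers.
loss = adversarial_loss([0.1, 0.9], 1, [0.25, 0.25, 0.25, 0.25], 2)
print(loss)
```

In practice this sign flip is often applied only to the shared feature extractor, for example through a gradient reversal layer, so that the speaker classifier itself still learns normally; whether the paper does so is not stated in the abstract.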

Cited by

Speechformer-CTC: Sequential Modeling of Depression Detection with Speech Temporal Classification.
Wang J, Ravi V, Flint J, Alwan A. Speech Commun. 2024 Sep;163:103106. doi: 10.1016/j.specom.2024.103106. Epub 2024 Jul 18. PMID: 39364289.
Diagnostic accuracy of deep learning using speech samples in depression: a systematic review and meta-analysis.
Liu L, Liu L, Wafa HA, Tydeman F, Xie W, Wang Y. J Am Med Inform Assoc. 2024 Oct 1;31(10):2394-2404. doi: 10.1093/jamia/ocae189. PMID: 39013193.
Enhancing accuracy and privacy in speech-based depression detection through speaker disentanglement.
Ravi V, Wang J, Flint J, Alwan A. Comput Speech Lang. 2024 Jun;86:101605. doi: 10.1016/j.csl.2023.101605. Epub 2023 Dec 26. PMID: 38313320.
A Privacy-Preserving Unsupervised Speaker Disentanglement Method for Depression Detection from Speech.
Ravi V, Wang J, Flint J, Alwan A. CEUR Workshop Proc. 2024 Feb;3649:57-63. PMID: 38650610.
Non-uniform Speaker Disentanglement For Depression Detection From Raw Speech Signals.
Wang J, Ravi V, Alwan A. Interspeech. 2023 Aug;2023:2343-2347. doi: 10.21437/interspeech.2023-2101. PMID: 38045821.

Grants and funding

R01 MH122569/MH/NIMH NIH HHS/United States
