Interspeech. 2022 Sep:2022:3338-3342.
doi: 10.21437/interspeech.2022-10798.

A Step Towards Preserving Speakers' Identity While Detecting Depression Via Speaker Disentanglement

Vijay Ravi et al. Interspeech. 2022 Sep.

Abstract

Preserving a patient's identity is a challenge for automatic, speech-based diagnosis of mental health disorders. In this paper, we address this issue by proposing adversarial disentanglement of depression characteristics and speaker identity. The depression classifier is trained to be speaker-identity-invariant by minimizing the depression prediction loss while maximizing the speaker prediction loss. The effectiveness of the proposed method is demonstrated on two datasets, DAIC-WOZ (English) and CONVERGE (Mandarin), with three feature sets (Mel-spectrograms, raw audio signals, and the last hidden state of Wav2vec2.0), using a modified DepAudioNet model. With adversarial training, depression classification improves over the baseline for every feature set. Wav2vec2.0 features with adversarial learning yield the best performance (F1-scores of 69.2% on DAIC-WOZ and 91.5% on CONVERGE). Analysis of a class-separability measure (J-ratio) on the hidden states of the DepAudioNet model shows that, when adversarial learning is applied, the backend model loses some speaker discriminability while gaining depression discriminability. These results indicate that some components of speaker identity may not be useful for depression detection, and that minimizing their effects both provides a more accurate diagnosis of the underlying disorder and can safeguard a speaker's identity.
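The abstract uses the J-ratio to compare how separable the model's hidden states are by speaker and by depression label, but does not define it. As a minimal sketch, assuming a Fisher-style trace ratio J = tr(S_b) / tr(S_w) (between-class over within-class scatter; the paper's exact formula may differ, for example tr(S_w^-1 S_b)), the measure can be computed in plain Python as follows. The function name `j_ratio` and the toy vectors are hypothetical.

```python
from collections import defaultdict
from typing import Sequence


def j_ratio(features: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Trace-ratio class separability: tr(S_b) / tr(S_w).

    Assumption: a simplified Fisher-style J-ratio; the paper's exact
    definition may differ (e.g. tr(S_w^-1 S_b)).
    """
    if not features or len(features) != len(labels):
        raise ValueError("features and labels must be non-empty and aligned")
    dim = len(features[0])
    n = len(features)

    # Global mean over all hidden-state vectors.
    global_mean = [sum(x[d] for x in features) / n for d in range(dim)]

    # Group vectors by class label (e.g. depressed vs. not, or speaker ID).
    groups = defaultdict(list)
    for x, y in zip(features, labels):
        groups[y].append(x)

    between = 0.0  # tr(S_b): size-weighted squared distance of class means to the global mean
    within = 0.0   # tr(S_w): squared distance of each vector to its own class mean
    for members in groups.values():
        m = len(members)
        mean = [sum(x[d] for x in members) / m for d in range(dim)]
        between += m * sum((mean[d] - global_mean[d]) ** 2 for d in range(dim))
        within += sum((x[d] - mean[d]) ** 2 for x in members for d in range(dim))

    if within == 0.0:
        raise ValueError("within-class scatter is zero; ratio undefined")
    return between / within


# Toy hidden states: well separated along the first axis, poorly along the second.
hidden = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
print(j_ratio(hidden, [0, 0, 1, 1]))  # 4.0  (labels follow the first axis)
print(j_ratio(hidden, [0, 1, 0, 1]))  # 0.25 (labels follow the second axis)
```

Under this reading, the abstract's finding corresponds to J computed with speaker labels going down, and J computed with depression labels going up, once adversarial training is applied.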

Keywords: adversarial learning; depression detection; paralinguistics; privacy in healthcare; speaker disentanglement.

Figures

Figure 1: Block diagram representing adversarial disentanglement of speaker and depression characteristics.
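The diagram itself is not reproduced here. One common way to realize "minimize the depression loss, maximize the speaker loss" is a gradient reversal scheme: the speaker head is trained normally to identify speakers, but the gradient it sends back into the shared encoder is negated and scaled by a weight λ. The sketch below shows that update rule on a deliberately tiny scalar model. Assumptions: this is not the paper's DepAudioNet backend; the speaker label is binary here although real speaker identity is multi-class; and the names `Params`, `adversarial_step` and `lam` are hypothetical.

```python
import math
from dataclasses import dataclass


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce(p: float, y: float, eps: float = 1e-12) -> float:
    # Binary cross-entropy for one prediction p in (0, 1) and label y in {0, 1}.
    return -(y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps))


@dataclass
class Params:
    w: float  # shared encoder weight (hypothetical stand-in for the backbone)
    a: float  # depression head weight
    b: float  # speaker head weight (the adversary)


def adversarial_step(params: Params, x: float, y_dep: float, y_spk: float,
                     lr: float = 0.1, lam: float = 1.0):
    """One SGD step of gradient-reversal-style adversarial training (toy, scalar)."""
    h = params.w * x                  # shared representation
    p_dep = sigmoid(params.a * h)     # predicted depression probability
    p_spk = sigmoid(params.b * h)     # predicted speaker probability (binary toy label)

    l_dep = bce(p_dep, y_dep)
    l_spk = bce(p_spk, y_spk)

    # For sigmoid + BCE, dL/dlogit = prediction - label.
    g_dep, g_spk = p_dep - y_dep, p_spk - y_spk

    grad_a = g_dep * h                # depression head: minimize L_dep
    grad_b = g_spk * h                # speaker head: minimize L_spk (keeps learning speakers)
    grad_w_dep = g_dep * params.a * x # dL_dep/dw
    grad_w_spk = g_spk * params.b * x # dL_spk/dw

    # Encoder: descend on L_dep but ascend on L_spk (reversed gradient, scaled by lam).
    grad_w = grad_w_dep - lam * grad_w_spk

    new = Params(w=params.w - lr * grad_w,
                 a=params.a - lr * grad_a,
                 b=params.b - lr * grad_b)
    return new, l_dep, l_spk


params = Params(w=0.5, a=1.0, b=1.0)
for _ in range(3):
    params, l_dep, l_spk = adversarial_step(params, x=2.0, y_dep=1.0, y_spk=0.0)
    print(f"L_dep={l_dep:.3f}  L_spk={l_spk:.3f}")
```

Setting `lam=0.0` recovers ordinary multi-task training in which the encoder ignores the speaker head; larger values push the shared representation harder toward speaker invariance, at some risk to the depression objective.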
