Depress Anxiety. 2025 May 20;2025:4839334.
doi: 10.1155/da/4839334. eCollection 2025.

Method Matters: Enhancing Voice-Based Depression Detection With a New Data Collection Framework

Dan Vilenchik et al. Depress Anxiety.

Abstract

Depression accounts for a major share of global disability-adjusted life-years (DALYs). Diagnosis typically requires a psychiatrist or lengthy self-assessments, which can be challenging for symptomatic individuals. Developing reliable, noninvasive, and accessible detection methods is therefore a healthcare priority. Voice analysis offers a promising approach for early depression detection, potentially improving access to treatment and reducing costs. This paper presents a novel pipeline for depression detection that addresses several critical challenges in the field, including data imbalance, label quality, and model generalizability. Our study uses a high-quality dataset with a high prevalence of depression, collected at a specialized chronic pain clinic, which enables robust depression detection even with a limited sample size. On our 52-patient dataset, we obtained an accuracy lift of up to 15 percentage points over the 50% (chance) baseline using 3-fold cross-validation (training set n = 34; standard deviation across folds 2.8%; p = 0.01). We further show that combining voice-only acoustic features with a single self-report question (subjective units of distress [SUDs]) significantly improves predictive accuracy. Although relying on self-reported SUDs is not always good practice, our data collection setting offered no incentive to misrepresent depression status, and SUDs proved highly reliable, achieving 86% accuracy on their own; adding acoustic features raised accuracy to 92%, exceeding SUDs alone (p = 0.1). Further data collection is expected to improve accuracy, supporting a rapid, noninvasive depression detection method that overcomes clinical barriers. These findings offer a promising tool for early depression detection across clinical settings.
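The statement that 3-fold cross-validation on 52 patients gives a training set of n = 34 follows from how the folds are sized. A minimal sketch, assuming the folds are sized the way scikit-learn's `KFold` sizes them (the first `n_samples % n_splits` folds receive one extra sample); the helper `kfold_sizes` is hypothetical and not part of the study's code:

```python
def kfold_sizes(n_samples: int, n_splits: int) -> list[tuple[int, int]]:
    """Return (train_size, test_size) for each fold of a K-fold split.

    Assumes scikit-learn's KFold convention: the first
    n_samples % n_splits folds get one extra test sample.
    """
    base, extra = divmod(n_samples, n_splits)
    sizes = []
    for i in range(n_splits):
        test = base + (1 if i < extra else 0)  # larger folds come first
        sizes.append((n_samples - test, test))
    return sizes


print(kfold_sizes(52, 3))  # [(34, 18), (35, 17), (35, 17)]
```

Under this assumption one fold trains on 34 patients and tests on 18, matching the Figure 6 caption, while the other two folds train on 35 and test on 17.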

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Distribution of SUDs depression scale scores for men and women; the mean is around 5 for both groups.
Figure 2
The feature processing and extraction pipeline, taking a raw voice recording and processing it into a table of acoustic features.
Figure 3
The train–test pipeline using the preprocessed data. Acoustic features are used to train an ML classifier (CatBoost). At test time, CatBoost is applied to every snippet of a patient's voice recording, and a threshold is applied to the aggregated “votes” to produce a binary decision.
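The vote-and-threshold step in Figure 3 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the caption does not say whether the threshold applies to a count or a fraction of votes, so the sketch assumes τ is the minimum number of snippets the classifier must flag as depressed, consistent with the integer range 1 to 4 used in Figures 4 and 6. The function name `patient_decision` is hypothetical, and precomputed 0/1 snippet predictions stand in for CatBoost output.

```python
from typing import Sequence


def patient_decision(snippet_preds: Sequence[int], tau: int) -> int:
    """Aggregate per-snippet binary predictions into one patient-level label.

    snippet_preds: 0/1 outputs of a snippet-level classifier, one per
        snippet of the patient's recording (stand-in for CatBoost output).
    tau: assumed vote threshold; the patient is labeled depressed (1)
        when at least tau snippets are predicted depressed.
    """
    votes = sum(snippet_preds)  # number of "depressed" votes
    return int(votes >= tau)


# A recording split into six snippets, three of which were flagged
preds = [1, 0, 1, 0, 0, 1]
print([patient_decision(preds, tau) for tau in range(1, 5)])  # [1, 1, 1, 0]
```

Counting votes rather than averaging probabilities keeps each snippet's influence equal; a fraction-based threshold would be the natural alternative if recordings vary widely in length.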
Figure 4
The sensitivity–specificity tradeoff for the entire dataset as the threshold τ varies from 1 to 4, evaluated using fourfold cross-validation. The best accuracy is obtained at τ = 1. The baseline accuracy is 64.2%; the best accuracy we obtained is 66.6%, a lift of 2.4 percentage points. The standard deviation across folds is 2%–3%, depending on τ.
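A sweep like the one in Figure 4 can be outlined by recomputing sensitivity, specificity, and accuracy at each threshold. The sketch below is hypothetical: it assumes per-patient vote counts are already available (for example, from the aggregation step in Figure 3), that a patient is labeled depressed when the vote count reaches τ, and it uses toy data rather than data from the study.

```python
from typing import Sequence


def sweep_thresholds(
    vote_counts: Sequence[int], labels: Sequence[int], taus: Sequence[int]
) -> list[tuple[int, float, float, float]]:
    """For each threshold tau, return (tau, sensitivity, specificity, accuracy).

    vote_counts: per-patient number of snippets predicted depressed.
    labels: ground-truth patient labels (1 = depressed, 0 = not depressed).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    results = []
    for tau in taus:
        preds = [int(v >= tau) for v in vote_counts]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
        sens = tp / pos if pos else float("nan")  # true positive rate
        spec = tn / neg if neg else float("nan")  # true negative rate
        acc = (tp + tn) / len(labels)
        results.append((tau, sens, spec, acc))
    return results


# Hypothetical toy data (not from the study): vote counts for six patients
votes = [0, 1, 2, 3, 4, 5]
labels = [0, 0, 1, 0, 1, 1]
for row in sweep_thresholds(votes, labels, range(1, 5)):
    print(row)
```

Raising τ makes a positive decision harder to reach, so sensitivity can only fall or stay flat while specificity can only rise or stay flat; where accuracy peaks depends on the dataset.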
Figure 5
(a) PCA projection of Wav2Vec embeddings for recordings of nine female patients, divided into three groups by their depression SUDS score: red (SUDS < 2), gray (2 ≤ SUDS ≤ 8), and blue (SUDS > 8). As is evident, the generic Wav2Vec embeddings capture no depression signal. (b) Two-dimensional PCA projection of Wav2Vec embeddings extracted for each 20 ms segment of audio, from recordings of two female and two male patients; red and yellow mark the female patients, and blue and green mark the male patients. As is evident, the signal is dominated by gender and speaker identity. SUDS, subjective units of distress.
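The projections in Figure 5 are principal component analysis (PCA) of frame-level embeddings. A minimal NumPy sketch of such a projection follows; the synthetic data are a hypothetical stand-in for real Wav2Vec features, and the 768-dimensional size is an assumption (it matches the hidden size of the base Wav2Vec 2.0 model, but the caption does not say which model was used).

```python
import numpy as np


def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project row vectors onto their first two principal components.

    embeddings: array of shape (n_frames, dim), e.g., one embedding
        vector per ~20 ms frame. Returns an array of shape (n_frames, 2).
    """
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Hypothetical stand-in for real embeddings: two synthetic "speakers"
rng = np.random.default_rng(0)
speaker_a = rng.normal(loc=0.0, size=(100, 768))
speaker_b = rng.normal(loc=0.5, size=(100, 768))
coords = pca_2d(np.vstack([speaker_a, speaker_b]))
print(coords.shape)  # (200, 2)
```

In this toy example the two synthetic speakers differ by a constant offset, so the first principal component separates them; this is the same kind of speaker-dominated structure that panel (b) reports for real recordings.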
Figure 6
Sensitivity–specificity tradeoff for the LH dataset as the threshold (τ) varies from 1 to 4, evaluated using three-fold cross-validation (train set: 34 women; test set: 18). The highest accuracy, with the lowest variance, is achieved at τ = 4 (65.3% ± 2.8% across folds). The baseline accuracy is 50%, with a standard deviation of 6.8%, computed from the success rate of random guessing over the three test folds. Our model outperforms chance by 15.3 percentage points (2.25 standard deviations; p = 0.01).
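The "2.25 standard deviations" figure is the gain over chance divided by the baseline's standard deviation: (65.3 - 50) / 6.8 = 2.25. The caption does not state how the p-value was computed; the sketch below assumes a one-sided test under a normal approximation, which gives a value close to the reported 0.01. The helper name `z_and_one_sided_p` is hypothetical.

```python
import math


def z_and_one_sided_p(model_acc: float, baseline_acc: float, baseline_sd: float):
    """Express the gain over chance in baseline standard deviations.

    Assumes (hypothetically) a normal approximation for random-guess
    accuracy and a one-sided test; the caption does not state the test used.
    """
    z = (model_acc - baseline_acc) / baseline_sd
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail standard normal probability
    return z, p


z, p = z_and_one_sided_p(65.3, 50.0, 6.8)
print(f"z = {z:.2f}, one-sided p = {p:.3f}")  # z = 2.25, one-sided p = 0.012
```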
Figure 7
Swarm plot generated with the SHAP library, showing the top features. (a) SHAP values when only acoustic features are used. (b) SHAP values when acoustic features are combined with SUDs and PHQ-15 (pain) scores. SHAP, SHapley Additive exPlanations; SUDS, subjective units of distress.
