Performance of Automatic Speech Analysis in Detecting Depression: Systematic Review and Meta-Analysis
- PMID: 41124683
- PMCID: PMC12590051
- DOI: 10.2196/67802
Abstract
Background: Despite the high prevalence and significant burden of depression, underdiagnosis remains a persistent challenge. Automatic speech analysis (ASA) has emerged as a promising method for depression assessment. However, a comprehensive quantitative synthesis evaluating its diagnostic accuracy is still lacking.
Objective: This systematic review and meta-analysis aimed to assess the diagnostic performance of ASA in detecting depression, considering both machine learning and deep learning approaches.
Methods: We conducted a systematic search across 8 databases (MEDLINE, PsycInfo, Embase, CINAHL, IEEE Xplore, ACM Digital Library, Scopus, and Google Scholar) from January 2013 to April 1, 2025. We included studies published in English that evaluated the accuracy of ASA for detecting depression and reported performance metrics such as accuracy, sensitivity, specificity, precision, or confusion matrices. Study quality was assessed using a modified version of the Quality Assessment of Studies of Diagnostic Accuracy-Revised. A 3-level meta-analysis was performed to estimate the pooled highest and lowest accuracy, sensitivity, specificity, and precision. Meta-regressions and subgroup analyses were performed to explore heterogeneity across various factors, including type of publication, artificial intelligence algorithms, speech features, speech-eliciting tasks, ground truth assessment, validation approach, dataset, dataset language, participants' mean age, and sample size.
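As an illustration of how the four performance metrics named above relate to a confusion matrix, the following is a minimal sketch using the standard textbook definitions. It is not the authors' extraction code; the names `ConfusionMatrix` and `diagnostic_metrics` and the example counts are hypothetical.

```python
from dataclasses import dataclass
from math import nan


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical container for binary depression-detection counts."""
    tp: int  # depressed, classified as depressed
    fp: int  # not depressed, classified as depressed
    fn: int  # depressed, classified as not depressed
    tn: int  # not depressed, classified as not depressed


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a metric is undefined.
    return numerator / denominator if denominator else nan


def diagnostic_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Standard definitions of the four metrics pooled in the review."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    return {
        "accuracy": _ratio(cm.tp + cm.tn, total),
        "sensitivity": _ratio(cm.tp, cm.tp + cm.fn),  # recall
        "specificity": _ratio(cm.tn, cm.tn + cm.fp),
        "precision": _ratio(cm.tp, cm.tp + cm.fp),    # positive predictive value
    }


# Illustrative counts only, not data from any included study.
example = diagnostic_metrics(ConfusionMatrix(tp=40, fp=10, fn=10, tn=40))
```

With these illustrative counts, all four metrics equal 0.8. A study that reports only a confusion matrix can therefore still contribute each metric to the pooled estimates.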
Results: Of the 1345 records identified, 105 studies met the inclusion criteria. The pooled means of the highest accuracy, sensitivity, specificity, and precision were 0.81 (95% CI 0.79 to 0.83), 0.84 (95% CI 0.81 to 0.86), 0.83 (95% CI 0.79 to 0.86), and 0.81 (95% CI 0.77 to 0.84), respectively, whereas the pooled means of the lowest accuracy, sensitivity, specificity, and precision were 0.66 (95% CI 0.63 to 0.69), 0.63 (95% CI 0.58 to 0.68), 0.60 (95% CI 0.55 to 0.66), and 0.64 (95% CI 0.58 to 0.70), respectively.
Conclusions: ASA shows promise as a method for detecting depression, though its readiness for clinical application as a standalone tool remains limited. At present, it should be regarded as a complementary method, with potential applications across diverse contexts. Further high-quality, peer-reviewed studies are needed to support the development of robust, generalizable models and to advance this emerging field.
Trial registration: PROSPERO CRD42023444431; https://www.crd.york.ac.uk/PROSPERO/view/CRD42023444431.
Keywords: AI; artificial intelligence; automatic speech analysis; depression; meta-analysis; mobile phone.
©Patricia Laura Maran, María Dolores Braquehais, Alexandra Vlaic, María Teresa Alonzo-Castillo, Júlia Vendrell-Serres, Josep Antoni Ramos-Quiroga, Amanda Rodríguez-Urrutia. Originally published in JMIR Mental Health (https://mental.jmir.org), 22.10.2025.
Conflict of interest statement
Conflicts of Interest: JV-S has received travel awards (air tickets and hotel) for taking part in annual psychiatric meetings from Lundbeck and Janssen-Cilag, and was on the speakers’ bureau and acted as a consultant for Janssen-Cilag. JAR-Q was on the speakers’ bureau and acted as a consultant for Biogen, Idorsia, Casen-Recordati, Janssen-Cilag, Novartis, Takeda, Bial, Sincrolab, Neuraxpharm, Bristol Myers Squibb, Medice, Rubió, Uriach, Technofarma, and Raffo in the last 3 years. He also received travel awards (air tickets and hotel) for taking part in psychiatric meetings from Idorsia, Janssen-Cilag, Rubió, Takeda, Bial, and Medice. The Department of Psychiatry, chaired by him, received unrestricted educational and research support from the following companies in the last 3 years: Exeltis, Idorsia, Janssen-Cilag, Neuraxpharm, Oryzon, Roche, Probitas, and Rubió. AR-U acted as a consultant for Danone, and she has collaborated scientifically with Janssen-Cilag, Pileje, Farmasierra, and Organon. She has also received travel awards (air tickets and hotel) for taking part in annual psychiatric meetings from Lundbeck. All other authors declare no financial or nonfinancial competing interests.
