Review
JMIR Ment Health. 2025 Oct 22:12:e67802.
doi: 10.2196/67802.

Performance of Automatic Speech Analysis in Detecting Depression: Systematic Review and Meta-Analysis

Patricia Laura Maran et al. JMIR Ment Health.

Abstract

Background: Despite the high prevalence and significant burden of depression, underdiagnosis remains a persistent challenge. Automatic speech analysis (ASA) has emerged as a promising method for depression assessment. However, a comprehensive quantitative synthesis evaluating its diagnostic accuracy is still lacking.

Objective: This systematic review and meta-analysis aimed to assess the diagnostic performance of ASA in detecting depression, considering both machine learning and deep learning approaches.

Methods: We conducted a systematic search across 8 databases, including MEDLINE, PsycInfo, Embase, CINAHL, IEEE Xplore, ACM Digital Library, Scopus, and Google Scholar from January 2013 to April 1, 2025. We included studies published in English that evaluated the accuracy of ASA for detecting depression, and reported performance metrics such as accuracy, sensitivity, specificity, precision, or confusion matrices. Study quality was assessed using a modified version of the Quality Assessment of Studies of Diagnostic Accuracy-Revised. A 3-level meta-analysis was performed to estimate the pooled highest and lowest accuracy, sensitivity, specificity, and precision. Meta-regressions and subgroup analyses were performed to explore heterogeneity across various factors, including type of publication, artificial intelligence algorithms, speech features, speech-eliciting tasks, ground truth assessment, validation approach, dataset, dataset language, participants' mean age, and sample size.

Results: Of the 1345 records identified, 105 studies met the inclusion criteria. The pooled means of the highest accuracy, sensitivity, specificity, and precision were 0.81 (95% CI 0.79 to 0.83), 0.84 (95% CI 0.81 to 0.86), 0.83 (95% CI 0.79 to 0.86), and 0.81 (95% CI 0.77 to 0.84), respectively, whereas the pooled means of the lowest accuracy, sensitivity, specificity, and precision were 0.66 (95% CI 0.63 to 0.69), 0.63 (95% CI 0.58 to 0.68), 0.60 (95% CI 0.55 to 0.66), and 0.64 (95% CI 0.58 to 0.70), respectively.

Conclusions: ASA shows promise as a method for detecting depression, though its readiness for clinical application as a standalone tool remains limited. At present, it should be regarded as a complementary method, with potential applications across diverse contexts. Further high-quality, peer-reviewed studies are needed to support the development of robust, generalizable models and to advance this emerging field.

Trial registration: PROSPERO CRD42023444431; https://www.crd.york.ac.uk/PROSPERO/view/CRD42023444431.

Keywords: AI; artificial intelligence; automatic speech analysis; depression; meta-analysis; mobile phone.
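
The Methods name accuracy, sensitivity, specificity, precision, and confusion matrices as the reported performance metrics. As a reminder of how the four metrics relate to a binary confusion matrix, here is a minimal Python sketch using the standard definitions. The class name `ConfusionMatrix` and the example counts are hypothetical and are not taken from the review or any included study; "depressed" is assumed to be the positive class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; 'depressed' is assumed to be the positive class."""
    tp: int  # depressed speakers correctly flagged
    fp: int  # non-depressed speakers incorrectly flagged
    tn: int  # non-depressed speakers correctly cleared
    fn: int  # depressed speakers missed

    def accuracy(self) -> float:
        # Share of all classifications that are correct.
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def sensitivity(self) -> float:
        # Share of truly depressed speakers that are detected (recall).
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of truly non-depressed speakers that are cleared.
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        # Share of flagged speakers that are truly depressed.
        return self.tp / (self.tp + self.fp)


# Hypothetical counts for illustration only.
cm = ConfusionMatrix(tp=42, fp=12, tn=38, fn=8)
metrics = {
    "accuracy": cm.accuracy(),
    "sensitivity": cm.sensitivity(),
    "specificity": cm.specificity(),
    "precision": cm.precision(),
}
```

Because a single confusion matrix yields all four metrics, a study that reports only the matrix can still contribute to each pooled estimate.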

Conflict of interest statement

Conflicts of Interest: JV-S has received travel awards (air tickets and hotel) for taking part in annual psychiatric meetings from Lundbeck and Janssen-Cilag, and was on the speakers’ bureau and acted as a consultant for Janssen-Cilag. JAR-Q was on the speakers’ bureau and acted as a consultant for Biogen, Idorsia, Casen-Recordati, Janssen-Cilag, Novartis, Takeda, Bial, Sincrolab, Neuraxpharm, Bristol Myers Squibb, Medice, Rubió, Uriach, Technofarma, and Raffo in the last 3 years. He also received travel awards (air tickets and hotel) for taking part in psychiatric meetings from Idorsia, Janssen-Cilag, Rubió, Takeda, Bial, and Medice. The Department of Psychiatry, chaired by him, received unrestricted educational and research support from the following companies in the last 3 years: Exeltis, Idorsia, Janssen-Cilag, Neuraxpharm, Oryzon, Roche, Probitas, and Rubió. AR-U acted as a consultant for Danone, and she has collaborated scientifically with Janssen-Cilag, Pileje, Farmasierra, and Organon. She has also received travel awards (air tickets and hotel) for taking part in annual psychiatric meetings from Lundbeck. All other authors declare no financial or nonfinancial competing interests.

Figures

Figure 1
PRISMA flow diagram of the study selection process. This diagram describes the process of identifying, screening, and selecting studies for inclusion. Initially, a total of 1345 records were identified from databases. After the removal of 473 duplicates, 872 records remained for the screening phase. Of these, 579 records were excluded, and 293 reports were sought for retrieval. Further, 12 reports could not be retrieved, resulting in 281 reports assessed for eligibility. Ultimately, 176 reports were excluded based on the predefined inclusion criteria. The final review included 105 studies. PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses.
Figure 2
Three-level forest plot of the highest accuracy estimates. This forest plot illustrates the results of this 3-level meta-analysis based on 148 estimates of the highest accuracy, reported in 86 studies [32-34, 36, 37, 39-49, 51, 53-68, 71-74, 77-87, 89-97, 99-101, 104, 107, 109-126, 128, 129, 131-133, 136]. The solid squares represent point estimates of the highest accuracy, with horizontal lines indicating the 95% CIs. The rhombus at the bottom represents the pooled highest accuracy estimates.
Figure 3
Three-level forest plot of the lowest accuracy estimates. This forest plot illustrates the results of this 3-level meta-analysis based on 114 estimates of the lowest accuracy, reported in 65 studies [33, 34, 36, 37, 39-45, 47, 49, 51, 53, 58-61, 63-65, 67, 68, 72-74, 77, 78, 81-84, 86, 87, 90-92, 94, 96, 100, 101, 104, 107, 109, 111-115, 117-121, 123-126, 128, 129, 131-133, 136]. The solid squares represent point estimates of accuracy, with horizontal lines indicating the 95% CIs. The rhombus at the bottom represents the pooled lowest accuracy estimates.
Figure 4
Three-level forest plot of the highest sensitivity estimates. This forest plot illustrates the results of this 3-level meta-analysis based on 135 estimates of the highest sensitivity, reported in 81 studies [34-51, 53-62, 64-71, 73-86, 88, 90, 92, 93, 95-97, 99-101, 103, 104, 108-110, 112, 113, 116-120, 123, 125-129, 131, 132, 135]. The solid squares represent point estimates of sensitivity, with horizontal lines indicating the 95% CIs. The rhombus at the bottom represents the estimated pooled highest sensitivity.
Figure 5
Three-level forest plot of the lowest sensitivity estimates. This forest plot illustrates the results of this 3-level meta-analysis based on 105 estimates of the lowest sensitivity, from 64 studies [34-45, 47, 49-51, 53, 55, 59-61, 64, 65, 67-70, 73-78, 81-84, 86, 88, 90, 92, 96, 97, 100, 101, 103, 104, 108, 109, 112, 113, 117-120, 123, 125-129, 131, 132, 135]. The solid squares represent point estimates of sensitivity, with horizontal lines indicating the 95% CIs. The rhombus at the bottom represents the estimated pooled lowest sensitivity.
Figure 6
Three-level forest plot of the highest specificity estimates. This forest plot illustrates the results of this 3-level meta-analysis based on 77 estimates of the highest specificity, from 47 studies [34, 36, 37, 40, 41, 43-46, 48, 49, 51, 54, 65, 66, 68, 76, 79-82, 84, 86, 88, 92, 93, 95, 97, 99, 100, 103, 104, 109-112, 116-118, 120, 123, 125-128, 131, 132]. The solid squares represent point estimates of specificity, with horizontal lines indicating the 95% CIs. The rhombus at the bottom represents the estimated pooled highest specificity.
Figure 7
Three-level forest plot of the lowest specificity estimates. This forest plot illustrates the results of this 3-level meta-analysis based on 55 estimates of the lowest specificity, from 34 studies [34, 36, 37, 40, 41, 43-45, 49, 51, 65, 68, 76, 81, 82, 84, 86, 88, 92, 100, 103, 109, 111, 112, 117, 118, 120, 123, 125-128, 131, 132]. The solid squares represent point estimates of specificity, with horizontal lines indicating the 95% CIs. The rhombus at the bottom represents the estimated pooled lowest specificity.
Figure 8
Three-level forest plot of the highest precision estimates. This forest plot illustrates the results of this 3-level meta-analysis based on 95 estimates of the highest precision, reported in 62 studies [34, 35, 37-40, 42-44, 46, 47, 49-51, 53-57, 59-62, 64-71, 74, 75, 77-85, 90-93, 95, 96, 99, 101, 104, 108-113, 116, 119, 125, 126, 129]. The solid squares represent point estimates of precision, with horizontal lines indicating the 95% CIs. The rhombus at the bottom represents the estimated pooled highest precision.
Figure 9
Three-level forest plot of the lowest precision estimates. This forest plot illustrates the results of this 3-level meta-analysis based on 73 estimates of the lowest precision, reported in 46 studies [34, 35, 37-40, 42-44, 47, 49, 50, 53, 59-61, 64, 65, 67-70, 74, 75, 77, 78, 81-84, 90-92, 96, 97, 101, 104, 108, 109, 111-113, 119, 125, 126, 129]. The solid squares represent point estimates of precision, with horizontal lines indicating the 95% CIs. The rhombus at the bottom represents the estimated pooled lowest precision.
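
The counts in the Figure 1 caption (PRISMA flow) can be checked for internal consistency with simple subtraction. The short Python sketch below uses only the numbers stated in that caption; the variable names are illustrative and not taken from the paper.

```python
# Counts as reported in the Figure 1 (PRISMA flow diagram) caption.
identified = 1345               # records identified from databases
duplicates = 473                # duplicates removed
excluded_at_screening = 579     # records excluded during screening
not_retrieved = 12              # reports that could not be retrieved
excluded_at_eligibility = 176   # reports excluded on inclusion criteria

# Each stage is the previous stage minus the records dropped at that step.
screened = identified - duplicates
sought_for_retrieval = screened - excluded_at_screening
assessed_for_eligibility = sought_for_retrieval - not_retrieved
included = assessed_for_eligibility - excluded_at_eligibility

prisma_flow = {
    "screened": screened,
    "sought_for_retrieval": sought_for_retrieval,
    "assessed_for_eligibility": assessed_for_eligibility,
    "included": included,
}
print(prisma_flow)
```

The computed stage totals (872, 293, 281, and 105) match the figures given in the caption and the 105 included studies reported in the Results.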

