The SPEAK study rationale and design: A linguistic corpus-based approach to understanding thought disorder
- PMID: 36732110
- PMCID: PMC10387495
- DOI: 10.1016/j.schres.2022.12.048
Abstract
Aim: Psychotic symptoms are typically measured using clinical ratings, but more objective and sensitive metrics are needed. Hence, we will assess thought disorder using the Research Domain Criteria (RDoC) heuristic for language production, and its recommended paradigm of "linguistic corpus-based analyses of language output". Positive thought disorder (e.g., tangentiality and derailment) can be assessed using word-embedding approaches that assess semantic coherence, whereas negative thought disorder (e.g., concreteness, poverty of speech) can be assessed using part-of-speech (POS) tagging to assess syntactic complexity. We aim to establish convergent validity of automated linguistic metrics with clinical ratings, assess normative demographic variance, determine cognitive and functional correlates, and replicate their predictive power for psychosis transition among at-risk youths.
Methods: This study will assess language production in 450 English-speaking individuals in Australia and Canada, who have recent onset psychosis, are at clinical high risk (CHR) for psychosis, or who are healthy volunteers, all well-characterized for cognition, function and symptoms. Speech will be elicited using open-ended interviews. Audio files will be transcribed and preprocessed for automated natural language processing (NLP) analyses of coherence and complexity. Data analyses include canonical correlation, multivariate linear regression with regularization, and machine-learning classification of group status and psychosis outcome.
Conclusions: This prospective study aims to characterize language disturbance across stages of psychosis using computational approaches, and to establish the psychometric properties, normative variance and clinical correlates of these measures, all of which are important for biomarker development. SPEAK will also create a large archive of language data available to other investigators, a rich resource for the field.
Keywords: Latent semantic analysis; Natural language processing; Part-of-speech-tagging; Psychosis; Thought disorder; Ultra/clinical high risk.
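As an illustration of the word-embedding approach to semantic coherence mentioned in the Aim, the following is a minimal sketch, not the SPEAK pipeline. It assumes a sentence can be represented by the mean of its word vectors and that coherence is the cosine similarity between consecutive sentences. The tiny vector table `TOY_VECTORS` and the function names are hypothetical; a real analysis would use vectors from a trained model such as latent semantic analysis.

```python
import math

# Hypothetical 3-dimensional word vectors, for illustration only.
# A real analysis would load vectors from a trained embedding model.
TOY_VECTORS = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.0],
    "barked": [0.7, 0.3, 0.1],
    "slept": [0.6, 0.3, 0.2],
    "tax": [0.0, 0.1, 0.9],
    "forms": [0.1, 0.0, 0.8],
}


def sentence_vector(sentence):
    """Average the vectors of known words; return None if no word is known."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    vectors = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def first_order_coherence(sentences):
    """Cosine similarity between each pair of consecutive sentence vectors."""
    vectors = [v for v in map(sentence_vector, sentences) if v is not None]
    return [cosine(a, b) for a, b in zip(vectors, vectors[1:])]


if __name__ == "__main__":
    transcript = ["The dog barked.", "The cat slept.", "Tax forms."]
    for score in first_order_coherence(transcript):
        print(round(score, 3))
```

In this toy transcript the first two sentences stay on one topic and score high, while the abrupt shift to the third sentence scores low, which is the kind of drop a coherence metric is meant to flag as possible derailment.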
Copyright © 2023 Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of competing interest: The authors declare no real or potential conflict of interest. All authors have reviewed and approved the manuscript before submission.