Prediction of psychosis across protocols and risk cohorts using automated language analysis

Cheryl M Corcoran¹ ², Facundo Carrillo³ ⁴, Diego Fernández-Slezak³ ⁴, Gillinder Bedi² ⁵ ⁶, Casimir Klim² ⁵, Daniel C Javitt² ⁵, Carrie E Bearden⁷, Guillermo A Cecchi⁸

Affiliations

¹ Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
² New York State Psychiatric Institute, New York, NY, USA.
³ Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina.
⁴ Instituto de Investigación en Ciencias de la Computación, Universidad de Buenos Aires, Buenos Aires, Argentina.
⁵ Department of Psychiatry, Columbia University Medical Center, New York, NY, USA.
⁶ Centre for Youth Mental Health, University of Melbourne, and Orygen National Centre of Excellence in Youth Mental Health, Melbourne, Australia.
⁷ Department of Psychiatry and Biobehavioral Sciences and Psychology, University of California Los Angeles; Semel Institute for Neuroscience and Human Behavior, Los Angeles, CA, USA.
⁸ Computational Biology Center - Neuroscience, IBM T.J. Watson Research Center, Ossining, NY, USA.

PMID: 29352548
PMCID: PMC5775133
DOI: 10.1002/wps.20491

Cheryl M Corcoran et al. World Psychiatry. 2018 Feb;17(1):67-75. doi: 10.1002/wps.20491.

Abstract

Language and speech are the primary source of data for psychiatrists to diagnose and treat mental disorders. In psychosis, the very structure of language can be disturbed, including semantic coherence (e.g., derailment and tangentiality) and syntactic complexity (e.g., concreteness). Subtle disturbances in language are evident in schizophrenia even prior to first psychosis onset, during prodromal stages. Using computer-based natural language processing analyses, we previously showed that, among English-speaking clinical (e.g., ultra) high-risk youths, baseline reduction in semantic coherence (the flow of meaning in speech) and in syntactic complexity could predict subsequent psychosis onset with high accuracy. Herein, we aimed to cross-validate these automated linguistic analytic methods in a second larger risk cohort, also English-speaking, and to discriminate speech in psychosis from normal speech. We identified an automated machine-learning speech classifier - comprising decreased semantic coherence, greater variance in that coherence, and reduced usage of possessive pronouns - that had an 83% accuracy in predicting psychosis onset (intra-protocol), a cross-validated accuracy of 79% in predicting psychosis onset in the original risk cohort (cross-protocol), and a 72% accuracy in discriminating the speech of recent-onset psychosis patients from that of healthy individuals. The classifier was highly correlated with previously identified manual linguistic predictors. Our findings support the utility and validity of automated natural language processing methods to characterize disturbances in semantics and syntax across stages of psychotic disorder. The next steps will be to apply these methods in larger risk cohorts to further test reproducibility, including in languages other than English, and to identify sources of variability. This technology has the potential to improve prediction of psychosis outcome among at-risk youths and identify linguistic targets for remediation and preventive intervention. More broadly, automated linguistic analysis can be a powerful tool for diagnosis and treatment across neuropsychiatry.

Keywords: Automated language analysis; high-risk youths; machine learning; prediction of psychosis; semantic coherence; syntactic complexity.
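As an illustration of the kind of semantic coherence features named in the abstract (minimum, maximum and variance of coherence), the following is a minimal sketch, not the authors' pipeline. It assumes that each sentence of a transcript has already been mapped to a numeric vector by some embedding method, which the abstract does not specify, and it takes coherence to be the cosine similarity between consecutive sentence vectors. The function names `cosine` and `coherence_features` are hypothetical.

```python
import math
from statistics import mean, pvariance


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def coherence_features(sentence_vectors):
    """Summarize the similarity between each sentence and the next one.

    Assumption: `sentence_vectors` is a list of per-sentence embeddings in
    the order the sentences were spoken. How those embeddings are produced
    is outside the scope of this sketch.
    """
    if len(sentence_vectors) < 2:
        raise ValueError("need at least two sentences to measure coherence")
    sims = [cosine(a, b) for a, b in zip(sentence_vectors, sentence_vectors[1:])]
    return {
        "min": min(sims),
        "max": max(sims),
        "mean": mean(sims),
        "variance": pvariance(sims),  # population variance of the similarities
    }


# Toy transcript: the third "sentence" is orthogonal to the second,
# mimicking an abrupt shift in topic.
toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
features = coherence_features(toy)
```

In this toy case, a low minimum and a high variance together reflect one abrupt break in an otherwise coherent sequence, which is the intuition behind using these summaries as classifier inputs.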

Figures

**Figure 1**
The four‐factor University of California Los Angeles (UCLA) machine learning classifier of psychosis outcome. Factors are aggregates of weighted syntactic (a‐e) and semantic coherence (f‐n) features, as listed in Table 2. The first three factors are weighted toward semantic features (maximum, variance and minimum), and the fourth factor is weighted toward a syntactic feature (possessive pronouns). Y axes show factor weights.

**Figure 2**
Receiver operating characteristics (ROC) for the University of California Los Angeles (UCLA) clinical high‐risk (CHR) classifier of psychosis outcome as applied to the UCLA dataset (solid line) and to the realigned New York City (NYC) dataset (dotted line). AUC – area under the curve.

**Figure 3**
Projection of the top three factors for the University of California Los Angeles (UCLA) and New York City (NYC) clinical high‐risk (CHR) cohorts. These factors were weighted for semantic coherence features. A. Convex hull of non‐converters (CHR–) in UCLA, with 11 of 19 converters (CHR+) outside of the hull. B. Convex hull of CHR– in NYC, with 3 of 5 CHR+ outside the hull. C. Data in A and B (all CHR) shown together to demonstrate extent of overlap in language properties.

**Figure 4**
Projection of the top three factors for University of California Los Angeles (UCLA) first‐episode psychosis (FEP) patients and healthy controls (CTR). A. Convex hull of healthy controls (CTR) with 11 of 16 FEP patients outside the hull. B. Overlap of convex hulls for FEP vs. CTR, and converters (CHR+) vs. non‐converters (CHR–).
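The convex hull comparisons in Figures 3 and 4 count how many individuals from one group fall outside the hull of a reference group. The sketch below illustrates that idea under simplifying assumptions: it works in two dimensions rather than the three factors used in the figures, it uses Andrew's monotone chain algorithm for the hull (the page does not state which method the authors used), and points on the hull boundary are treated as inside. All function names and the toy coordinates are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive when o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def inside_hull(hull, p):
    """True if p lies inside or on the boundary of a counter-clockwise hull."""
    n = len(hull)
    if n < 3:
        raise ValueError("hull must have at least three vertices")
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def count_outside(reference, candidates):
    """Number of candidate points falling outside the hull of the reference group."""
    hull = convex_hull(reference)
    return sum(not inside_hull(hull, p) for p in candidates)


# Toy data: reference group (e.g., non-converters) and candidates (e.g., converters).
reference_group = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
candidate_group = [(1, 1), (3, 3), (2, 1), (-1, 0)]
outside = count_outside(reference_group, candidate_group)
```

Extending this to three factors would require a three-dimensional hull routine, which is usually taken from a computational geometry library rather than written by hand.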
