Braz J Psychiatry. 2024:46:e20233419.
doi: 10.47626/1516-4446-2023-3419. Epub 2024 Jul 29.

Natural language processing in at-risk mental states: enhancing the assessment of thought disorders and psychotic traits with semantic dynamics and graph theory

Felipe Argolo et al. Braz J Psychiatry. 2024.

Abstract

Objective: Verbal communication contains key information for mental health assessment. Researchers have linked psychopathology phenomena to certain counterparts in natural language processing. We characterized subtle impairments in the early stages of psychosis, developing new analysis techniques, which led to a comprehensive map associating features of natural language processing with the full range of clinical presentation.

Methods: We used natural language processing to assess spontaneous and elicited speech by 60 individuals with at-risk mental states and 73 controls who were screened from 4,500 quota-sampled Portuguese-speaking residents of São Paulo, Brazil. Psychotic symptoms were independently assessed with the Structured Interview for Psychosis-Risk Syndromes. Speech features (e.g., sentiments and semantic coherence), including novel ones, were correlated with psychotic traits (Spearman's ρ) and at-risk mental state status (general linear models and machine-learning ensembles).
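As an illustration of the rank correlation named in the Methods, the following is a minimal sketch of Spearman's ρ, computed as the Pearson correlation of average ranks. The function names and the toy values are hypothetical and are not taken from the study's data or code.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: one speech feature and one symptom score per person.
feature = [0.10, 0.35, 0.20, 0.80, 0.55]
symptom = [1, 3, 2, 6, 4]
rho = spearman_rho(feature, symptom)
```

Because the statistic depends only on ranks, it captures any monotonic association between a speech feature and a symptom score, not just a linear one.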

Results: Natural language processing features were informative for classification, presenting a balanced accuracy of 86%. Features such as semantic laminarity (as perseveration), semantic recurrence time (as circumstantiality), and average centrality in word repetition graphs carried the most information and were directly correlated with psychotic symptoms. Grammatical tagging (e.g., use of adjectives) was the most relevant standard measure.
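For readers unfamiliar with the metric, balanced accuracy is the mean of sensitivity and specificity, which keeps the larger group from dominating the score when groups are unequal (here, 60 at-risk individuals and 73 controls). A minimal sketch follows; the labels and predictions are made-up toy values, not the study's results.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = at-risk)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical toy labels: 4 at-risk individuals and 6 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)  # sensitivity 3/4, specificity 5/6
```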

Conclusion: Subtle speech impairments can be detected by sensitive methods and can be used in at-risk mental states screening. We have outlined a blueprint for speech-based evaluation, pairing features to standard psychometric items for thought disorder.

Keywords: Psychosis; at-risk mental states; machine learning; natural language processing; screening; semantics.

Conflict of interest statement

FA has provided consulting services and developed technology for private companies. NBM works at Mobile Brain, an Education and Health Tech startup, and has been a consultant to Boehringer Ingelheim. JMG works at Mobile Brain and has provided consulting for developing machine learning models for private companies. The other authors report no conflicts of interest.

Figures

Figure 1. Graphs with low (97th quantile, left panel) and high (third quantile, right panel) average centrality (ref. 15) from our sample. Nodes with small centrality (blue) are located at the extremities. Nodes with large centrality (red) are more likely to act as a bridge in the shortest path between two other nodes (betweenness centrality) and possess more edges (degree centrality).
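The abstract does not specify how the word repetition graphs were built, so the sketch below rests on an assumption: each distinct word is a node, and an undirected edge links words that appear consecutively. It computes only average degree centrality (distinct neighbours divided by n - 1). Betweenness centrality is omitted for brevity, and all names are hypothetical.

```python
def word_graph(tokens):
    """Map each distinct word to the set of distinct words adjacent to it."""
    neighbors = {word: set() for word in tokens}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # ignore self-loops from immediate repetition
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def average_degree_centrality(neighbors):
    """Mean over nodes of (number of neighbours) / (n - 1)."""
    n = len(neighbors)
    if n < 2:
        return 0.0
    return sum(len(adj) for adj in neighbors.values()) / (n * (n - 1))


# Toy texts: one without repetition, one that keeps returning to "the".
linear = "the cat sat on a mat".split()
repetitive = "the cat the dog the bird the fish".split()

avg_linear = average_degree_centrality(word_graph(linear))
avg_repetitive = average_degree_centrality(word_graph(repetitive))
```

In this toy case the repetitive text yields a hub node ("the") and a higher average degree centrality than the text without repetition.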
Figure 2. Distance matrix for a text about colors (left panel) and a text with random words (right panel) using cosine distance and FastText embeddings. Semantic distances range from 0 (very close, dark purple) to 1 (very far, light yellow). Patterns in the right panel resemble random noise, with no marked vertical, diagonal, or in-between structures. In the left panel, three green rectangles highlight vertical segments, showing similarity of themes in sequences of words. They relate the sequence “yellow, red, white” (12th-14th words) to three other states (first: “colors”; ninth: “blue”; 10th: “green”). The red diagonal shows a high-order coherence (18th order): “colors are the” and “violet exist in” ($x_{1,19}$, $x_{2,20}$, and $x_{3,21}$). The white segment shows a five-word ($x_{6,15}$ to $x_{6,21}$) interval between recurrences (Poisson time).
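A pairwise distance matrix of the kind shown in Figure 2 can be sketched as below. The three-dimensional vectors are hypothetical stand-ins for FastText embeddings, which are not loaded here. Cosine distance is taken as 1 minus cosine similarity; in general it can reach 2, and it stays within 0 to 1 here only because the toy vectors have non-negative components.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(vectors):
    """Entry [i][j] is the cosine distance between word i and word j."""
    return [[cosine_distance(u, v) for v in vectors] for u in vectors]


# Hypothetical toy embeddings (not real FastText vectors).
embeddings = {
    "yellow": [0.9, 0.1, 0.0],
    "red": [0.8, 0.2, 0.0],
    "table": [0.0, 0.1, 0.9],
}
words = list(embeddings)
matrix = distance_matrix([embeddings[w] for w in words])
```

Entry `matrix[i][j]` compares the i-th and j-th words of the text, so runs of small values along a column or a diagonal correspond to the vertical and diagonal structures described in the caption.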
Figure 3. Top 20 features found through the feature permutation method. The values indicate the summed importance of each feature after 30 different training loops with 100 permutations of each feature. Feat = feature; SCC = strongly connected components.
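The general idea behind permutation feature importance can be sketched as follows: shuffle one feature column at a time and record how much the model's accuracy drops. This is a simplified illustration, not the authors' pipeline. It uses a hypothetical threshold rule as the model, averages the drop over repeats rather than summing across training loops, and all names and data are invented.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_repeats=100, seed=0):
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical classifier: uses feature 0 only and ignores feature 1."""
    return int(row[0] > 0.5)


rows = [[0.9, 0.3], [0.8, 0.7], [0.7, 0.1], [0.6, 0.9],
        [0.4, 0.2], [0.3, 0.8], [0.2, 0.5], [0.1, 0.6]]
labels = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, labels)
```

Since the toy model never reads feature 1, shuffling that column leaves accuracy unchanged and its importance is exactly zero, while shuffling feature 0 degrades accuracy.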

