Schizophrenia (Heidelb). 2022 Nov 29;8(1):102.
doi: 10.1038/s41537-022-00306-z.

Deconstructing heterogeneity in schizophrenia through language: a semi-automated linguistic analysis and data-driven clustering approach

Valentina Bambini et al. Schizophrenia (Heidelb).

Abstract

Previous works highlighted the relevance of automated language analysis for predicting diagnosis in schizophrenia, but a deeper language-based data-driven investigation of the clinical heterogeneity through the illness course has been generally neglected. Here we used a semiautomated multidimensional linguistic analysis innovatively combined with a machine-driven clustering technique to characterize the speech of 67 individuals with schizophrenia. Clusters were then compared for psychopathological, cognitive, and functional characteristics. We identified two subgroups with distinctive linguistic profiles: one with higher fluency, lower lexical variety but greater use of psychological lexicon; the other with reduced fluency, greater lexical variety but reduced psychological lexicon. The former cluster was associated with lower symptoms and better quality of life, pointing to the existence of specific language profiles, which also show clinically meaningful differences. These findings highlight the importance of considering language disturbances in schizophrenia as multifaceted and approaching them in automated and data-driven ways.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Visual representation of the pre-processing of audio files and extraction of the linguistic measures.
Speech samples were obtained via the semi-structured interviews of the APACS test (1), and then transcribed using the CLAN software (2). Token-based values were then automatically extracted from the transcripts: RStudio was used to obtain lexical frequency values for each token in the text from the Corpus and Frequency Lexicon of Written Italian (CoLFIS) (3a), the Natural Language Toolkit (NLTK) was employed to compute the Type-Token Ratio (3b), and the Linguistic Inquiry and Word Count (LIWC) software was used to obtain the frequency of affective words and words indicating cognitive mechanisms (i.e., Psychological Lexicon) and of Personal Pronouns (3c). Finally, the speech samples were processed using the PRAAT software (4) to determine the number of utterances for the computation of the Mean Length of Utterance, as well as to extract pause and gap duration and the number of pauses, used for the computation of the Pause-to-word ratio.
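The three ratio-style measures named in this caption can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses plain whitespace tokenization instead of NLTK, assumes that Mean Length of Utterance is counted in words per utterance and that the Pause-to-word ratio is the number of pauses divided by the number of words, and runs on a hypothetical toy transcript. All function names are illustrative.

```python
def type_token_ratio(tokens):
    """Distinct word forms (case-insensitive) divided by total tokens."""
    if not tokens:
        return 0.0
    lowered = [t.lower() for t in tokens]
    return len(set(lowered)) / len(lowered)


def mean_length_of_utterance(utterances):
    """Average number of word tokens per utterance (assumed unit: words)."""
    if not utterances:
        return 0.0
    return sum(len(u.split()) for u in utterances) / len(utterances)


def pause_to_word_ratio(n_pauses, n_words):
    """Number of pauses divided by number of words (assumed definition)."""
    if n_words == 0:
        return 0.0
    return n_pauses / n_words


# Hypothetical toy transcript: one string per utterance.
utterances = ["I went to the market", "the market was closed", "so I went home"]
tokens = [w for u in utterances for w in u.split()]

ttr = type_token_ratio(tokens)                 # 9 distinct forms over 13 tokens
mlu = mean_length_of_utterance(utterances)     # 13 tokens over 3 utterances
ptw = pause_to_word_ratio(n_pauses=4, n_words=len(tokens))  # pause count is made up
```

A lower Type-Token Ratio indicates more repetition of the same word forms, which is how "lexical variety" is usually read from this measure.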
Fig. 2. Results of the principal component analysis and cluster analysis.
A Associations between the four principal components (PCs) identified by the Principal Component Analysis and the linguistic features; green-colored boxes indicate a positive association, while red-colored boxes indicate a negative association. B Silhouette width for participants included in both clusters (horizontal axis) and average silhouette width for the two-cluster solution (red dashed line). C Distribution of the clusters around their centroids.
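For readers unfamiliar with the silhouette width shown in panel B, the sketch below computes it from scratch for a two-cluster solution. It assumes Euclidean distance and uses hypothetical toy coordinates; the record does not state which distance or clustering algorithm the authors used, so this only illustrates the general definition: for each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to another cluster, and the width is (b - a) / max(a, b).

```python
import math


def silhouette_widths(points, labels):
    """Return one silhouette width per point, using Euclidean distance."""
    clusters = sorted(set(labels))
    widths = []
    for i, p in enumerate(points):
        # Mean distance from point i to each cluster, excluding the point itself.
        mean_dist = {}
        for c in clusters:
            dists = [math.dist(p, q) for j, q in enumerate(points)
                     if labels[j] == c and j != i]
            mean_dist[c] = sum(dists) / len(dists) if dists else None
        a = mean_dist[labels[i]]
        if a is None:
            # Singleton cluster: the width is conventionally set to 0.
            widths.append(0.0)
            continue
        b = min(d for c, d in mean_dist.items()
                if c != labels[i] and d is not None)
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


# Hypothetical toy data: two well-separated clusters in a 2-D component space.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [1, 1, 2, 2]

widths = silhouette_widths(points, labels)
average_width = sum(widths) / len(widths)
```

Widths close to 1 indicate that a participant sits well inside its cluster; widths near 0 or below suggest a borderline or misassigned participant.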
Fig. 3. Results of the linear discriminant analysis (LDA) with random-split samples.
A Mean values of training and testing accuracy (error bars indicate standard deviations), computed on random samples with 75%, 50%, and 25% of participants of the original sample assigned to the training subset and the remaining to the testing subset (50 iterations performed using the same method). The general performance of the classification function remains high and stable across different training-testing partitions. B Conceptual representation of a replication with 50% of participants randomly assigned to the training subset and the other 50% to the testing subset (training accuracy: 100%; testing accuracy: 97%). The outcome of this single replication shows that in the testing subset only one participant from Cluster 2 is misclassified by the model.
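The random-split procedure described in panel A (repeated partitions into training and testing subsets, with mean and standard deviation of accuracy over 50 iterations) can be sketched as follows. Note the substitution: to stay dependency-free, the sketch uses a simple nearest-centroid classifier as a stand-in, not linear discriminant analysis, and the data are hypothetical. Only the resampling logic is meant to mirror the caption.

```python
import math
import random
import statistics


def fit_centroids(X, y):
    """Stand-in classifier (NOT LDA): store the mean vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def accuracy(centroids, X, y):
    """Fraction of samples whose predicted class matches the true label."""
    return sum(predict(centroids, x) == lab for x, lab in zip(X, y)) / len(y)


def repeated_split(X, y, train_fraction, n_iter=50, seed=0):
    """Mean and SD of training and testing accuracy over random splits."""
    rng = random.Random(seed)
    n_train = round(train_fraction * len(y))
    train_acc, test_acc = [], []
    for _ in range(n_iter):
        idx = list(range(len(y)))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        X_te, y_te = [X[i] for i in test], [y[i] for i in test]
        model = fit_centroids(X_tr, y_tr)
        train_acc.append(accuracy(model, X_tr, y_tr))
        test_acc.append(accuracy(model, X_te, y_te))
    return (statistics.mean(train_acc), statistics.stdev(train_acc),
            statistics.mean(test_acc), statistics.stdev(test_acc))


# Hypothetical toy data: 10 participants per cluster, clearly separated.
X = [(i * 0.1, 0.0) for i in range(10)] + [(10 + i * 0.1, 0.0) for i in range(10)]
y = [1] * 10 + [2] * 10

mean_train, sd_train, mean_test, sd_test = repeated_split(X, y, train_fraction=0.5)
```

Calling `repeated_split` with `train_fraction` set to 0.75, 0.5, and 0.25 reproduces the three partition sizes mentioned in the caption; stable testing accuracy across them is what the caption describes as high and stable performance.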
Fig. 4. Results of cluster comparisons and summary of clusters.
A Between-cluster comparisons for Quality of Life Scale (QLS), including Interpersonal Relations (IRe), Instrumental Role (IRo), and Personal Autonomy (PA) sub-scales and total score (Tot). B Between-cluster comparisons for PANSS scores (Positive, Negative, and General Scales and Disorganization dimension score). C Summary of the linguistic, psychopathological, and functional differences of the participants belonging to Cluster 1 and Cluster 2 (only significant differences are reported): arrows indicate higher (↑) or lower (↓) linguistic performance, psychopathological symptoms (as evaluated by the scores obtained in the PANSS Positive, Negative, and General scales and Disorganization score), and functioning (as evaluated with the QLS subscales and Total score).
Fig. 5. Results of the correlation analysis with BACS and ToM PST subscores across clusters.
A Correlations between linguistic-based principal components (PCs) and BACS and ToM PST subscores for Cluster 1. B Correlations between linguistic-based PCs and BACS and ToM PST subscores for Cluster 2. Significant correlations are marked with an asterisk (significance level p < 0.05).
