Ther Adv Neurol Disord. 2023 Jun 25:16:17562864231180719.
doi: 10.1177/17562864231180719. eCollection 2023.

Lexical and syntactic deficits analyzed via automated natural language processing: the new monitoring tool in multiple sclerosis

Martin Šubert et al. Ther Adv Neurol Disord.

Abstract

Background: Impairment of higher language functions associated with natural spontaneous speech in multiple sclerosis (MS) remains underexplored.

Objectives: We present a fully automated method for discriminating MS patients from healthy controls based on lexical and syntactic linguistic features.

Methods: We enrolled 120 individuals with MS, with Expanded Disability Status Scale scores ranging from 1 to 6.5, and 120 age-, sex-, and education-matched healthy controls. Linguistic analysis was fully automated, combining automatic speech recognition with natural language processing to extract eight lexical and syntactic features from spontaneous discourse. The automated annotations were compared with human annotations.
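As a rough illustration of the kind of lexical and syntactic features described in the Methods, the following sketch computes a few of them from utterances that have already been transcribed and part-of-speech tagged. The tag sets (Universal POS-style labels), the feature definitions, and the function name `linguistic_features` are assumptions for illustration; the abstract does not specify the authors' exact eight features or their processing pipeline.

```python
from collections import Counter

# Assumed split of Universal POS-style tags into content and function words.
CONTENT_TAGS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}
FUNCTION_TAGS = {"ADP", "AUX", "CCONJ", "SCONJ", "DET", "PRON", "PART"}


def linguistic_features(utterances):
    """Compute simple features from a list of utterances.

    Each utterance is a list of (word, pos_tag) pairs. Hypothetical helper,
    not the authors' implementation.
    """
    tags = Counter(tag for utt in utterances for _, tag in utt)
    n_words = sum(tags.values())
    if n_words == 0:
        raise ValueError("no tokens to analyze")
    content = sum(tags[t] for t in CONTENT_TAGS)
    function = sum(tags[t] for t in FUNCTION_TAGS)
    nouns = tags["NOUN"] + tags["PROPN"]
    return {
        "content_word_ratio": content / n_words,
        "function_word_ratio": function / n_words,
        "verb_to_noun_ratio": tags["VERB"] / nouns if nouns else float("nan"),
        # Mean length of utterance, counted here in words per utterance.
        "mean_length_of_utterance": n_words / len(utterances),
    }


# Toy input, invented for illustration only.
sample = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("she", "PRON"), ("reads", "VERB"), ("a", "DET"),
     ("long", "ADJ"), ("book", "NOUN")],
]
features = linguistic_features(sample)
```

In a real pipeline the tagged utterances would come from a speech recognizer followed by a tagger or parser; the sketch starts after that step so that it stays independent of any particular toolkit.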

Results: Compared with healthy controls, lexical impairment in MS consisted of an increase in content words (p = 0.037), a decrease in function words (p = 0.007), and overuse of verbs at the expense of nouns (p = 0.047), while syntactic impairment manifested as shorter utterance length (p = 0.002) and a lower number of coordinate clauses (p < 0.001). The fully automated language analysis discriminated between MS and controls with an area under the curve of 0.70. Shorter utterance length was significantly related to a lower symbol digit modalities test score (r = 0.25, p = 0.008). Strong associations were observed between a majority of the automatically and manually computed features (r > 0.88, p < 0.001).
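The area under the curve reported in the Results can be read as the probability that a randomly chosen MS participant receives a higher classifier score than a randomly chosen control. A minimal sketch of that rank-based computation follows; the scores are made-up toy values, not study data, and `auc` is a hypothetical helper name.

```python
def auc(pos_scores, neg_scores):
    """Probability that a positive case outscores a negative one.

    Ties count as half a win (the Mann-Whitney formulation of ROC AUC).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Invented classifier scores for illustration only.
ms_scores = [0.9, 0.6, 0.4, 0.7]
control_scores = [0.3, 0.6, 0.5, 0.2]
toy_auc = auc(ms_scores, control_scores)
```

Under this reading, an AUC of 0.70 means that in roughly 70% of MS-control pairs the MS participant would be ranked as more likely to have MS; 0.5 corresponds to chance.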

Conclusion: Automated discourse analysis has the potential to provide an easy-to-implement and low-cost language-based biomarker of cognitive decline in MS for future clinical trials.

Keywords: automated linguistic analysis; language; multiple sclerosis; natural language processing; spontaneous discourse.

Conflict of interest statement

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1.
Results of linguistic analysis for lexical and syntactic features. The left panel shows data for lexical features, and the right panel data for syntactic features. Horizontal lines represent the means, boxes represent the 95% confidence interval, and whiskers represent the standard deviation. Statistically significant differences between groups after analysis of covariance: *p < 0.05, **p < 0.01, ***p < 0.001. All results are adjusted for the content of discourses. MS, multiple sclerosis.
Figure 2.
Selected pairs of linguistic features contributing to the best classification accuracy with classification boundaries separating MS from controls. MS, multiple sclerosis.
Figure 3.
Significant correlation between SDMT and mean length of utterance. The blue circles show the raw, uncorrected linguistic and neuropsychological values, while the correlation coefficient r and its corresponding p-value are corrected for age, sex, education, and content of discourse. SDMT, symbol digit modalities test.
Figure 4.
Relationship between features extracted from the automated and manual data sets. NRMSE, normalized root mean square error; Pearson r, Pearson correlation.
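The two agreement measures named in the Figure 4 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the data are invented, the function names are hypothetical, and normalizing the RMSE by the range of the manual values is an assumption, since the abstract does not state which normalization was used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def nrmse(reference, estimate):
    """RMSE divided by the range of the reference values (assumed normalization)."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    span = max(reference) - min(reference)
    if span == 0:
        raise ValueError("reference values have zero range")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    return math.sqrt(mse) / span


# Invented feature values (e.g. mean length of utterance) for illustration only.
manual = [4.0, 5.0, 6.0, 8.0]
automated = [4.5, 5.0, 6.5, 8.0]
r = pearson_r(manual, automated)
err = nrmse(manual, automated)
```

A high Pearson r shows that the automated feature tracks the manual one, while the NRMSE additionally penalizes systematic offsets that correlation alone would not reveal.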
