PLoS One. 2023 Apr 26;18(4):e0284667.
doi: 10.1371/journal.pone.0284667. eCollection 2023.

Ataxic speech disorders and Parkinson's disease diagnostics via stochastic embedding of empirical mode decomposition

Marta Campi et al. PLoS One. 2023.

Abstract

Medical diagnostic methods that use patient symptom modalities such as speech are increasingly being used for initial diagnosis and for monitoring disease progression. Speech disorders are particularly prevalent in neurodegenerative diseases such as Parkinson's disease, the focus of this study. We demonstrate state-of-the-art methods that combine statistical time-series modelling and signal processing with modern machine learning based on Gaussian process models to accurately detect a core symptom of speech disorder in individuals who have Parkinson's disease. We show that the proposed methods outperform standard best practices of speech diagnostics in detecting ataxic speech disorders. The study focuses on a detailed analysis of a well-regarded, publicly available Parkinson's speech data set, which makes all our results reproducible. The methodology is based on a specialised technique that is not widely adopted in medical statistics but has found great success in other domains such as signal processing, seismology, speech analysis and ecology. We present this method from a statistical perspective and generalise it to a stochastic model, which is used to design a test for speech disorders when applied to speech time-series signals. As such, this work makes both practical and statistical methodological contributions.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Taxonomy of SMD according to the Darley, Aronson, and Brown model.
Note that the taxonomy panel was produced by [10] and modified in this paper. Acoustic features that represent the vocal tract and capture formant structure are amongst the most discriminant in ASR tasks. Our interest is to detect the presence or absence of Parkinson’s through such acoustic features. Since one of the early symptoms of Parkinson’s is ataxic speech, which implies several speech abnormalities in the vocal tract, this is the set of anomalies we aim to discriminate. Furthermore, based on [17], our goal is to construct an ASR-SD system able to deal with complex settings such as non-stationarity of the speech, small sample sizes, unbalanced data, and interpretation of the results with respect to voice gender, since male and female voices carry different formant structures.
Fig 2. ASR systems for detecting ataxic speech.
The top panel represents the ASR-SI system implemented by [29], on which our technique builds. After collecting the speech data and splitting it into training and testing sets, the authors extracted (amongst others) Mel Frequency Cepstral Coefficients (MFCCs) and phase-based cepstral coefficients (MGDCCs) and combined them into a single feature vector, which was then classified with a Support Vector Machine (SVM) for the diagnosis of cerebellar ataxia. The bottom panel shows the steps of our ASR system, which instead is SD and relies on read text as the speech task performed by the participants. The data set considered is given in [38] and includes people affected by Parkinson’s disease. We constructed the training and testing sets and then extracted (amongst others) six different feature vectors, which were tested individually through a Generalized Likelihood Ratio Test (GLRT). The classification task targets the detection of ataxic speech with an equivalent statistical framework for diagnosing Parkinson’s disease. Note that an extension of the bottom panel including all the novel features is presented in Fig 3.
Fig 3. The proposed ASR system for detecting ataxic speech.
It extends Fig 2 and presents all the novel features used, namely the IMFs and the BLIMFs (outputs of SM2 and SM3). In addition, MFCCs are extracted from these and classified with an SVM equivalent to the one used by [29]. Note that only the first 3 bases are retained; the reasons for this choice are introduced later.
Fig 4. Partition Rule Definition showing how the empirical IF samples $\{p_{l,n}\}_{l=1,n=1}^{L,N}$ (colored in green) within region $\Pi$ are partitioned into 12 time-frequency sub-regions, which are defined by running the CEM method to derive $\Pi^*$.
Note that, for this figure, we used only the first three IMFs, hence the first three IFs. This means that L = 3 in the figure. These correspond to the first three IFs of a speech segment used in the application of interest. As highlighted later in the paper, we consider speech segments of length N = 5000 samples.
Fig 5. Comparison of the original extracted IMFs (left panels) and the obtained band-limited IMFs (right panels).
The original signal is a segment of the speech signals considered in section 7. The x-axis represents time in seconds and spans approximately 0.113 seconds (113 milliseconds), given that each speech segment is 5000 samples recorded at 44.1 kHz. The y-axis shows the amplitudes of the IMFs (left panels) and the band-limited IMFs (right panels).
Fig 6. Steps required for the implementation of System Model 3.
The first plot represents the original interpolated signal $\tilde{s}(t)$. This is a segment of the speech signal used in the experiments section and corresponds to approximately 0.113 seconds of speech. The x-axis corresponds to time (measured in seconds) and the y-axis to the amplitude. In the following plots, equivalent settings for the axes apply. Next, the EMD is applied and the first three IMFs $\gamma_1(t)$, $\gamma_2(t)$, $\gamma_3(t)$ are plotted. The related IFs $\omega_1(t)$, $\omega_2(t)$, $\omega_3(t)$ are then extracted and plotted, and the empirical sample points of the IFs are passed to the CEM method. The fourth step of this procedure is the initial partition $\Pi_0$ used to initialise the cross-entropy algorithm, while the fifth step represents the CEM estimated optimal partition $\Pi^*$. Lastly, the reconstructed BLIMFs are provided.
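The IF extraction step in this caption can be illustrated with a small sketch. A common way to obtain an instantaneous frequency from a single oscillatory component is to take the phase derivative of its analytic signal (Hilbert transform); whether the paper uses exactly this estimator is an assumption here. The function names and the toy cosine input are hypothetical, the EMD step itself is not reproduced, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (fine for short toy signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def analytic_signal(x):
    """Analytic signal: keep DC, double positive frequencies, zero negative ones."""
    n = len(x)
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        for k in range(1, n // 2):
            gain[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            gain[k] = 2.0
    spectrum = dft(x)
    return dft([g * s for g, s in zip(gain, spectrum)], inverse=True)


def instantaneous_frequency(x, sample_rate_hz):
    """Instantaneous frequency in Hz from wrapped phase differences."""
    phases = [cmath.phase(v) for v in analytic_signal(x)]
    freqs = []
    for a, b in zip(phases, phases[1:]):
        step = (b - a + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
        freqs.append(step * sample_rate_hz / (2 * math.pi))
    return freqs


# Toy stand-in for one IMF: a 5 Hz cosine sampled at 64 Hz for one second.
tone = [math.cos(2 * math.pi * 5 * t / 64) for t in range(64)]
freqs = instantaneous_frequency(tone, 64.0)
```

For a real IMF the estimate varies over time, which is what produces the scattered IF sample points that are then partitioned.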
Fig 7. The Mel filter bank structure for 40 filters.
Each peak marks the center frequency of one filter.
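As a rough illustration of how such a filter bank is laid out, the sketch below places 40 triangular filters with centers equally spaced on the mel scale. The mel formula used (2595 log10(1 + f/700)), the 0 to 22050 Hz range, and all function names are assumptions for illustration; the paper's exact filter bank settings are not given in this caption.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel (one common convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_edges(n_filters=40, f_min=0.0, f_max=22050.0):
    """Return n_filters + 2 edge frequencies (Hz), equally spaced in mel."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_filters + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_filters + 2)]


def triangular_weight(f_hz, left, center, right):
    """Weight of a triangular filter that peaks at 1.0 at its center."""
    if f_hz <= left or f_hz >= right:
        return 0.0
    if f_hz <= center:
        return (f_hz - left) / (center - left)
    return (right - f_hz) / (right - center)


edges = mel_filter_edges()
centers = edges[1:-1]  # one center (peak) per filter
```

Filter `i` spans `edges[i]` to `edges[i + 2]` with its peak at `centers[i]`, so the filters become wider as frequency increases.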
Fig 8. Barplots for the number of segments of length 5000 samples (approximately 0.113 seconds) for the female patients (left panels) and the male patients (right panels).
The x-axis shows the UPDRS II-5 stages, with the healthy participants included as an additional category. The y-axis represents the counts of the segments divided by patient.
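The quoted segment duration follows from dividing the segment length by the sampling rate. A minimal sketch of that arithmetic, assuming a 44.1 kHz sampling rate (an assumption here; the constant and function names are illustrative):

```python
SAMPLE_RATE_HZ = 44_100  # assumed sampling rate
SEGMENT_LENGTH = 5_000   # samples per segment, as stated in the caption


def segment_duration_seconds(n_samples, sample_rate_hz):
    """Duration in seconds of a segment of n_samples at the given rate."""
    return n_samples / sample_rate_hz


duration = segment_duration_seconds(SEGMENT_LENGTH, SAMPLE_RATE_HZ)
print(f"{duration:.4f} s")  # prints "0.1134 s"
```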
Fig 9. Diagram of the steps required for the testing procedure of the model estimation phase.
The GLRT is computed on each mini-batch extracted from the segments of every patient. Note that each mini-batch is approximately 2.2 ms long. The GLRT is conducted on weighted and aggregated Fisher score vectors.
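The mini-batch length can be related to the segment length with a short sketch. It assumes 5000-sample segments split into 50 equal, non-overlapping mini-batches and a 44.1 kHz sampling rate; the splitting scheme, the sampling rate, and the names below are assumptions for illustration, and the GLRT itself is not reproduced.

```python
SAMPLE_RATE_HZ = 44_100  # assumed sampling rate


def split_into_minibatches(segment, n_batches):
    """Split a segment into n_batches equal, non-overlapping mini-batches."""
    if len(segment) % n_batches != 0:
        raise ValueError("segment length must be divisible by n_batches")
    size = len(segment) // n_batches
    return [segment[i * size:(i + 1) * size] for i in range(n_batches)]


segment = list(range(5000))  # placeholder samples, not real speech
batches = split_into_minibatches(segment, 50)
batch_ms = 1000.0 * len(batches[0]) / SAMPLE_RATE_HZ
print(f"{len(batches)} mini-batches of {len(batches[0])} samples, {batch_ms:.2f} ms each")
# prints "50 mini-batches of 100 samples, 2.27 ms each"
```

Under these assumptions each mini-batch holds 100 samples, in line with the roughly 2.2 ms quoted in the caption.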
Fig 10. There are two panels for every plot.
The top panels are spectrograms of the original speech segments for four voices. The x-axis is time in seconds (0.113 s) and the y-axis is frequency in Hz (0–5000 Hz). The second panel represents the results of the GLRT conducted on every mini-batch of that segment. There are 50 mini-batches per segment. White corresponds to 0 and black to 1: 0 corresponds to equality in distribution, hence no disease detected, while 1 corresponds to the detection of Parkinson’s disease. (a) Healthy female speech segment, (b) sick female speech segment (UPDRS score equal to 1), (c) healthy male speech segment and (d) sick male speech segment.
Fig 11. There are two panels for every plot.
The top panels are spectrograms of the IMFs (left) and the BLIMFs (right) obtained from the EMD of the male speech segment given in Fig 10(d). The x-axis is time in seconds (0.113 s) and the y-axis is frequency in Hz (0–10000 Hz). The second panel represents the results of the GLRT conducted on every mini-batch of that IMF or BLIMF segment. There are 50 mini-batches per segment. White corresponds to equality in distribution, hence no disease detected, while black corresponds to the detection of Parkinson’s disease. (a) Speech segments of the first three IMFs extracted from the sick male speech segment given in Fig 10(d) and (b) speech segments of the first three BLIMFs computed on the IMFs of the sick male speech segment given in Fig 10(d).
Fig 12. Results of t-SNE for the ARIMA parameters of the first three IMFs (left panel) and the first three BLIMFs (right panel).
Note that, to run the algorithm, a PCA step was first applied to reduce the initial data dimensionality, retaining 90% of the explained variance. The axes represent the two dimensions identified by the t-SNE algorithm, denoted comp-1 and comp-2. The azure points, denoted 1 in the legend, refer to the parameters of the sick patients, while the points denoted 0 refer to those of the healthy patients.
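The PCA step described here amounts to keeping the smallest number of principal components whose cumulative explained variance reaches the chosen threshold. A minimal sketch of that selection rule, using made-up eigenvalues (the values and the function name are hypothetical and not taken from the paper):

```python
def components_for_variance(eigenvalues, threshold=0.90):
    """Smallest number of components whose cumulative share of variance
    reaches the threshold. Eigenvalues are the PCA component variances."""
    if not eigenvalues or not 0.0 < threshold <= 1.0:
        raise ValueError("need eigenvalues and a threshold in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Hypothetical eigenvalues: cumulative shares are 0.50, 0.80, 0.95, 1.00.
example_eigenvalues = [5.0, 3.0, 1.5, 0.5]
n_keep = components_for_variance(example_eigenvalues, threshold=0.90)
print(n_keep)  # prints 3
```

The reduced data would then be passed to t-SNE to obtain the two plotted dimensions.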
Fig 13. Barplots presenting the number of zero ARIMA parameters fit on the mini-batches for the female case.
The left panel refers to the first three IMFs (used in the system model classification and presented in the sections below), split according to healthy (HC) and sick (PD) patients. The right panel presents an equivalent plot for the first three BLIMFs (used in the system model classification and presented in the sections below), split according to healthy (HC) and sick (PD) patients.

References

    1. Daoudi K, Das B, Tykalova T, Klempir J, Rusz J. Speech acoustic indices for differential diagnosis between Parkinson’s disease, multiple system atrophy and progressive supranuclear palsy. npj Parkinson’s Disease. 2022;8(1):142. doi: 10.1038/s41531-022-00389-6
    2. Hecker P, Steckhan N, Eyben F, Schuller BW, Arnrich B. Voice Analysis for Neurological Disorder Recognition–A Systematic Review and Perspective on Emerging Trends. Frontiers in Digital Health. 2022;4. doi: 10.3389/fdgth.2022.842301
    3. Rana A, Dumka A, Singh R, Panda MK, Priyadarshi N, Twala B. Imperative Role of Machine Learning Algorithm for Detection of Parkinson’s Disease: Review, Challenges and Recommendations. Diagnostics. 2022;12(8):2003. doi: 10.3390/diagnostics12082003
    4. Ayaz Z, Naz S, Khan NH, Razzak I, Imran M. Automated methods for diagnosis of Parkinson’s disease and predicting severity level. Neural Computing and Applications. 2022; p. 1–36.
    5. Sakar BE, Isenkul ME, Sakar CO, Sertbas A, Gurgen F, Delil S, et al. Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings. IEEE Journal of Biomedical and Health Informatics. 2013;17(4):828–834. doi: 10.1109/JBHI.2013.2245674