BMC Res Notes. 2011 Aug 17;4:299.
doi: 10.1186/1756-0500-4-299.

Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

João Maroco et al. BMC Res Notes. 2011.

Abstract

Background: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in predicting progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, such as Neural Networks, Support Vector Machines and Random Forests, can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test.
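The Friedman comparison mentioned above ranks the classifiers within each cross-validation fold and tests whether their rank sums differ more than chance would allow. The following is a minimal sketch of that statistic, not the authors' code: it omits the tie correction that statistical packages usually apply, the function names are illustrative, and the accuracy values are hypothetical rather than taken from the study.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """scores[fold][classifier] -> (chi-square statistic, degrees of freedom)."""
    n = len(scores)       # number of folds (blocks)
    k = len(scores[0])    # number of classifiers (treatments)
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return chi2, k - 1


# Hypothetical accuracies: 5 folds (rows) x 3 classifiers (columns).
demo_scores = [
    [0.60, 0.70, 0.80],
    [0.55, 0.65, 0.75],
    [0.62, 0.71, 0.79],
    [0.58, 0.69, 0.77],
    [0.61, 0.66, 0.78],
]
chi2, df = friedman_statistic(demo_scores)
print(f"Friedman chi-square({df}) = {chi2:.3f}")
```

With 10 classifiers and 5 folds, as in the study design, the statistic has 9 degrees of freedom.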

Results: Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (Median (Me) = 0.76) and area under the ROC curve (Me = 0.90). However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forests ranked second in overall accuracy (Me = 0.73), with high area under the ROC curve (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC curve (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most of them sensitivity was around or even below a median value of 0.5.
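The pattern reported for Support Vector Machines, high accuracy and specificity alongside low sensitivity, can arise whenever a classifier assigns most cases to the larger group. The sketch below illustrates this with a hypothetical confusion matrix (the counts are invented for illustration and are not the study's data). Dementia is treated as the positive class and MCI as the negative class; the function name is illustrative.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp / fn: Dementia cases predicted as Dementia / as MCI.
    tn / fp: MCI cases predicted as MCI / as Dementia.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # Dementia predicted / Dementia observed
        "specificity": tn / (tn + fp),   # MCI predicted / MCI observed
    }


# Hypothetical test fold: 10 Dementia and 30 MCI patients, with a
# classifier that labels almost every patient as MCI.
metrics = classification_metrics(tp=3, fn=7, tn=30, fp=0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this invented fold the classifier is right for most patients overall, yet it misses 7 of the 10 Dementia cases, which is why accuracy alone is a weak basis for ranking the methods.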

Conclusions: When sensitivity, specificity and overall classification accuracy are taken into account, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested in the prediction of dementia from several neuropsychological tests. These methods may be used to improve the accuracy, sensitivity and specificity of dementia predictions from neuropsychological testing.

Figures

Figure 1
Pictorial representation of a neural network (multilayer perceptron) with input layer (dendrites), hidden layer (nucleus) and output layer (axon) (see text for a description of the neural network components).
Figure 2
Schematic representation of the optimum hyperplane (H0) by a Support Vector Machine. Diagonal lines represent the classification function for objects {-1} and {+1}. Objects inside the circles are the so-called support vectors verifying w'x + b = -1 or w'x + b = +1 respectively.
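The geometry in this caption can be sketched for a linear Support Vector Machine: a point is classified by the sign of w'x + b, support vectors sit exactly on w'x + b = -1 or +1, and the distance between those two lines is 2/||w||. The weight vector, intercept and points below are hypothetical values chosen for illustration, not fitted parameters from the study.

```python
import math


def decision_value(w, b, x):
    """Value of the linear classification function w'x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Assign class +1 or -1 according to the side of the hyperplane H0."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between the lines w'x + b = -1 and w'x + b = +1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical hyperplane and points (not estimated from data).
w, b = [1.0, 1.0], -3.0
for point in ([1.0, 1.0], [2.0, 2.0], [0.0, 0.0]):
    print(point, decision_value(w, b, point), predict(w, b, point))
print("margin width:", margin_width(w))
```

Here the first two points lie exactly on the margin lines, so they play the role of the circled support vectors; the third lies well inside the {-1} region.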
Figure 3
Scatter biplots for MCI (white circles) and Dementia (black circles) patients on the 11 predictors, with their histograms (DSf - Digit Span Forward; DSb - Digit Span Backward; SF - Verbal Semantic Fluency; Ori - Orientation; WR - Word Recall; VPA - Verbal Paired-associate Learning; LM - Logical Memory; Forg - Forgetting Index; Clock - Clock Drawing; MPR - Raven Progressive Matrices; Prov - Interpretation of Proverbs). See text for test descriptions.
Figure 4
Box-plot distributions of classification accuracy (number of correct classifications/total sample size) for the 5 test samples resulting from the 5-fold cross-validation procedure (see text for abbreviations) (χ²Fr(9) = 22.211; p = 0.008). Different letters correspond to methods with statistically significant differences according to Dunn's mean rank post-hoc comparisons (p < 0.05). Circles represent outliers (observations greater than the 3rd quartile plus 1.5 times the interquartile range or smaller than the 1st quartile minus 1.5 times the interquartile range); stars represent extreme outliers (observations greater than the 3rd quartile plus 3 times the interquartile range or smaller than the 1st quartile minus 3 times the interquartile range).
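The outlier rule stated in this caption can be written out directly. The sketch below is illustrative only: it assumes the "inclusive" quartile method of Python's `statistics.quantiles`, whereas statistical packages differ in how they compute quartiles, and the input values are hypothetical.

```python
import statistics


def classify_outliers(values):
    """Label each value as 'inside', 'outlier' or 'extreme' by the box-plot rule."""
    # Quartile definition is an assumption; other software may use another one.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    labels = []
    for v in values:
        if v > q3 + 3 * iqr or v < q1 - 3 * iqr:
            labels.append("extreme")    # drawn as a star
        elif v > q3 + 1.5 * iqr or v < q1 - 1.5 * iqr:
            labels.append("outlier")    # drawn as a circle
        else:
            labels.append("inside")
    return labels


# Hypothetical scores, not values from the study.
scores = [10, 11, 12, 13, 14, 15, 16, 25, 40]
for value, label in zip(scores, classify_outliers(scores)):
    print(value, label)
```

With only 5 observations per box, as in a 5-fold cross-validation, the quartiles and therefore the outlier flags depend heavily on the quartile definition used.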
Figure 5
Box-plot distributions of specificity (number of MCI predicted/number of MCI observed) for the 5 test samples resulting from the 5-fold cross-validation procedure (see text for abbreviations) (χ²Fr(9) = 37.292; p < 0.001). Different letters indicate statistically significant differences between classifiers on Dunn's mean rank comparison procedure. Circles and stars represent outliers and extreme outliers respectively.
Figure 6
Box-plot distributions of sensitivity (number of Dementia predicted/number of Dementia observed) (see text for abbreviations) (χ²Fr(9) = 29.0; p = 0.001). Different letters indicate statistically significant differences between classifiers on a multiple mean rank comparison procedure. Circles and stars represent outliers and extreme outliers respectively.
Figure 7
Box-plot distributions of area under the Receiver Operating Characteristic curve (AUC) (see text for abbreviations) (χ²Fr(9) = 23.745; p = 0.005). Different letters indicate statistically significant differences between classifiers on a multiple mean rank comparison procedure. Circles and stars represent outliers and extreme outliers respectively.
Figure 8
Box-plot distributions of Press' Q (see text for abbreviations) (χ²Fr(9) = 21.582; p = 0.01). Different letters indicate statistically significant differences between classifiers on Dunn's multiple mean rank comparison procedure. Classifiers with Q > 3.84 classify significantly better than chance alone for a 0.05 significance level. Circles and stars represent outliers and extreme outliers respectively.
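Press' Q is commonly defined as Q = (N - nK)² / (N(K - 1)), where N is the sample size, n the number of correct classifications and K the number of groups; it is compared with the chi-square critical value for 1 degree of freedom, which is 3.84 at the 0.05 level. The sketch below assumes that standard definition, and the counts are hypothetical rather than taken from the study.

```python
def press_q(n_total, n_correct, n_groups=2):
    """Press' Q statistic for classification accuracy against chance."""
    return (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))


CHI2_CRITICAL_1DF_05 = 3.84  # chi-square critical value, 1 df, alpha = 0.05

# Hypothetical test sample: 80 patients in 2 groups, 52 classified correctly.
q = press_q(n_total=80, n_correct=52)
print(f"Press' Q = {q:.2f}")
print("better than chance at the 0.05 level:", q > CHI2_CRITICAL_1DF_05)
```

A classifier that gets exactly half of a two-group sample right yields Q = 0, so larger values indicate accuracy further from the chance level.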
