BMC Res Notes. 2011 Aug 17;4:299.

doi: 10.1186/1756-0500-4-299.

Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

João Maroco¹, Dina Silva, Ana Rodrigues, Manuela Guerreiro, Isabel Santana, Alexandre de Mendonça

Affiliation

¹ Unidade de Investigação em Psicologia e Saúde & Departamento de Estatística, ISPA - Instituto Universitário, Rua Jardim do Tabaco 44, 1149-041 Lisboa, Portugal. jpmaroco@gmail.com.

PMID: 21849043
PMCID: PMC3180705

João Maroco et al. BMC Res Notes. 2011.

. 2011 Aug 17:4:299.

doi: 10.1186/1756-0500-4-299.

Authors

João Maroco¹, Dina Silva, Ana Rodrigues, Manuela Guerreiro, Isabel Santana, Alexandre de Mendonça

Affiliation

¹ Unidade de Investigação em Psicologia e Saúde & Departamento de Estatística, ISPA - Instituto Universitário, Rua Jardim do Tabaco 44, 1149-041 Lisboa, Portugal. jpmaroco@gmail.com.

PMID: 21849043
PMCID: PMC3180705
DOI: 10.1186/1756-0500-4-299

Abstract

Background: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in predicting progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, such as Neural Networks, Support Vector Machines and Random Forests, can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) were compared with three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test.
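To illustrate how the fold-wise comparison works, the sketch below computes Friedman's statistic from a table of per-fold scores in which rows are cross-validation folds and columns are classifiers. Classifiers are ranked within each fold (tied scores receive their average rank) and the textbook formula is applied without a tie correction; many statistical packages do apply such a correction, so their values can differ slightly. With 10 classifiers, k - 1 gives the 9 degrees of freedom reported for the Friedman tests in the figure captions. The function names and the scores in the example are hypothetical, not data from the study.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """Friedman chi-square for an n x k table (n folds by k classifiers).

    Returns (statistic, degrees_of_freedom); no tie correction is applied.
    """
    n = len(scores)
    k = len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    stat = 12.0 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3.0 * n * (k + 1)
    return stat, k - 1


# Hypothetical accuracies: 5 folds x 3 classifiers, with the same ordering in every fold.
folds = [[0.60, 0.70, 0.80], [0.55, 0.65, 0.75], [0.62, 0.68, 0.74],
         [0.58, 0.66, 0.79], [0.61, 0.69, 0.77]]
print(friedman_statistic(folds))  # prints (10.0, 2)
```

When every fold ranks the classifiers identically, the statistic reaches its maximum of n(k - 1), here 5 x 2 = 10. The statistic is unchanged if ranks are assigned in descending rather than ascending order.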

Results: Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (median (Me) = 0.76) and area under the ROC curve (Me = 0.90). However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73), with high area under the ROC curve (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC curve (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most of them sensitivity was around or even below a median value of 0.5.

Conclusions: When sensitivity, specificity and overall classification accuracy are taken into account together, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested for predicting dementia from several neuropsychological tests. These methods may be used to improve the accuracy, sensitivity and specificity of dementia predictions from neuropsychological testing.

Figures

**Figure 1**
Pictorial representation of a neural network (multilayer perceptron) with an input layer (dendrites), a hidden layer (nucleus) and an output layer (axon) (see text for a description of the neural network's components).
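The layered structure in Figure 1 can be illustrated with a minimal forward pass through a multilayer perceptron with one hidden layer and a single output unit. The logistic activation, the function names and the weights are hypothetical choices for illustration; the caption does not specify the network configuration used in the study, and the sketch does not train the network.

```python
import math


def logistic(z):
    """Logistic (sigmoid) activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, hidden_weights, hidden_bias, output_weights, output_bias):
    """Forward pass: input layer -> one hidden layer -> single output unit.

    hidden_weights[h][i] connects input i to hidden unit h;
    output_weights[h] connects hidden unit h to the output unit.
    """
    # Each hidden unit sums its weighted inputs plus a bias, then applies the activation.
    hidden = [
        logistic(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(hidden_weights, hidden_bias)
    ]
    # The output unit does the same with the hidden-layer activations as its inputs.
    return logistic(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Two hypothetical test scores feeding two hidden units.
p = mlp_forward([1.0, 2.0], [[0.5, -0.25], [0.1, 0.2]], [0.0, -0.1], [1.5, -2.0], 0.2)
print(p)
```

Under an assumed cut-off of 0.5, an output above 0.5 could be read as a prediction of one class (for example dementia) and below it as the other.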

**Figure 2**
**Schematic representation of the optimum hyperplane (H0) found by a Support Vector Machine**. Diagonal lines represent the classification function for objects {-1} and {+1}. Objects inside the circles are the so-called support vectors, which satisfy w'x + b = -1 or w'x + b = +1, respectively.
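The notation in Figure 2 can be made concrete with a few lines of code. A linear SVM assigns a case to {+1} or {-1} according to the sign of w'x + b (the hyperplane H0 is w'x + b = 0), points on the margin hyperplanes satisfy w'x + b = -1 or +1, and the distance between those two hyperplanes is 2/||w||. The weight vector, bias and function names below are hypothetical; the sketch only evaluates a given linear decision function and does not fit an SVM.

```python
import math


def decision_value(w, x, b):
    """Compute w'x + b for weight vector w, case x and bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, x, b):
    """Assign +1 if x lies on or above H0 (w'x + b >= 0), otherwise -1."""
    return 1 if decision_value(w, x, b) >= 0 else -1


def on_margin(w, x, b, tol=1e-9):
    """True if x lies on a margin hyperplane, i.e. w'x + b = -1 or w'x + b = +1.

    In a hard-margin SVM the support vectors lie on these hyperplanes.
    """
    return abs(abs(decision_value(w, x, b)) - 1.0) <= tol


def margin_width(w):
    """Distance between the hyperplanes w'x + b = -1 and w'x + b = +1, i.e. 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


w, b = [1.0, 1.0], -3.0   # hypothetical hyperplane x1 + x2 = 3
print(classify(w, [2.0, 2.0], b), on_margin(w, [2.0, 2.0], b))  # prints 1 True
```

Maximising the margin 2/||w|| is equivalent to minimising ||w||, which is how the optimum hyperplane H0 is usually characterised.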

**Figure 3**
Scatter biplots of the 11 predictors, with their histograms, for MCI (white circles) and Dementia (black circles) patients (DSf - Digit Span Forward; DSb - Digit Span Backward; SF - Verbal Semantic Fluency; Ori - Orientation; WR - Word Recall; VPA - Verbal Paired-associate Learning; LM - Logical Memory; Forg - Forgetting Index; Clock - Clock Drawing; MPR - Raven Progressive Matrices; Prov - Interpretation of Proverbs). See text for test descriptions.

**Figure 4**
Box-plot distributions of classification accuracy (number of correct classifications/total sample size) for the 5 test samples resulting from the 5-fold cross-validation procedure (see text for abbreviations) (χ²Fr(9) = 22.211; p = 0.008). Different letters correspond to methods with statistically significant differences according to Dunn's mean rank post-hoc comparisons (p < 0.05). Circles represent outliers (observations greater than the 3rd quartile plus 1.5 times the interquartile range or smaller than the 1st quartile minus 1.5 times the interquartile range); stars represent extreme outliers (observations greater than the 3rd quartile plus 3 times the interquartile range or smaller than the 1st quartile minus 3 times the interquartile range).
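The outlier rule in the Figure 4 caption can be written as a small function. Quartile conventions differ between statistical packages, so this sketch assumes quartiles computed with Python's `statistics.quantiles` using its default ("exclusive") method; borderline observations may therefore be labelled differently from the published box plots. The function name and the data in the example are hypothetical.

```python
from statistics import quantiles


def classify_points(values):
    """Label each value 'regular', 'outlier' (beyond 1.5 IQR) or 'extreme' (beyond 3 IQR)."""
    q1, _, q3 = quantiles(values, n=4)  # 1st quartile, median, 3rd quartile
    iqr = q3 - q1
    labels = []
    for v in values:
        if v > q3 + 3 * iqr or v < q1 - 3 * iqr:
            labels.append("extreme")   # drawn as a star
        elif v > q3 + 1.5 * iqr or v < q1 - 1.5 * iqr:
            labels.append("outlier")   # drawn as a circle
        else:
            labels.append("regular")
    return labels


data = list(range(1, 13)) + [25, 100]  # hypothetical observations
print(classify_points(data)[-2:])      # prints ['outlier', 'extreme']
```

Here Q1 = 3.75 and Q3 = 11.25, so the IQR is 7.5: 25 exceeds Q3 + 1.5 x IQR = 22.5 but not Q3 + 3 x IQR = 33.75, while 100 exceeds both limits.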

**Figure 5**
Box-plot distributions of specificity (number of MCI cases correctly predicted/number of MCI cases observed) for the 5 test samples resulting from the 5-fold cross-validation procedure (see text for abbreviations) (χ²Fr(9) = 37.292; p < 0.001). Different letters indicate statistically significant differences between classifiers according to Dunn's mean rank comparison procedure. Circles and stars represent outliers and extreme outliers, respectively.

**Figure 6**
**Box-plot distributions of sensitivity (number of Dementia cases correctly predicted/number of Dementia cases observed) (see text for abbreviations) (χ²Fr(9) = 29.0; p = 0.001)**. Different letters indicate statistically significant differences between classifiers according to a multiple mean rank comparison procedure. Circles and stars represent outliers and extreme outliers, respectively.
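The three measures plotted in Figures 4 to 6 can be computed from observed and predicted labels, treating dementia as the positive class and MCI as the negative class. The string labels "Dementia" and "MCI" and the function name are assumptions made for this sketch.

```python
def classification_metrics(observed, predicted, positive="Dementia"):
    """Return (accuracy, sensitivity, specificity) with `positive` as the positive class."""
    pairs = list(zip(observed, predicted))
    tp = sum(o == positive and p == positive for o, p in pairs)  # dementia correctly predicted
    tn = sum(o != positive and p != positive for o, p in pairs)  # MCI correctly predicted
    n_pos = sum(o == positive for o in observed)                 # dementia observed
    n_neg = len(observed) - n_pos                                # MCI observed
    accuracy = (tp + tn) / len(observed)
    sensitivity = tp / n_pos
    specificity = tn / n_neg
    return accuracy, sensitivity, specificity


observed = ["Dementia", "Dementia", "MCI", "MCI", "MCI"]
predicted = ["Dementia", "MCI", "MCI", "MCI", "Dementia"]
acc, sens, spec = classification_metrics(observed, predicted)  # 0.6, 0.5, about 0.667
```

A classifier that labels every case as MCI would reach a specificity of 1.0 with a sensitivity of 0, which is why overall accuracy alone can be misleading and the three measures are read together.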

**Figure 7**
**Box-plot distributions of area under the Receiver Operating Characteristic curve (AUC) (see text for abbreviations) (χ²Fr(9) = 23.745; p = 0.005)**. Different letters indicate statistically significant differences between classifiers according to a multiple mean rank comparison procedure. Circles and stars represent outliers and extreme outliers, respectively.
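The AUC can be computed without drawing the ROC curve: it equals the proportion of (dementia, MCI) pairs in which the dementia case receives the higher score, with tied scores counted as one half (the Mann-Whitney formulation). The scores are assumed to be predicted probabilities of dementia, and the function name is hypothetical; the study's software may have computed the AUC by a different but equivalent route.

```python
def auc(scores_positive, scores_negative):
    """AUC as the proportion of (positive, negative) pairs ranked correctly; ties count 0.5."""
    total = 0.0
    for sp in scores_positive:
        for sn in scores_negative:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(scores_positive) * len(scores_negative))


# Hypothetical predicted probabilities of dementia.
dementia_scores = [0.9, 0.4]
mci_scores = [0.5, 0.1]
print(auc(dementia_scores, mci_scores))  # prints 0.75 (3 of 4 pairs ordered correctly)
```

The double loop is quadratic in sample size, which is fine for samples of a few hundred cases; larger data sets would usually use a rank-based computation.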

**Figure 8**
**Box-plot distributions of Press' Q (see text for abbreviations) (χ²Fr(9) = 21.582; p = 0.01)**. Different letters indicate statistically significant differences between classifiers according to Dunn's multiple mean rank comparison procedure. Classifiers with Q > 3.84 classify significantly better than chance alone at the 0.05 significance level. Circles and stars represent outliers and extreme outliers, respectively.
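Press' Q is commonly computed as Q = (N - nK)² / (N(K - 1)), where N is the number of cases, n the number of correct classifications and K the number of groups; this formula comes from standard discriminant-analysis texts rather than from the caption. Q is compared with the chi-square critical value for one degree of freedom, 3.84 at the 0.05 level. Because the squared term is also large when accuracy falls far below chance, the sketch additionally checks that accuracy exceeds 1/K (an assumed equal-group chance level). Function names and counts are hypothetical.

```python
CRITICAL_CHI2_1DF_05 = 3.84  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def press_q(n_total, n_correct, n_groups=2):
    """Press' Q statistic: (N - n*K)^2 / (N * (K - 1))."""
    return (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))


def better_than_chance(n_total, n_correct, n_groups=2):
    """True if accuracy is above 1/K and Press' Q exceeds the 0.05 critical value."""
    above_chance = n_correct / n_total > 1 / n_groups
    return above_chance and press_q(n_total, n_correct, n_groups) > CRITICAL_CHI2_1DF_05


print(press_q(100, 70))             # prints 16.0
print(better_than_chance(100, 59))  # prints False (Q = 3.24)
print(better_than_chance(100, 60))  # prints True (Q = 4.0)
```

With 100 cases in two groups, at least 60 correct classifications are needed for Q to exceed 3.84.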

Cited by

External Validation of Models for Predicting Disability in Community-Dwelling Older People in the Netherlands: A Comparative Study.
van der Ploeg T, Schalk R, Gobbens RJJ. Clin Interv Aging. 2023 Nov 14;18:1873-1882. doi: 10.2147/CIA.S428036. PMID: 38020449.
Prediction of Prognostic Risk Factors in Patients with Invasive Candidiasis and Cancer: A Single-Centre Retrospective Study.
Li J, Li Y, Gao Y, Niu X, Tang M, Fu C, Wang Z, Liu J, Song B, Chen H, Gao X, Guan X. Biomed Res Int. 2022 Jun 2;2022:7896218. doi: 10.1155/2022/7896218. PMID: 35692595. Retracted; retraction in: Biomed Res Int. 2024 Mar 20;2024:9797025. doi: 10.1155/2024/9797025.
Prediction of acute kidney injury risk after cardiac surgery: using a hybrid machine learning algorithm.
Petrosyan Y, Mesana TG, Sun LY. BMC Med Inform Decis Mak. 2022 May 18;22(1):137. doi: 10.1186/s12911-022-01859-w. PMID: 35585624.
Molecular mechanisms underlying striatal synaptic plasticity: relevance to chronic alcohol consumption and seeking.
Blackwell KT, Salinas AG, Tewatia P, English B, Hellgren Kotaleski J, Lovinger DM. Eur J Neurosci. 2019 Mar;49(6):768-783. doi: 10.1111/ejn.13919. Epub 2018 Apr 20. PMID: 29602186.
Clinical detection of deletion structural variants in whole-genome sequences.
Noll AC, Miller NA, Smith LD, Yoo B, Fiedler S, Cooley LD, Willig LK, Petrikin JE, Cakici J, Lesko J, Newton A, Detherage K, Thiffault I, Saunders CJ, Farrow EG, Kingsmore SF. NPJ Genom Med. 2016 Aug 3;1:16026. doi: 10.1038/npjgenmed.2016.26. PMID: 29263817.

